liuxuhelloworld's notebook

题目链接

https://leetcode-cn.com/problems/repeated-dna-sequences/

解答过程

在知晓题目主标签是sliding window的条件下,还是比较容易想到通过滑动窗口来降低计算成本。但是具体怎样将一个固定长度的子串映射为某个特征值呢?一开始我想的是把这个子串当作一个26进制的数来计算,但是很快发现溢出问题。后面参考了官方题解才想到,为了保证计算速度,往2进制靠拢才是正确思路。

	private Map<Character, Integer> map = new HashMap<Character, Integer>() ;

	public List<String> findRepeatedDnaSequences(String s) {
		if (s == null) {
			return Collections.emptyList();
		}

		int len = s.length();
		if (len <= 10) {
			return Collections.emptyList();
		}

		List<String> ret = new ArrayList<>();

		Map<Integer, Integer> cnt = new HashMap<>();

		int hash = 0;
		for (int i = 0; i < 10; i++) {
			hash = hash << 2 | map.get(s.charAt(i));
		}
		cnt.put(hash, 1);

		for (int i = 1; i <= len - 10; i++) {
			char in = s.charAt(i + 9);
			hash = hash << 2 | map.get(in);
			hash = hash & 0b00000000000011111111111111111111;

			cnt.put(hash, cnt.getOrDefault(hash, 0) + 1);

			if (cnt.get(hash) == 2) {
				ret.add(s.substring(i, i + 10));
			}
		}

		return ret;
	}