Repeated DNA Sequences

Source: Internet. Editor: 程序博客网. Time: 2024/05/16 05:30

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return: ["AAAAACCCCC", "CCCCCAAAAA"].

The approach follows this post, which observes: https://leetcode.com/discuss/24478/i-did-it-in-10-lines-of-c

A is 0x41, C is 0x43, G is 0x47, T is 0x54. Still don't see it? Let me write it in octal.

A is 0101, C is 0103, G is 0107, T is 0124. The last octal digit is different for all four letters. That's all we need!

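In other words, when the ASCII codes are written in octal, the last digit of each letter is different. One octal digit corresponds to three binary digits, so the lowest 3 bits of each letter's code are enough to tell the four letters apart.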
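
To make the observation concrete, here is a minimal sketch that prints the octal code and the low three bits of each nucleotide letter. The class name `NucleotideBits` and the helper `code` are made up for illustration and are not part of the original solution.

```java
final class NucleotideBits {
    /** Returns the low three bits of the character code, i.e. its last octal digit. */
    static int code(char c) {
        return c & 7;
    }

    public static void main(String[] args) {
        for (char c : "ACGT".toCharArray()) {
            // Integer.toOctalString shows the same value the post writes as 0101, 0103, ...
            System.out.println(c + " -> octal " + Integer.toOctalString(c)
                    + ", low 3 bits = " + code(c));
        }
    }
}
```

The four letters map to 1, 3, 7 and 4, so none of them share a code and none of them map to 0.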

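Instead of storing strings in the map, we store an integer. An int has 32 bits and we need to look at 10 letters at a time, so each time we read a letter we shift the previous value left by 3 bits (10 letters occupy 30 bits in total; the two extra high bits are cleared with & 0x3FFFFFFF), and then append the current letter's 3-bit code to the low end of the number.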
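
The shift-and-mask step can be sketched in isolation as follows. This is only an illustration: `WindowKey`, `push` and `keyOf` are hypothetical names, and the sketch assumes the input contains only the letters A, C, G and T.

```java
final class WindowKey {
    // Low 30 bits: 10 letters x 3 bits each.
    static final int MASK = 0x3FFFFFFF;

    /** Appends one letter to the key and drops the letter that leaves the 10-letter window. */
    static int push(int key, char c) {
        // The shift moves the oldest letter into bits 30..32: bit 32 overflows the int,
        // and the mask clears the remaining two bits (30 and 31).
        return ((key << 3) & MASK) | (c & 7);
    }

    /** Computes the key of a string from scratch by pushing its letters one by one. */
    static int keyOf(String window) {
        int key = 0;
        for (int i = 0; i < window.length(); i++) {
            key = push(key, window.charAt(i));
        }
        return key;
    }
}
```

For example, `WindowKey.push(WindowKey.keyOf("AAAAACCCCC"), 'A')` gives the same value as `WindowKey.keyOf("AAAACCCCCA")`: sliding the window by one letter costs a single shift, mask and OR instead of re-reading all 10 letters.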

    public List<String> findRepeatedDnaSequences(String s) {
        // key: 30-bit encoding of a 10-letter window, value: number of times seen
        Map<Integer, Integer> map = new HashMap<Integer, Integer>();
        List<String> result = new ArrayList<String>();
        int num = 0;
        for (int i = 0; i < s.length(); i++) {
            // shift in the low 3 bits of the current letter and keep only 30 bits
            num = ((num << 3 & 0x3FFFFFFF) | (s.charAt(i) & 7));
            // report a window only when it is seen for the second time
            if (map.get(num) != null && map.get(num).equals(1)) {
                result.add(s.substring(i-9, i+1));
            }
            map.put(num, map.get(num) == null ? 1 : map.get(num)+1);
        }
        return result;
    }
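
To check the result of the integer-key version, it can help to compare it against a plain substring-based version. The sketch below is a different, simpler technique (sets of strings rather than integer keys), meant only as a reference for testing; `BruteForceDna` and `findRepeated` are made-up names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class BruteForceDna {
    /** Returns every 10-letter substring that occurs more than once, in order of first repeat. */
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();
        // LinkedHashSet keeps insertion order and ignores third and later occurrences.
        Set<String> repeated = new LinkedHashSet<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            if (!seen.add(window)) {
                // add() returns false when the window was already in the set
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

On the sample input above, `BruteForceDna.findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT")` returns `[AAAAACCCCC, CCCCCAAAAA]`. It stores up to one 10-character string per position, which is the memory cost the integer key avoids.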

0 0