Repeated DNA Sequences (Java)

Source: Internet | Published by: WeChat Matrix Group | Editor: 程序博客网 | Time: 2024/05/17 06:13

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return: ["AAAAACCCCC", "CCCCCAAAAA"].
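For reference, the most direct approach is to count the 10-letter substrings themselves in a hash map. The sketch below is an illustration, not one of the sources discussed in this post, and the class name `NaiveRepeatedDna` is hypothetical. It reports each sequence once, at the moment its count first reaches 2. Every window is stored as its own 10-character `String`, which is the overhead that encoding each window as a number is meant to reduce.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NaiveRepeatedDna {
    // Counts every 10-letter window as a String key and reports each
    // sequence exactly once, when its count reaches 2.
    public static List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // merge inserts 1 for a new key, otherwise adds 1; it returns the new count
            int c = counts.merge(window, 1, Integer::sum);
            if (c == 2) {
                res.add(window);
            }
        }
        return res;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```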

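I couldn't solve this one myself, so I looked at solutions online. Nearly all of them convert each 10-letter substring into a number and store that number in a hash table to check whether it has been seen before.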
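To make the "string to number" idea concrete: with only four letters, each nucleotide fits in 2 bits, so a 10-letter window fits in the low 20 bits of an `int`, and two windows are equal exactly when their codes are equal. The class `DnaEncoding` and its `encode`/`decode` methods are hypothetical names used only for this sketch, which assumes the input contains only A, C, G and T.

```java
public class DnaEncoding {
    // Map each nucleotide to a 2-bit code: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Pack a sequence into an int, two bits per letter;
    // the first letter ends up in the highest bit pair.
    static int encode(String seq) {
        int v = 0;
        for (int i = 0; i < seq.length(); i++) {
            v = (v << 2) | code(seq.charAt(i));
        }
        return v;
    }

    // Inverse of encode for a sequence of the given length.
    static String decode(int v, int length) {
        char[] letters = {'A', 'C', 'G', 'T'};
        char[] out = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            out[i] = letters[v & 3]; // the lowest two bits hold the last letter
            v >>>= 2;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        int v = encode("AAAAACCCCC");
        System.out.println(v);             // prints 341
        System.out.println(decode(v, 10)); // prints AAAAACCCCC
    }
}
```

Five A's contribute zero bits and five C's contribute `01 01 01 01 01` in binary, which is 1 + 4 + 16 + 64 + 256 = 341.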

Source1 (got MTL)

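    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;

    class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            List<String> res = new ArrayList<String>();
            if (s.length() <= 10) return res;

            int[] a = new int['T' + 1]; // size the array up to ASCII 'T' so it can be indexed by the letter itself
            char[] b = {'A', 'C', 'G', 'T'};
            a['A'] = 0;
            a['C'] = 1;
            a['G'] = 2;
            a['T'] = 3;

            HashMap<Long, Integer> hm = new HashMap<Long, Integer>(); // type argument must be Long, not long

            for (int i = 0; i < s.length() - 9; i++) {
                // Encode s[i..i+9] as a 10-digit base-10 number (digits 0-3); s[i] is the most significant digit.
                long sum = 0;
                for (int j = i; j <= i + 9; j++) {
                    sum = sum * 10 + a[s.charAt(j)];
                }
                int count = hm.getOrDefault(sum, 0);
                if (count == 1) {
                    // Second occurrence: decode the number back into letters.
                    // The lowest digit is the LAST letter, so fill the array from the end.
                    char[] temp = new char[10];
                    long rest = sum;
                    for (int j = 9; j >= 0; j--) {
                        temp[j] = b[(int) (rest % 10)];
                        rest /= 10;
                    }
                    res.add(new String(temp));
                }
                hm.put(sum, count + 1); // always update the count so later occurrences are not added again
            }
            return res;
        }
    }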


    

Test

    // place inside class Solution
    public static void main(String[] args) {
        String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
        System.out.println(new Solution().findRepeatedDnaSequences(s));
    }


Source2

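    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;

    class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            HashSet<Integer> a = new HashSet<>();
            HashSet<Integer> b = new HashSet<>();
            List<String> res = new ArrayList<>();
            int[] map = new int[26]; // A=0 (default value), C=1, G=2, T=3
            map['C' - 'A'] = 1;
            map['G' - 'A'] = 2;
            map['T' - 'A'] = 3;

            for (int i = 0; i < s.length() - 9; i++) {
                int sum = 0;
                for (int j = i; j < i + 10; j++) {
                    sum <<= 2; // codes go up to 3 (binary 11), i.e. two bits, so shift by two per letter
                    sum |= map[s.charAt(j) - 'A'];
                }
                // Very clever: a HashSet rejects duplicates, and add returns false if the element
                // is already present. So !a.add(sum) is true on the second and later occurrences.
                // b.add(sum) is true only the first time it is called for a given sum, so each
                // sequence is added to res exactly once, on its second occurrence.
                if (!a.add(sum) && b.add(sum)) {
                    res.add(s.substring(i, i + 10));
                }
            }
            return res;
        }
    }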
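Source2 recomputes all 10 letters for every window. A common refinement, not taken from either source above, is to update the 20-bit code in O(1) per step: shift in the new letter's two bits and mask off everything above bit 20, which drops the letter that just left the window. The sketch below reuses Source2's letter mapping and its two-set trick; the class name `RollingRepeatedDna` is hypothetical, and it assumes the input contains only uppercase A, C, G and T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RollingRepeatedDna {
    public static List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        if (s.length() < 10) return res;

        int[] map = new int[26]; // A=0, C=1, G=2, T=3 (assumes only A/C/G/T in the input)
        map['C' - 'A'] = 1;
        map['G' - 'A'] = 2;
        map['T' - 'A'] = 3;

        final int mask = (1 << 20) - 1; // keep only the last 10 letters (20 bits)
        Set<Integer> seen = new HashSet<>();
        Set<Integer> added = new HashSet<>();
        int window = 0;
        for (int i = 0; i < s.length(); i++) {
            // shift in the new letter; the mask discards the letter that left the window
            window = ((window << 2) | map[s.charAt(i) - 'A']) & mask;
            if (i < 9) continue; // the first full window ends at index 9
            if (!seen.add(window) && added.add(window)) {
                res.add(s.substring(i - 9, i + 1));
            }
        }
        return res;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

The result is the same as Source2's; only the per-window cost changes, from ten shift-and-or steps to one.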


