187. Repeated DNA Sequences

Source: Internet  Published: 暴走大事件 知乎  Editor: 程序博客网  Time: 2024/06/10 15:44

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].

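import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;

public class Solution {
    // Packs a DNA string into an int, 2 bits per nucleotide (10 letters use 20 bits).
    public int hashcode(String s) {
        int res = 0;
        for (int i = 0; i < s.length(); i++) {
            res = res << 2 | hash(s.charAt(i));
        }
        return res;
    }

    public List<String> findRepeatedDnaSequences(String s) {
        List<String> list = new LinkedList<String>();
        // A string of 10 or fewer letters cannot contain a repeated 10-letter sequence.
        if (s == null || s.length() <= 10) {
            return list;
        }
        HashSet<Integer> set = new HashSet<Integer>();
        for (int i = 0; i <= s.length() - 10; i++) {
            String tmp = s.substring(i, i + 10);
            int hash = hashcode(tmp);
            // Remember to check that the string is not already in the list.
            if (set.contains(hash) && !list.contains(tmp)) {
                list.add(tmp);
            } else {
                set.add(hash);
            }
        }
        return list;
    }

    // Maps a nucleotide to its 2-bit code; any other character falls back to 0.
    public static int hash(char c) {
        if (c == 'A') {
            return 0;
        } else if (c == 'C') {
            return 1;
        } else if (c == 'G') {
            return 2;
        } else if (c == 'T') {
            return 3;
        }
        return 0;
    }
}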
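Converting the characters to integers saves space: each nucleotide takes 2 bits, so a 10-letter sequence fits in 20 bits of a single int, and the set stores these integers instead of the substrings themselves.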
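The same 2-bit encoding can also be kept as a sliding window, so each 10-letter key is derived from the previous one rather than recomputed from a fresh substring. The following is only a sketch of that variant, not the solution above: the names RollingDnaEncoder, findRepeated, code, WINDOW and MASK are hypothetical, and it assumes the input contains only A, C, G and T (anything else is rejected instead of being mapped to 0).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, introduced only for this illustration.
public class RollingDnaEncoder {
    private static final int WINDOW = 10;
    // Keeps the low 20 bits, i.e. the last 10 nucleotides at 2 bits each.
    private static final int MASK = (1 << (2 * WINDOW)) - 1;

    // Maps a nucleotide to its 2-bit code (assumption: only A, C, G, T are valid).
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Not a nucleotide: " + c);
        }
    }

    public static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        if (s == null || s.length() <= WINDOW) {
            return result;
        }
        Set<Integer> seen = new HashSet<>();     // keys seen at least once
        Set<Integer> reported = new HashSet<>(); // keys already added to the result
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new nucleotide and drop the one that left the window.
            key = ((key << 2) | code(s.charAt(i))) & MASK;
            if (i < WINDOW - 1) {
                continue; // the window is not full yet
            }
            // Set.add returns false when the key was already present.
            if (!seen.add(key) && reported.add(key)) {
                result.add(s.substring(i - WINDOW + 1, i + 1));
            }
        }
        return result;
    }
}
```

Because every 10-letter sequence maps to a distinct 20-bit key, two windows share a key only when they are the same sequence. The second set takes over the role of the `list.contains` check, so the duplicate test no longer scans the result list.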
