[LeetCode] Repeated DNA Sequences


All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].

Method 1:

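import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        // Occurrence count of each 10-letter window seen so far.
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        List<String> re = new ArrayList<String>();
        for (int i = 0; i < s.length() - 9; i++) {
            String str = s.substring(i, i + 10);
            Integer a = map.get(str);
            if (a == null) a = 0;
            // Report a window once, on its second occurrence.
            if (a == 1) re.add(str);
            // Key on the window (str), not on the whole input (s).
            map.put(str, a + 1);
        }
        return re;
    }
}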
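The same hash-based idea can also be written with two sets instead of a counting map. The following is only a sketch of that alternative, not the method from the post; the class name `TwoSetDnaSketch` and the variable names are made up for illustration. It relies on `Set.add` returning `false` when the element is already present.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical name; a sketch of a two-set variant of Method 1.
final class TwoSetDnaSketch {
    static List<String> findRepeatedDnaSequences(String s) {
        Set<String> seen = new HashSet<>();
        // LinkedHashSet keeps the order in which repeats were first detected.
        Set<String> repeated = new LinkedHashSet<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // add() returns false if the window was already in the set.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

Because `repeated` is a set, a window that occurs three or more times is still reported only once, which matches the behavior of the counting version.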

Method 2:

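Method 1 relies on hashing. To optimize further, encode each 10-letter substring as an integer and use that integer directly as an array index, so that no hash table is needed at all. A HashMap lookup first computes the String's hashCode, uses it to locate a bucket, and then compares the lookup key with the key stored there; for a 10-character key this costs at least about 2*10 character operations (leaving aside lookups where the key is absent or where hash collisions occur). When the number of possible keys is finite and smaller than Integer.MAX_VALUE, as it is in this problem (10 letters drawn from 4 types give only 4^10 = 1,048,576 arrangements), a plain array can hold the counts. Each window then needs only the encoding pass and one index operation, about 10 steps instead of 2*10.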
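As a worked illustration of this encoding, here is a minimal sketch that packs a 10-letter window into 20 bits, 2 bits per letter. The class name `DnaEncodingSketch` and its members are hypothetical, and throwing on a non-nucleotide character is an assumption of this sketch. The letter codes (A=0, C=1, T=2, G=3) follow the assignment used in this post.

```java
// Hypothetical name; a sketch of the 2-bits-per-letter encoding.
final class DnaEncodingSketch {
    static final int WINDOW = 10;
    // 4 letters need 2 bits each, so a window fits in 20 bits: 4^10 = 2^20 slots.
    static final int TABLE_SIZE = 1 << (2 * WINDOW);

    // Letter codes as used in this post: A=0, C=1, T=2, G=3.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'T': return 2;
            case 'G': return 3;
            // Assumption of this sketch: reject anything that is not A, C, G, or T.
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Packs a 10-letter window into an int in the range [0, TABLE_SIZE).
    static int encode(String window) {
        if (window.length() != WINDOW) {
            throw new IllegalArgumentException("window must have 10 letters");
        }
        int value = 0;
        for (int i = 0; i < WINDOW; i++) {
            value = (value << 2) | code(window.charAt(i));
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(TABLE_SIZE);            // 1048576
        System.out.println(encode("AAAAACCCCC"));  // 341 (binary 0101010101)
        System.out.println(encode("CCCCCAAAAA"));  // 349184 (341 shifted left by 10 bits)
        System.out.println(encode("GGGGGGGGGG"));  // 1048575 (all 20 bits set)
    }
}
```

Every window maps to a distinct index below `TABLE_SIZE`, which is why an `int[]` of that size can replace the hash map.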

import java.util.ArrayList;
import java.util.List;

public class Solution2 {
    // Each 10-letter window is encoded as a 20-bit integer (2 bits per letter)
    // and used directly as an array index, so no hashing is needed.
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> re = new ArrayList<String>();
        int[] map = new int[1024 * 1024]; // 4^10 possible windows
        for (int i = 0; i < s.length() - 9; i++) {
            String str = s.substring(i, i + 10);
            int key = encode(str);
            // Report a window once, on its second occurrence.
            if (map[key]++ == 1) {
                re.add(str);
            }
        }
        return re;
    }

    public int encode(String s) {
        int re = 0;
        for (int i = 0; i < 10; i++) {
            re <<= 2; // 4 letter types, so 2 bits per letter
            switch (s.charAt(i)) {
                case 'A': re += 0; break;
                case 'C': re += 1; break;
                case 'T': re += 2; break;
                case 'G': re += 3; break;
            }
        }
        return re;
    }
}
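One further refinement, which is not part of the original post: adjacent windows share 9 letters, so the key can be updated in constant time by shifting in the new letter and masking off the oldest one, rather than re-encoding all 10 letters. The sketch below assumes the same letter codes as above; the class name `RollingDnaSketch` and the `MASK` constant are hypothetical, and rejecting non-nucleotide characters is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical name; a sketch of a rolling (sliding-window) key update.
final class RollingDnaSketch {
    static final int WINDOW = 10;
    // Keeps only the low 20 bits, i.e. the 10 most recent letters.
    static final int MASK = (1 << (2 * WINDOW)) - 1;

    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'T': return 2;
            case 'G': return 3;
            // Assumption of this sketch: reject anything that is not A, C, G, or T.
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    static List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<>();
        int[] counts = new int[MASK + 1];
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter and drop the letter that left the window.
            key = ((key << 2) | code(s.charAt(i))) & MASK;
            if (i < WINDOW - 1) {
                continue; // the first full window ends at index 9
            }
            // Report a window once, on its second occurrence.
            if (counts[key]++ == 1) {
                result.add(s.substring(i - WINDOW + 1, i + 1));
            }
        }
        return result;
    }
}
```

With this update a substring is only materialized when a repeat is actually reported, so the per-window work drops from about 10 steps to a shift, an OR, and a mask.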
