Repeated DNA Sequences


Original article: http://blog.csdn.net/u013325815/article/details/43601367

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].
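
To make the task concrete, here is a minimal sketch of the most direct approach: every 10-letter window is stored as a String in a set. The class and method names (`NaiveRepeatedDna`, `findRepeated`) are hypothetical and not part of the original post. It serves only as a baseline; its memory use grows with the number of distinct windows.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NaiveRepeatedDna {
    // Hypothetical baseline: keeps each 10-letter window as a String.
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();      // windows seen at least once
        Set<String> reported = new HashSet<>();  // windows already in the result
        List<String> result = new ArrayList<>();
        if (s == null) return result;
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // Set.add returns false when the element is already present.
            if (!seen.add(window) && reported.add(window)) {
                result.add(window);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```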

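Storing the substrings directly in a dictionary leads to Memory Limit Exceeded, so each substring is converted to an integer with bit operations to reduce the memory overhead.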

Hash map + bit manipulation

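// Hex:   A=0x41, C=0x43, G=0x47, T=0x54

// Octal: A=0101, C=0103, G=0107, T=0124

// The lowest 3 bits of A, C, G and T all differ, so ANDing with the binary mask 0111 (decimal 7) uniquely identifies each letter: A=1, C=3, G=7, T=4.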
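
The encoding described in the comments above can be checked with a short sketch. The helper names `code` and `pack` are assumptions introduced for illustration; they only show that `c & 7` separates the four letters and that ten 3-bit codes fit in a 32-bit `int` (30 bits).

```java
class NucleotideBits {
    // Hypothetical helper: low three bits of the letter's ASCII code.
    static int code(char c) {
        return c & 7;
    }

    // Hypothetical helper: packs up to 10 letters into an int, 3 bits per letter.
    static int pack(String window) {
        int value = 0;
        for (int i = 0; i < window.length(); i++) {
            value = (value << 3) | code(window.charAt(i));
        }
        return value;
    }

    public static void main(String[] args) {
        // prints A -> 1, C -> 3, G -> 7, T -> 4 (one per line)
        for (char c : "ACGT".toCharArray()) {
            System.out.println(c + " -> " + code(c));
        }
        // Different windows map to different integers: prints true
        System.out.println(pack("AAAAACCCCC") != pack("CCCCCAAAAA"));
    }
}
```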

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> list = new ArrayList<String>();
        // With 10 letters or fewer there is at most one window, so nothing can repeat.
        if (s == null || s.length() <= 10) return list;

        int mask = 0x7FFFFFF; // low 27 bits: the 9 most recent letters, 3 bits each
        int i = 0;
        int cur = 0;
        // Encoded 10-letter window -> number of times it has been seen so far.
        HashMap<Integer, Integer> hashmap = new HashMap<Integer, Integer>();
        // Pack the first 9 letters.
        while (i < 9) {
            cur = (cur << 3) | (s.charAt(i) & 7);
            i++;
        }

        while (i < s.length()) {
            // Drop the oldest letter, then append the new one.
            cur = ((cur & mask) << 3) | (s.charAt(i) & 7);
            i++;
            if (hashmap.containsKey(cur)) {
                int count = hashmap.get(cur);
                // Report a window only on its second occurrence.
                if (count == 1) {
                    list.add(s.substring(i - 10, i));
                }
                hashmap.put(cur, count + 1);
            } else {
                hashmap.put(cur, 1);
            }
        }
        return list;
    }
}
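
The sliding-window update used in the second loop, `((cur & mask) << 3) | (next & 7)`, can be checked in isolation. The sketch below assumes hypothetical helpers `pack` and `roll` (not from the original post) and compares the rolled value against re-encoding each window from scratch.

```java
class RollingWindowDemo {
    // Low 27 bits: keeps the 9 newest letters and discards the oldest one.
    static final int MASK = 0x7FFFFFF;

    // Hypothetical helper: encodes a window from scratch, 3 bits per letter.
    static int pack(String window) {
        int value = 0;
        for (int i = 0; i < window.length(); i++) {
            value = (value << 3) | (window.charAt(i) & 7);
        }
        return value;
    }

    // Hypothetical helper: slides an encoded 10-letter window one position right.
    static int roll(int cur, char next) {
        return ((cur & MASK) << 3) | (next & 7);
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
        int cur = pack(s.substring(0, 10));
        boolean allMatch = true;
        for (int i = 10; i < s.length(); i++) {
            cur = roll(cur, s.charAt(i));
            // After consuming s[i], the window covers s[i-9..i].
            allMatch &= (cur == pack(s.substring(i - 9, i + 1)));
        }
        System.out.println(allMatch); // prints true
    }
}
```

Because each update touches only one letter, the per-position cost stays constant instead of re-reading all ten letters of the window.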
