[LeetCode] Repeated DNA Sequences

来源:互联网 发布:疯狂的美工 编辑:程序博客网 时间:2024/06/06 18:41

Repeated DNA Sequences

 

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",Return:["AAAAACCCCC", "CCCCCAAAAA"].

解题思路:

1、用map存储已经扫描过的子串,并对之计数。时间复杂度为O(n)。代码如下:

class Solution {public:    vector<string> findRepeatedDnaSequences(string s) {        map<string, int> count;                vector<string> result;                int len = s.length();        for(int i=0; i<len-10; i++){            string str = s.substr(i, 10);            map<string, int>::iterator it=count.find(str);            if(it!=count.end()){                count[str]=1;            }else{                if(it->second==1){                    result.push_back(str);                }                count[str]++;            }        }                return result;    }};
但是会报内存溢出错误。

2、为AGCT分别编码,共有4种,故只需两位便能编码。共十个字符,只需20位就能表示任何一种组合。int类型32位,因此可以用一个int类型来存储10个字符串。每检查一个字符之后,需要将最高位置为0。代码如下:

class Solution {public:    vector<string> findRepeatedDnaSequences(string s) {        const int subStrLen = 10;        const int mask = 0x3ffff;                map<int, int> count;                map<char, int> cCode;        cCode['A']=0;        cCode['C']=1;        cCode['G']=2;        cCode['T']=3;                vector<string> result;                int len = s.length();                int code=0;                if(len>subStrLen){            string str = s.substr(0, subStrLen);            for(int i=0; i<subStrLen; i++){                code <<= 2;                code |= cCode[str[i]];            }            count[code] = 1;        }                for(int i=subStrLen; i<len; i++){            code &= mask;    //清空最高位            code <<= 2;            code |= cCode[s[i]];            count[code]++;            if(count[code]==2){                result.push_back(s.substr(i-subStrLen + 1, subStrLen));            }        }                return result;    }};
在leetCode中,我跑上面的代码需要269ms,看了很多其他人的时间,都比我快很多。不知各位有更好的方法没有。


0 0
原创粉丝点击