LeetCode | Repeated DNA Sequences

Source: Internet  Editor: 程序博客网  Time: 2024/05/05 12:38

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

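Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return: ["AAAAACCCCC", "CCCCCAAAAA"].

Solution 1: Brute force (Memory Limit Exceeded). Scan the string from start to end and use a map to count how many times each 10-letter substring occurs. Any substring that occurs more than once belongs in the result.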

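#include <map>
#include <string>
#include <vector>
using namespace std;

vector<string> findRepeatedDnaSequences(string s) {
    map<string,int> key;   // substring -> number of occurrences
    vector<string> res;
    // Also keeps s.size()-10 below from underflowing (size() is unsigned).
    if(s.size()<=10) return res;
    for(size_t i=0;i<=s.size()-10;i++)
    {
        string tmp=s.substr(i,10);
        key[tmp]++;
        // Report each sequence once, when its count first reaches 2.
        if(key[tmp]==2) res.push_back(tmp);
    }
    return res;
}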
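Solution 2: Storing that many strings uses too much memory. Since there are only four possible characters, the fix is easy: represent each 10-letter string as a base-4 number. (72ms)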

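#include <cstring>
#include <string>
#include <vector>
using namespace std;

// Map one nucleotide to a base-4 digit.
int ACGT2INT(char c)
{
    switch(c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
    }
    return -1;
}

// Encode the first 10 characters of m as a base-4 number.
int DNA2INT(string& m)
{
    const int MAX=10;
    int res=0;
    for(int i=0;i<MAX;i++)
    {
        res=res*4+ACGT2INT(m[i]);
    }
    return res;
}

vector<string> findRepeatedDnaSequences(string s) {
    const int N=1048576;   // 4^10 possible codes
    // About 4 MB with 4-byte ints; as a local array this sits on the stack.
    int key[N];
    memset(key,0,sizeof(key));
    vector<string> res;
    if(s.size()<=10) return res;
    for(size_t i=0;i<=s.size()-10;i++)
    {
        string tmp=s.substr(i,10);
        int idx=DNA2INT(tmp);   // encode once, use twice
        key[idx]++;
        if(key[idx]==2) res.push_back(tmp);
    }
    return res;
}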
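To make the range of the base-4 codes concrete, here is a minimal sketch that encodes a few 10-letter strings. The names digitOf and encode10 are hypothetical and only used for illustration; the logic mirrors ACGT2INT and DNA2INT above, assuming the input contains only A, C, G and T.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps one nucleotide to a base-4 digit.
int digitOf(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
    }
    return -1;  // not a valid nucleotide
}

// Hypothetical helper: encodes the first 10 characters of m in base 4.
int encode10(const std::string& m) {
    assert(m.size() >= 10);  // assumption: the caller passes at least 10 letters
    int res = 0;
    for (int i = 0; i < 10; i++) {
        res = res * 4 + digitOf(m[i]);
    }
    return res;
}
```

With this encoding, "AAAAAAAAAA" maps to 0 and "TTTTTTTTTT" maps to 4^10 - 1 = 1048575, so every code is a valid index into an array of 1048576 counters. The two answers from the example map to 341 ("AAAAACCCCC") and 349184 ("CCCCCAAAAA").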
Note that in Solution 2 I used an array to record the occurrence counts:

const int N=1048576;   // a 10-digit base-4 number is always below 1024^2
int key[N];
memset(key,0,sizeof(key));
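But if these lines are replaced with unordered_map<int,int> key; the running time goes up to about 150ms. (Times are measured on LeetCode's 30 test cases.)

With map<int,int> key; the running time is 280ms.

This shows the difference in efficiency between an array, map, and unordered_map.

Whenever you can avoid the latter two, use an array to record the hash counts.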
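For anyone who wants to repeat the comparison, the following is a rough sketch in which the counter type is a template parameter, so the same loop can run against an array-like container, map, or unordered_map. The names encodeAt and findRepeated are hypothetical, std::vector<int> stands in for the raw array, and the input is assumed to contain only A, C, G and T. It does not reproduce the timings quoted above, which came from the LeetCode judge.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: base-4 code of the 10 characters starting at pos.
// Assumes s contains only A, C, G, T (anything else is treated as 'A').
int encodeAt(const std::string& s, size_t pos) {
    int res = 0;
    for (size_t i = pos; i < pos + 10; i++) {
        int d = 0;  // 'A'
        if (s[i] == 'C') d = 1;
        else if (s[i] == 'G') d = 2;
        else if (s[i] == 'T') d = 3;
        res = res * 4 + d;
    }
    return res;
}

// Counter may be std::vector<int> (sized 4^10), std::map<int,int>,
// or std::unordered_map<int,int>; all of them support ++key[idx].
template <class Counter>
std::vector<std::string> findRepeated(const std::string& s, Counter& key) {
    std::vector<std::string> res;
    if (s.size() <= 10) return res;
    for (size_t i = 0; i + 10 <= s.size(); i++) {
        int idx = encodeAt(s, i);
        // Report each sequence once, when its count first reaches 2.
        if (++key[idx] == 2) res.push_back(s.substr(i, 10));
    }
    return res;
}
```

Calling findRepeated with a std::vector<int> of 1 << 20 zeros, an empty std::map<int,int>, or an empty std::unordered_map<int,int> should give the same result; only the cost of each key[idx] lookup differs. Timing the three calls locally (for example with std::chrono) is one way to check the ordering reported above.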


