187. Repeated DNA Sequences

Source: Internet | Editor: 程序博客网 | Date: 2024/05/23 12:36

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return ["AAAAACCCCC", "CCCCCAAAAA"].
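As a baseline, the problem can be solved by counting every 10-letter window with the substring itself as the key. The sketch below is a minimal illustration of that direct approach, not the method this post uses, and the function name findRepeatedNaive is hypothetical. It runs in O(10·n) time but stores each distinct window as a full string, which is the memory cost that an integer encoding of the windows reduces.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Baseline sketch: count each 10-letter window, keyed by the substring itself.
// A window is reported exactly once, at the moment its count reaches 2.
std::vector<std::string> findRepeatedNaive(const std::string& s) {
    const std::size_t window = 10;
    std::vector<std::string> result;
    if (s.size() < window) return result;
    std::unordered_map<std::string, int> counts;
    for (std::size_t i = 0; i + window <= s.size(); ++i) {
        std::string sub = s.substr(i, window);
        if (++counts[sub] == 2) result.push_back(sub);
    }
    return result;
}
```

On the example input this reports "AAAAACCCCC" first (its second occurrence starts at index 10) and then "CCCCCAAAAA" (second occurrence at index 16).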


Each nucleotide can be represented with two bits:

A: 00 -> 0

C: 01 -> 1

G: 10 -> 2

T: 11 -> 3
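The 2-bit mapping (A=00, C=01, G=10, T=11) can be sketched as a pair of helpers. The names nucleotideBits and packSequence are hypothetical, and throwing on characters outside ACGT is an assumption, since the problem guarantees only those four letters. The first character ends up in the highest-order bits.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Map one nucleotide to its 2-bit code: A=00, C=01, G=10, T=11.
std::uint32_t nucleotideBits(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
    }
    // Assumption: anything other than A/C/G/T is treated as invalid input.
    throw std::invalid_argument("expected one of A, C, G, T");
}

// Pack a DNA string into an integer, two bits per nucleotide.
// A 32-bit value holds up to 16 nucleotides; this problem needs only 10.
std::uint32_t packSequence(const std::string& seq) {
    std::uint32_t code = 0;
    for (char c : seq) {
        code = (code << 2) | nucleotideBits(c);  // shift left, append 2 bits
    }
    return code;
}
```

For example, "ACGT" packs to binary 00 01 10 11, which is 27. "AAAAACCCCC" packs to 01 01 01 01 01 in its low 10 bits, which is 341.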

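With two bits per nucleotide, a 10-letter sequence needs 20 bits, so there are 2^20 possible codes. Because 2^20 < 2^32, every code fits in an int. The integer codes can therefore serve as compact map keys in place of the substrings themselves.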
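The capacity argument can be checked directly in code. Ten nucleotides at 2 bits each take 20 bits, so the largest code (for "TTTTTTTTTT") is 2^20 - 1 = 1048575. Decoding a code back to its sequence also shows that the packing is one-to-one: equal codes always mean equal substrings. This is a hedged sketch under the same A=00, C=01, G=10, T=11 mapping. The constants and unpackSequence are hypothetical names.

```cpp
#include <cstdint>
#include <string>

constexpr int kWindow = 10;                         // sequence length in the problem
constexpr int kBits = 2 * kWindow;                  // 20 bits per packed sequence
constexpr std::uint32_t kMask = (1u << kBits) - 1;  // 0xFFFFF, the largest possible code

static_assert(kBits <= 31, "a packed 10-letter sequence fits in a signed 32-bit int");

// Inverse of the 2-bit packing: read 2-bit groups from the lowest-order pair
// and fill the string from its last character back to its first.
std::string unpackSequence(std::uint32_t code) {
    static const char kLetters[4] = {'A', 'C', 'G', 'T'};
    std::string seq(kWindow, 'A');
    for (int i = kWindow - 1; i >= 0; --i) {
        seq[i] = kLetters[code & 3u];  // low 2 bits select the letter
        code >>= 2;
    }
    return seq;
}
```

Under this mapping, unpackSequence(341) returns "AAAAACCCCC" and unpackSequence(kMask) returns "TTTTTTTTTT".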

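#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        int len = s.size();
        if (len < 10) return res;
        unordered_map<int, int> m;  // 20-bit code -> number of occurrences
        for (int i = 0; i <= len - 10; i++) {
            string sub = s.substr(i, 10);
            // Report each repeated sequence once, when it is seen for the second time.
            if (++m[encode(sub)] == 2) res.push_back(sub);
        }
        return res;
    }
private:
    // Pack a sequence into an int, two bits per nucleotide (A=00, C=01, G=10, T=11).
    int encode(const string& sub) {
        int code = 0;
        for (char c : sub) {
            code <<= 2;
            switch (c) {
                case 'A': break;           // 00
                case 'C': code |= 1; break; // 01
                case 'G': code |= 2; break; // 10
                case 'T': code |= 3; break; // 11
            }
        }
        return code;
    }
};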
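The solution above builds each 10-letter substring and encodes it from scratch. One common alternative is to maintain a rolling 20-bit code instead. For each new character, shift the code left by 2, append the new character's bits, and mask off everything older than the last 10 nucleotides. The sketch below illustrates that swapped-in technique rather than the post's method. It assumes the same 2-bit mapping and an input containing only A, C, G, T, and the name findRepeatedRolling is hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Rolling-code variant: O(1) work per character to update the window's code.
std::vector<std::string> findRepeatedRolling(const std::string& s) {
    const std::size_t window = 10;
    const std::uint32_t mask = (1u << (2 * window)) - 1;  // keep only the last 10 nucleotides
    std::vector<std::string> result;
    if (s.size() < window) return result;

    std::unordered_map<std::uint32_t, int> counts;  // code -> occurrences
    std::uint32_t code = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::uint32_t bits = 0;  // 'A' (and, by assumption, any other char) -> 00
        switch (s[i]) {
            case 'C': bits = 1; break;
            case 'G': bits = 2; break;
            case 'T': bits = 3; break;
        }
        code = ((code << 2) | bits) & mask;
        if (i + 1 >= window) {  // a full window ends at index i
            if (++counts[code] == 2) result.push_back(s.substr(i + 1 - window, window));
        }
    }
    return result;
}
```

The substring is only materialized when a repeat is first reported. Windows are visited in the same left-to-right order as in the solution above, so the output order is the same.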

