187. Repeated DNA Sequences

Source: Internet. Editor: 程序博客网. Time: 2024/05/23 13:13

Problem

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,
Given s = “AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT”,

Return:
[“AAAAACCCCC”, “CCCCCAAAAA”].

Solution

An answer from the Discuss forum

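  The ASCII codes of “A”, “C”, “G”, and “T” are 65, 67, 71, and 84, which in binary are 01000001, 01000011, 01000111, and 01010100. Their last three bits all differ (001, 011, 111, 100), so the last three bits alone are enough to tell the four letters apart. Ten letters at three bits each take 30 bits, which fits in a 32-bit integer; the mask 0x3FFFFFFF keeps exactly those 30 bits, so the key always describes the most recent 10-letter window.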
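To make the encoding concrete, here is a minimal sketch of the packing step on its own. The helper names `code3` and `packWindow` are made up for illustration and do not appear in the Discuss answer; the sketch assumes the input contains only the letters A, C, G, and T and that the window fits inside the string.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: the low three bits of an ASCII nucleotide letter.
constexpr std::uint32_t code3(char c) {
    return static_cast<std::uint32_t>(c) & 0x7u;
}

// The four letters map to four distinct 3-bit codes.
static_assert(code3('A') == 1, "A -> 001");
static_assert(code3('C') == 3, "C -> 011");
static_assert(code3('G') == 7, "G -> 111");
static_assert(code3('T') == 4, "T -> 100");

// Hypothetical helper: pack the 10 letters starting at `start` into a 30-bit key.
// Assumes start + 10 <= s.size().
std::uint32_t packWindow(const std::string& s, std::size_t start) {
    std::uint32_t key = 0;
    for (std::size_t i = 0; i < 10; ++i) {
        // Shift the previous letters up by 3 bits and append the new code;
        // the mask keeps the key within 30 bits.
        key = ((key << 3) | code3(s[start + i])) & 0x3FFFFFFFu;
    }
    return key;
}
```

Two windows that spell the same 10 letters always get the same key, and because each letter has its own code, two different windows never collide.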

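#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<uint32_t, int> count;  // 30-bit window key -> times seen
        vector<string> ret;
        uint32_t key = 0;  // unsigned, so the left shift cannot overflow
        for (size_t i = 0; i < s.length(); ++i) {
            // Append the low 3 bits of s[i]; the mask keeps the last 10 letters.
            key = ((key << 3) | (s[i] & 0x7)) & 0x3FFFFFFF;
            if (i < 9)
                continue;  // the first full window ends at index 9
            if (count.find(key) == count.end())
                count[key] = 1;
            else if (count[key] == 1) {
                // Second occurrence: report the window exactly once.
                ret.push_back(s.substr(i - 9, 10));
                count[key]++;
            }
        }
        return ret;
    }
};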
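As a variation on the answer above (not part of it), the same rolling-key idea also works with two bits per letter, since bits 1 and 2 of the ASCII codes already differ: `(c >> 1) & 3` gives A = 0, C = 1, T = 2, G = 3. The sketch below assumes the input contains only A, C, G, and T; the function name `findRepeatedDna2Bit` is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical 2-bit variant: a 10-letter window needs only 20 bits.
std::vector<std::string> findRepeatedDna2Bit(const std::string& s) {
    std::unordered_map<std::uint32_t, int> seen;  // window key -> times seen
    std::vector<std::string> result;
    std::uint32_t key = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Bits 1..2 of the ASCII code: A -> 0, C -> 1, T -> 2, G -> 3.
        std::uint32_t code = (static_cast<std::uint32_t>(s[i]) >> 1) & 0x3u;
        // Keep only the low 20 bits, i.e. the most recent 10 letters.
        key = ((key << 2) | code) & 0xFFFFFu;
        if (i < 9) {
            continue;  // no full window yet
        }
        // Report a window the second time it is seen, and only then.
        if (++seen[key] == 2) {
            result.push_back(s.substr(i - 9, 10));
        }
    }
    return result;
}
```

With only 2^20 possible keys, the hash map could also be replaced by a fixed-size array of counters, at the cost of allocating about a million entries up front.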