LeetCode: Repeated DNA Sequences

Source: Internet  Editor: 程序博客网  Time: 2024/06/18 17:35

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].

Problem explanation: find every substring of length 10 that occurs more than once.

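Approach: the string contains only 4 distinct characters, so 2 bits are enough to identify each one: A, C, G, T map to 00, 01, 10, 11 respectively. 20 bits can therefore hold any 10-character string, so an int variable is large enough. Slide a 10-character window over the input, keeping its 20-bit value in an integer; when the same integer shows up again, convert it back to the corresponding string and store it.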
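The 2-bit packing described above can be sketched in isolation as follows. The helper names baseCode, encode10 and decode10 are hypothetical and are not part of the original solution. The sketch assumes the input holds only A, C, G, T and is at least 10 characters long.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: 2-bit code for one nucleotide (A=00, C=01, G=10, T=11).
// Assumption of this sketch: any other character is treated like 'A'.
inline std::uint32_t baseCode(char c) {
    switch (c) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;
    }
}

// Pack the first 10 characters of seq into the low 20 bits of an integer.
// Assumes seq.size() >= 10.
inline std::uint32_t encode10(const std::string& seq) {
    std::uint32_t key = 0;
    for (int i = 0; i < 10; ++i) {
        key = (key << 2) | baseCode(seq[i]);
    }
    return key;
}

// Unpack a 20-bit key back into its 10-letter string.
inline std::string decode10(std::uint32_t key) {
    static const char kBases[] = "ACGT";
    std::string out(10, 'A');
    for (int i = 9; i >= 0; --i) {
        out[i] = kBases[key & 3u];  // the lowest 2 bits hold the last character
        key >>= 2;
    }
    return out;
}
```

Because the mapping is one-to-one, two 10-letter windows are equal exactly when their 20-bit keys are equal, which is why comparing integers is enough.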

Code:

#include <map>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        // 2-bit code for each nucleotide, and the reverse lookup.
        map<char, int> D = {{'A', 0}, {'C', 1}, {'G', 2}, {'T', 3}};
        map<int, char> Ds = {{0, 'A'}, {1, 'C'}, {2, 'G'}, {3, 'T'}};
        unordered_set<int> UN;     // hashes seen at least once
        unordered_set<int> indic;  // hashes already added to the result
        vector<string> find;
        int hash = 0;
        for (int i = 0; i < (int)s.length(); i++) {
            // Shift in the next character and keep only the low 20 bits (10 characters).
            hash = ((hash << 2) + D[s[i]]) & ((1 << 20) - 1);
            if (i < 9) continue;  // the first full window ends at index 9
            if (UN.find(hash) == UN.end()) {
                UN.insert(hash);
            } else if (indic.find(hash) == indic.end()) {
                // Second occurrence: decode the 20-bit hash back into a string.
                string s1 = "";
                int h = hash;
                for (int k = 0; k < 10; k++) {
                    s1 = Ds[h & 3] + s1;
                    h >>= 2;
                }
                find.push_back(s1);
                indic.insert(hash);
            }
        }
        return find;
    }
};
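As a cross-check for the bit-packed version, a plain baseline that counts the 10-letter substrings directly can be useful. This is a different, simpler technique and not the method used above. The function name repeatedDnaBaseline is hypothetical. It trades extra memory (every window is stored as a string) for simplicity.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Baseline sketch: count every 10-letter window as a string and report a
// window the moment it is seen for the second time.
std::vector<std::string> repeatedDnaBaseline(const std::string& s) {
    std::vector<std::string> result;
    std::unordered_map<std::string, int> seen;  // window -> occurrences so far
    for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
        std::string window = s.substr(i, 10);
        if (++seen[window] == 2) {
            result.push_back(window);  // reported once, on the second occurrence
        }
    }
    return result;
}
```

On the example input from the problem statement, this baseline returns "AAAAACCCCC" followed by "CCCCCAAAAA", so it can be compared with the output of findRepeatedDnaSequences on other test strings.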


