Repeated DNA Sequences (LeetCode): lesson learned, redo

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 01:57

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].

A hashmap is the obvious choice, but it raises one question: how do we turn a 10-character string into a key?

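Note that the ASCII codes of A, C, G, and T all differ in their lowest three bits (A = 0x41 gives 001, C = 0x43 gives 011, G = 0x47 gives 111, T = 0x54 gives 100), so those three bits can stand for each letter. A 10-character string then needs 30 bits in total, which fits in a single 32-bit int.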
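To make the three-bit idea concrete, here is a minimal sketch. The helper names `lowThreeBits`, `packWindow`, and `checkPacking` are hypothetical and not part of the original solution; the sketch assumes the input contains only A, C, G, and T and that the window has at least 10 characters.

```cpp
#include <cassert>
#include <string>

// Low three bits of a nucleotide's ASCII code:
// 'A' = 0x41 -> 001, 'C' = 0x43 -> 011, 'G' = 0x47 -> 111, 'T' = 0x54 -> 100.
inline int lowThreeBits(char c) {
    return c & 7;
}

// Pack the first 10 characters of `window` into a 30-bit key, 3 bits per letter.
// Hypothetical helper: assumes window.size() >= 10 and only A/C/G/T.
inline unsigned int packWindow(const std::string& window) {
    unsigned int key = 0;
    for (int i = 0; i < 10; ++i) {
        key = (key << 3) | static_cast<unsigned int>(lowThreeBits(window[i]));
    }
    return key;
}

// Small self-check: the four codes are distinct and a key never exceeds 30 bits.
inline void checkPacking() {
    assert(lowThreeBits('A') == 1);
    assert(lowThreeBits('C') == 3);
    assert(lowThreeBits('G') == 7);
    assert(lowThreeBits('T') == 4);
    assert(packWindow("AAAAAAAAAA") == 0x09249249u);  // 001 repeated ten times
    assert(packWindow("GGGGGGGGGG") == 0x3FFFFFFFu);  // 111 repeated ten times
}
```

Because the largest possible key is 0x3FFFFFFF, every window maps to a value that fits in 30 bits.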

Of course, the four characters could also be encoded as 00, 01, 10, and 11, but that is more cumbersome.
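For comparison, the two-bit alternative might look like the sketch below. The mapping A=00, C=01, G=10, T=11 and the names `twoBitCode`, `packWindow2Bit`, and `checkTwoBitPacking` are assumptions made for illustration. The extra `switch` is the added work compared with simply masking the ASCII code.

```cpp
#include <cassert>
#include <string>

// Hypothetical mapping chosen for illustration: A=00, C=01, G=10, T=11.
inline unsigned int twoBitCode(char c) {
    switch (c) {
        case 'A': return 0u;
        case 'C': return 1u;
        case 'G': return 2u;
        default:  return 3u;  // 'T' (input is assumed to contain only A/C/G/T)
    }
}

// Pack the first 10 characters into a 20-bit key, 2 bits per letter.
// Assumes window.size() >= 10.
inline unsigned int packWindow2Bit(const std::string& window) {
    unsigned int key = 0;
    for (int i = 0; i < 10; ++i) {
        key = (key << 2) | twoBitCode(window[i]);
    }
    return key;
}

// Small self-check of the packing.
inline void checkTwoBitPacking() {
    assert(packWindow2Bit("AAAAAAAAAA") == 0u);
    assert(packWindow2Bit("AAAAAAAAAC") == 1u);
    assert(packWindow2Bit("TTTTTTTTTT") == 0xFFFFFu);  // twenty 1-bits
}
```

With this encoding the rolling mask would be 0xFFFFF (20 bits) rather than 0x3FFFFFFF.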

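Next, a few things to watch out for while coding:

1. + has higher precedence than & and |.

2. The difference between pre-increment (++x) and post-increment (x++).

3. How to use unordered_map from the STL.

4. Use == (rather than >=) in the count check so that each sequence appears in the vector only once.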
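The pitfalls in this list can be shown in isolation. The sketch below uses hypothetical helper names; it spells out with explicit parentheses how an unparenthesized `a << 3 & mask + b` is grouped, and counts how many times a key would be reported under pre-increment versus post-increment.

```cpp
#include <cassert>
#include <unordered_map>

// How "a << 3 & mask + b" is grouped: + binds tighter than &.
inline int parsedByCompiler(int a, int mask, int b) {
    return (a << 3) & (mask + b);
}

// What a rolling-key update actually needs.
inline int intended(int a, int mask, int b) {
    return ((a << 3) & mask) + b;
}

// Number of times "++count == 2" fires when one key is seen `occurrences` times.
inline int reportsWithPreIncrement(int occurrences) {
    std::unordered_map<int, int> m;  // operator[] value-initializes a missing count to 0
    int reports = 0;
    for (int k = 0; k < occurrences; ++k) {
        if (++m[0] == 2) ++reports;  // compares the updated count
    }
    return reports;
}

// Same loop with post-increment: the comparison sees the old count.
inline int reportsWithPostIncrement(int occurrences) {
    std::unordered_map<int, int> m;
    int reports = 0;
    for (int k = 0; k < occurrences; ++k) {
        if (m[0]++ == 2) ++reports;  // compares the count before incrementing
    }
    return reports;
}

// Small self-check of the claims above.
inline void checkPitfalls() {
    assert(parsedByCompiler(1, 0x3F, 1) == 0);  // 8 & 0x40
    assert(intended(1, 0x3F, 1) == 9);          // (8 & 0x3F) + 1
    assert(reportsWithPreIncrement(2) == 1);    // reported on the second sighting
    assert(reportsWithPreIncrement(5) == 1);    // and never again
    assert(reportsWithPostIncrement(2) == 0);   // a key seen exactly twice is missed
}
```

The `== 2` test is what keeps the result free of duplicates: it is true exactly once per key, however many more times that key shows up.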

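The full source code is as follows:

#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<unsigned int, int> m;   // 30-bit key -> occurrence count
        vector<string> result;
        if (s.size() < 10) return result;     // no 10-letter window exists

        size_t i = 0;
        unsigned int tmp = 0;                 // unsigned, so the left shift cannot overflow a signed int
        while (i < 10) {
            // Note the i++: s[i] is read first, then i is incremented.
            // The parentheses around s[i++] & 7 are required.
            tmp = (tmp << 3) + (s[i++] & 7);
        }
        m[tmp]++;
        while (i < s.size()) {
            // + binds tighter than &, so both masked terms need parentheses;
            // replacing + with | would let you drop them.
            tmp = ((tmp << 3) & 0x3FFFFFFF) + (s[i++] & 7);
            // Pre-increment matters here: ++m[tmp] compares the updated count.
            // Testing == 2 adds a sequence once, even if it occurs more than twice.
            if (++m[tmp] == 2)
                result.push_back(s.substr(i - 10, 10));
        }
        return result;
    }
};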
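As a cross-check for the bit-packed solution, a plainer sketch keyed directly on substrings can be handy when testing. This is a different and more memory-hungry technique than the one above, and the names `findRepeatedBySubstring` and `checkAgainstExample` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Reference version: count every 10-letter window by its text.
// Simpler than bit packing, but stores a full string per distinct window.
inline std::vector<std::string> findRepeatedBySubstring(const std::string& s) {
    std::unordered_map<std::string, int> seen;
    std::vector<std::string> result;
    for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
        std::string window = s.substr(i, 10);
        if (++seen[window] == 2) {  // report on the second sighting only
            result.push_back(window);
        }
    }
    return result;
}

// Checks the example from the problem statement.
inline void checkAgainstExample() {
    std::vector<std::string> r =
        findRepeatedBySubstring("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT");
    assert(r.size() == 2);
    assert(r[0] == "AAAAACCCCC");
    assert(r[1] == "CCCCCAAAAA");
    assert(findRepeatedBySubstring("ACGT").empty());  // shorter than 10 letters
}
```

Both versions report a sequence at its second occurrence, so for the same input they should produce the same list in the same order.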

