Repeated DNA Sequences
Source: Internet  Editor: 程序博客网  Date: 2024/05/21 09:34
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].
Solution approach:
This problem mainly relies on two techniques: bit manipulation and hashing.
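Encode A, C, G, and T in binary: A -> 00, C -> 01, G -> 10, T -> 11. Under this encoding every 10-character substring corresponds to a single number, and a 10-character string occupies 20 bits. An int is typically 4 bytes (32 bits), so one int is enough to represent a 10-character string. For example: ACGTACGTAC -> 00011011000110110001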
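The encoding above can be sketched as a small helper. This is only an illustration: the names nucleotideCode and encodeWindow are hypothetical and do not appear in the original solution, which uses a map<char,int> lookup instead of a switch.

```cpp
#include <cassert>
#include <string>

// Map one nucleotide to its 2-bit code (A=00, C=01, G=10, T=11).
// Returns -1 for any other character.
// (Hypothetical helper name, not from the original post.)
int nucleotideCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Pack the first 10 characters of s into a 20-bit integer,
// with the first character in the highest 2 bits.
int encodeWindow(const std::string& s) {
    assert(s.size() >= 10);
    int val = 0;
    for (int i = 0; i < 10; ++i) {
        int code = nucleotideCode(s[i]);
        assert(code >= 0);  // only A, C, G, T are expected
        val = (val << 2) | code;
    }
    return val;
}
```

With this sketch, encodeWindow("ACGTACGTAC") yields 0x1B1B1, which is the bit pattern 00011011000110110001 from the example.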
A 20-bit binary number has at most 2^20 possible values, so the hash table needs 2^20 entries, i.e. 1024 * 1024. It is declared as bool hashTable[1024 * 1024];
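A minimal sketch of such a table follows. It assumes a std::vector<bool> in place of the raw bool array, because a 1 MB local array may exceed the default stack size on some platforms; kTableSize and markSeen are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <vector>

// 2^20 = 1024 * 1024 possible values of a 20-bit window.
const int kTableSize = 1 << 20;

// Records val as seen and reports whether it had already been seen.
// (Hypothetical helper, not part of the original solution.)
bool markSeen(std::vector<bool>& seen, int val) {
    assert(val >= 0 && val < kTableSize);
    bool already = seen[val];
    seen[val] = true;
    return already;
}
```

A caller would create the table once with std::vector<bool> seen(kTableSize, false); and call markSeen for each window value. The encoded value is used directly as the index, so no hash function or collision handling is needed.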
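While scanning the string, each time the window moves one character to the right, the int value of the window is shifted left by 2 bits, its lowest 2 bits are set to the code of the new character, and finally the 2 bits above the 20-bit range are cleared. Once the value val of the current substring is known, check whether it has been seen before: if not, set hashTable[val] to true; otherwise, store the current substring in a set container, so that a sequence repeated many times is still reported only once.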
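The window update described above can be sketched as a single step function. The names kWindowMask and slideWindow are hypothetical; the sketch masks with 0xFFFFF, which gives the same result as clearing bits 20 and 21 as long as the previous value already fits in 20 bits.

```cpp
#include <cassert>

// Keeps only the low 20 bits, i.e. the last 10 nucleotides.
const int kWindowMask = (1 << 20) - 1;  // 0xFFFFF

// Advance the window by one character: shift left by 2 bits,
// append the new 2-bit code, and drop the oldest character.
// (Hypothetical helper, not part of the original solution.)
int slideWindow(int val, int code) {
    assert(code >= 0 && code <= 3);
    return ((val << 2) | code) & kWindowMask;
}
```

For example, the window "AAAAACCCCC" encodes to 341 (binary 0101010101); sliding in an 'A' (code 0) gives 1364, the encoding of "AAAACCCCCA". Each step takes constant time, so the whole scan is linear in the length of the string.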
Code:
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>
using namespace std;

vector<string> findRepeatedDnaSequences(string s)
{
    vector<string> res;
    if (s.length() < 10)
        return res;

    // 2-bit code for each nucleotide.
    map<char, int> mp;
    mp['A'] = 0;
    mp['C'] = 1;
    mp['G'] = 2;
    mp['T'] = 3;

    // exist[val] is true once the window encoded as val has been seen.
    // Heap-allocated: a 1 MB local array may overflow the stack on some platforms.
    vector<bool> exist(1024 * 1024, false);

    // Encode the first 10-letter window.
    int val = 0;
    for (int i = 0; i < 10; i++) {
        val <<= 2;
        val |= mp[s[i]];
    }
    exist[val] = true;

    // Slide the window one character at a time.
    set<string> tmp;  // the set removes duplicate results
    for (size_t i = 10; i < s.length(); i++) {
        val <<= 2;
        val |= mp[s[i]];
        val &= ~(0x300000);  // clear bits 20 and 21, keeping 20 bits
        if (exist[val])
            tmp.insert(s.substr(i - 9, 10));
        else
            exist[val] = true;
    }

    res.assign(tmp.begin(), tmp.end());
    return res;
}

int main()
{
    string s;
    cin >> s;
    vector<string> res = findRepeatedDnaSequences(s);
    for (size_t i = 0; i < res.size(); i++) {
        cout << res[i] << ' ';
    }
}