Repeated DNA Sequences @leetcode
Source: Internet  Editor: 程序博客网  Time: 2024/04/29 04:33
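Doing LeetCode problems has become the first thing I do when I get to the office in the morning. Discovering the many different solutions to each problem is a genuinely fun process. Take this morning's DNA sequence problem: at first I had no idea where to start, but after reading a few articles I realized that binary really is the way to go!
Once you know binary, nothing can scare you.
The original problem is as follows: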
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return: ["AAAAACCCCC", "CCCCCAAAAA"].
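This problem is a classic example of representing a string sequence as a binary sequence in order to reduce memory consumption.
The problem states that a DNA sequence contains only four kinds of nucleotides, written A, C, G and T, so each of them can be represented by a two-bit binary number: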
A: 00
C: 01
G: 10
T: 11
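A DNA sequence such as ACGT can then be written as 00011011, which is 27 in decimal. For sequences of a fixed length this value is unique (a 10-letter sequence takes 20 bits and fits comfortably in an int), so it can serve as the key, with the number of occurrences as the value, and every key seen so far goes into a hash table.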
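To make the encoding concrete, here is a minimal sketch of the packing step on its own. The class name `DnaEncodingSketch` and the helpers `code` and `encode` are hypothetical names chosen for illustration, and the sketch assumes the input contains only the letters A, C, G and T.

```java
import java.util.HashMap;
import java.util.Map;

class DnaEncodingSketch {
    // Hypothetical helper: maps a nucleotide to its 2-bit code (A=00, C=01, G=10, T=11).
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Packs a sequence into an int, 2 bits per letter, first letter in the highest bits.
    // At most 16 letters fit into the 32 bits of an int.
    static int encode(String seq) {
        if (seq.length() > 16) {
            throw new IllegalArgumentException("sequence too long for an int");
        }
        int value = 0;
        for (int i = 0; i < seq.length(); i++) {
            value = (value << 2) | code(seq.charAt(i));
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(encode("ACGT")); // prints 27

        // The encoded value is the key, the number of occurrences is the value.
        Map<Integer, Integer> counts = new HashMap<>();
        counts.merge(encode("AAAAACCCCC"), 1, Integer::sum);
        counts.merge(encode("AAAAACCCCC"), 1, Integer::sum);
        System.out.println(counts.get(encode("AAAAACCCCC"))); // prints 2
    }
}
```

Because every 10-letter sequence maps to a distinct 20-bit number, two windows are equal exactly when their encoded values are equal, which is what lets the hash table use an integer key instead of a substring.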
    import java.util.HashMap;
    import java.util.LinkedList;
    import java.util.List;

    public class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            List<String> result = new LinkedList<String>();

            // 2-bit code for each nucleotide
            HashMap<Character, Integer> tokenValueMap = new HashMap<Character, Integer>();
            tokenValueMap.put('A', 0);
            tokenValueMap.put('C', 1);
            tokenValueMap.put('G', 2);
            tokenValueMap.put('T', 3);

            // key: encoded 10-letter sequence, value: occurrences seen so far (capped at 2)
            HashMap<Integer, Integer> sequenceCountMap = new HashMap<Integer, Integer>();
            int length = s.length();
            for (int index = 0; index <= length - 10; index++) {
                // encode the 10 letters starting at index into 20 bits
                int value = 0;
                for (int i = 0; i < 10; i++) {
                    value <<= 2;
                    value += tokenValueMap.get(s.charAt(index + i));
                }
                if (!sequenceCountMap.containsKey(value)) {
                    sequenceCountMap.put(value, 1);
                } else if (sequenceCountMap.get(value) == 1) {
                    // second occurrence: report the sequence exactly once
                    sequenceCountMap.put(value, 2);
                    result.add(s.substring(index, index + 10));
                }
            }
            return result;
        }
    }
The Java code above gets AC without any trouble, but take a look at the following version:
    // needs java.util.ArrayList, java.util.HashMap, java.util.List and java.util.Map
    public static List<String> findRepeatedDnaSequences2(String s) {
        List<String> result = new ArrayList<String>();
        Map<Character, Integer> tokenValueMap = new HashMap<Character, Integer>();
        tokenValueMap.put('A', 0);
        tokenValueMap.put('C', 1);
        tokenValueMap.put('G', 2);
        tokenValueMap.put('T', 3);
        int length = s.length();
        Map<Integer, Integer> seqMap = new HashMap<Integer, Integer>();
        for (int i = 0; i <= length - 10; i++) {
            int value = 0;
            for (int j = 0; j < 10; j++) {
                value <<= 2;
                // the boxed Character and Integer are held in explicit local variables here
                Character c = s.charAt(i + j);
                Integer tokenValue = tokenValueMap.get(c);
                value += tokenValue;
            }
            if (!seqMap.containsKey(value)) {
                seqMap.put(value, 1);
            } else if (seqMap.get(value) == 1) {
                result.add(s.substring(i, i + 10));
                seqMap.put(value, seqMap.get(value) + 1);
            }
        }
        return result;
    }
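This version may fail with Memory Limit Exceeded.
Yet if you submit it a few more times, you will find that it can sometimes get AC after all.
My explanation is that the outcome depends entirely on whether the virtual machine happens to collect garbage during the submission, because the loop
    for (int j = 0; j < 10; j++) {
        value <<= 2;
        Character c = s.charAt(i + j);
        Integer tokenValue = tokenValueMap.get(c);
        value += tokenValue;
    }
boxes a Character on every iteration. One caveat: Java caches boxed Character values in the range '\u0000' to '\u007F', so boxing 'A', 'C', 'G' and 'T' normally reuses cached objects, and the first version boxes the same characters implicitly when it calls tokenValueMap.get(s.charAt(index + i)). The intermittent failures may therefore have more to do with run-to-run variation in how the judge measures memory than with this loop itself.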
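If memory is the concern, one option is to avoid boxed objects altogether. The following is a sketch of an alternative, not the code from this post: it keeps a rolling 20-bit value that is updated one letter at a time, looks up codes in a plain `int[]` instead of a `HashMap<Character, Integer>`, and tracks seen values in two `java.util.BitSet` instances instead of a `HashMap<Integer, Integer>`. The class name `RollingDnaSketch` and the method `findRepeated` are hypothetical, and the input is assumed to contain only A, C, G and T.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class RollingDnaSketch {
    private static final int LEN = 10;                     // window length in letters
    private static final int MASK = (1 << (2 * LEN)) - 1;  // keeps the low 20 bits

    static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();

        // Lookup table indexed by character; 'A' keeps the default value 0.
        // Assumption: the input contains only A, C, G and T.
        int[] code = new int[128];
        code['C'] = 1;
        code['G'] = 2;
        code['T'] = 3;

        BitSet seen = new BitSet(1 << (2 * LEN));      // values seen at least once
        BitSet reported = new BitSet(1 << (2 * LEN));  // values already added to result

        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter and drop the letter that left the window.
            value = ((value << 2) | code[s.charAt(i)]) & MASK;
            if (i < LEN - 1) {
                continue; // the first full window ends at index LEN - 1
            }
            if (!seen.get(value)) {
                seen.set(value);
            } else if (!reported.get(value)) {
                reported.set(value);
                result.add(s.substring(i - LEN + 1, i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Each `BitSet` covers 2^20 possible values, which is about 128 KiB of bits, and the only objects allocated inside the loop are the substrings that end up in the result. The rolling update also replaces the inner 10-step loop with a single shift, OR and mask per character.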