LeetCode 187 Repeated DNA Sequences
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 03:55
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return: ["AAAAACCCCC", "CCCCCAAAAA"].
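Three approaches follow, ordered from easiest to hardest, and each one is faster than the one before. See the code and its comments for the detailed logic.
Method 1: Runtime: 38 ms, beats 68.58% of Java submissions.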
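    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public List<String> findRepeatedDnaSequences2(String s) {
        // dna holds every window seen so far; res holds windows seen more than once
        Set<String> dna = new HashSet<>(), res = new HashSet<>();
        for (int i = 10; i <= s.length(); i++) {
            String chars = s.substring(i - 10, i); // the 10-letter window ending at i
            // add() returns false when the window is already in the set
            if (!dna.add(chars)) res.add(chars);
        }
        return new ArrayList<>(res);
    }

Method 2: Runtime: 14 ms, beats 97.35% of Java submissions.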
    import java.util.ArrayList;
    import java.util.List;

    // Lookup table from nucleotide character to its 2-bit code
    private static final byte[] t = new byte[128];
    static {
        t['A'] = 0;
        t['C'] = 1;
        t['G'] = 2;
        t['T'] = 3;
    }

    public List<String> findRepeatedDnaSequences(String s) { // 12 ms, 99.77%
        boolean[] has = new boolean[1048576];     // 1048576 = 1 << 20
        boolean[] written = new boolean[1048576];
        List<String> list = new ArrayList<>();
        char[] c = s.toCharArray();
        int n = c.length, cur = 0;
        if (n < 10) return list;
        // Value of the first 9 characters; each character takes two bits
        for (int i = 0; i < 9; i++)
            cur = (cur << 2) | t[c[i]];
        for (int i = 9; i < n; i++) {
            // Keep only the value of the latest 10 characters (20 bits)
            cur = ((cur << 2) | t[c[i]]) & 0xFFFFF;
            if (has[cur]) {
                if (!written[cur]) {
                    list.add(s.substring(i - 9, i + 1));
                    written[cur] = true;
                }
            } else
                has[cur] = true;
        }
        return list;
    }

Method 3: Runtime: 7 ms, beats 99.93% of Java submissions.
    import java.util.ArrayList;
    import java.util.List;

    // Lookup table from nucleotide character to its 2-bit code
    private static final byte[] t = new byte[128];
    static {
        t['A'] = 0;
        t['C'] = 1;
        t['G'] = 2;
        t['T'] = 3;
    }

    public List<String> findRepeatedDnaSequences3(String s) {
        final long[] has = new long[16384];     // 16384 = 1 << 14
        final long[] written = new long[16384];
        ArrayList<String> dupSeqs = new ArrayList<>();
        // A string of 10 letters or fewer has at most one window, so nothing can repeat
        if (s.length() <= 10) return dupSeqs;
        char[] c = s.toCharArray(); // String.charAt will be slower than char array access
        int cur = 0;
        for (int i = 0; i < 9; i++) {
            cur = (cur << 2) | t[c[i]];
        }
        for (int i = 9; i < c.length; i++) {
            // Keep only the value of the latest 10 characters; one character takes 2 bits
            cur = ((cur << 2) | t[c[i]]) & 0xFFFFF;
            // The upper 14 bits of cur are the array index; the lower 6 bits pick the bit in the bitmap.
            // A long has only 64 bits, and 64 is exactly 1 << 6. With cur >> 7 the low part
            // would be 7 bits wide, and shifting 1 left by more than 63 does not fit in a long.
            int idx = (cur >> 6);
            long bitmap = 1L << (cur & 0x3f);
            // The sequence has a duplicate and has not been added before
            if ((has[idx] & bitmap) != 0) {
                if ((written[idx] & bitmap) == 0) {
                    written[idx] |= bitmap;
                    dupSeqs.add(s.substring(i - 9, i + 1));
                }
            } else {
                has[idx] |= bitmap;
            }
        }
        return dupSeqs;
    }

Method 1 uses a set, so it is not very efficient.
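Method 2 is more concise than method 3 but slower. In my view the reason is that the has and written arrays are each declared with more than a million elements, which is too long and makes array addressing cost too much time. A length of 1 << 20 is nevertheless required, because 10-letter-long sequences have 1 << 20 distinct binary encodings (4 to the power of 10).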
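To make the 20-bit encoding and the 14-bit/6-bit split concrete, here is a minimal Java sketch. The class KmerBits and its method names are hypothetical, introduced only for illustration; the A/C/G/T codes, the shift by 6 and the 0x3f mask are taken from the code above. It is an illustration under those assumptions, not a replacement for any of the three methods.

```java
// Hypothetical helper for illustration; these names do not appear in the original post.
final class KmerBits {
    // Same 2-bit code as the lookup table t: A=0, C=1, G=2, T=3.
    static int code(char ch) {
        switch (ch) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + ch);
        }
    }

    // Packs exactly 10 letters into the low 20 bits of an int.
    static int encode(String kmer) {
        if (kmer.length() != 10) {
            throw new IllegalArgumentException("need exactly 10 letters");
        }
        int cur = 0;
        for (int i = 0; i < 10; i++) {
            cur = (cur << 2) | code(kmer.charAt(i));
        }
        return cur;
    }

    // Upper 14 bits: which long in a long[16384] holds this sequence's flag.
    static int wordIndex(int cur) {
        return cur >> 6;
    }

    // Lower 6 bits: which of the 64 bits inside that long is the flag.
    static long bitMask(int cur) {
        return 1L << (cur & 0x3f);
    }

    public static void main(String[] args) {
        int cur = encode("TTTTTTTTTT");                  // all 20 bits set
        System.out.println(cur == (1 << 20) - 1);        // prints true
        System.out.println(wordIndex(cur));              // prints 16383
        System.out.println(Long.bitCount(bitMask(cur))); // prints 1
    }
}
```

The largest encoding, "TTTTTTTTTT", lands in the last bit of the last array slot, so a long[16384] provides 16384 × 64 = 1,048,576 flags, the same count as a boolean[1048576] in method 2.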
Reference: https://discuss.leetcode.com/topic/31963/8ms-of-java-solution/4