Repeated DNA Sequences
Source: Internet. Editor: 程序博客网. Time: 2024/06/04 00:41
hash table plus bit manipulation method
(See the problem's Show Tags. Runtime: 10 ms.)
Algorithm analysis
First, encode A, C, G, and T in binary, two bits per letter:
A -> 00
C -> 01
G -> 10
T -> 11
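With this encoding, every 10-letter substring corresponds to a single number that occupies 20 bits. An int is typically 4 bytes (32 bits), so one int is enough to represent a 10-letter substring. For example: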
ACGTACGTAC -> 00011011000110110001
AAAAAAAAAA -> 00000000000000000000
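The encoding above can be sketched in Java as follows. This is only an illustration: the class name DnaEncoding and the method names code and encode are hypothetical, not part of the original solution.

```java
final class DnaEncoding {
    // Maps one nucleotide to its 2-bit code: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Packs a 10-letter string into the low 20 bits of an int,
    // with the first letter in the highest 2 of those bits.
    static int encode(String window) {
        if (window.length() != 10) {
            throw new IllegalArgumentException("expected 10 letters");
        }
        int value = 0;
        for (int i = 0; i < window.length(); i++) {
            value = (value << 2) | code(window.charAt(i));
        }
        return value;
    }

    public static void main(String[] args) {
        int v = encode("ACGTACGTAC");
        // Pad to 20 binary digits so the leading zeros stay visible.
        String bits = String.format("%20s", Integer.toBinaryString(v)).replace(' ', '0');
        System.out.println(bits); // prints 00011011000110110001
    }
}
```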
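A 20-bit binary number has at most 2^20 distinct values, so the hash table needs 2^20 = 1024 * 1024 entries and can be declared as bool hashTable[1024 * 1024];. The Java code in this post uses a HashMap<Integer, Integer> instead, which also records how many times each value occurs.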
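A fixed-size table of this kind might look like the sketch below in Java. The class name SeenTable and the method markAndCheck are assumptions made for illustration. Note that a single boolean table only records whether a value has been seen before; reporting each repeated substring exactly once would need a second table or a counter.

```java
final class SeenTable {
    // One flag per possible 20-bit value: 2^20 = 1024 * 1024 entries.
    private final boolean[] seen = new boolean[1 << 20];

    // Marks the key as seen and reports whether it had been seen before.
    boolean markAndCheck(int key) {
        boolean before = seen[key];
        seen[key] = true;
        return before;
    }

    public static void main(String[] args) {
        SeenTable table = new SeenTable();
        System.out.println(table.markAndCheck(1)); // prints false
        System.out.println(table.markAndCheck(1)); // prints true
    }
}
```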
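Designing the string traversal
Moving the window one letter to the right corresponds to shifting the window's int value left by 2 bits, setting the lowest 2 bits to the code of the new letter, and clearing the 2 bits above the low 20 (they belong to the letter that just left the window). For example: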
src CAAAAAAAAAC
subStr CAAAAAAAAA
int 01000000000000000000
subStr AAAAAAAAAC
int 00000000000000000001
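The window update can be sketched as a single expression. The names RollingWindow, slide, and MASK_18_BITS are hypothetical. This version clears the oldest letter before shifting (keeping the low 18 bits), which gives the same result as shifting first and then masking with 0xFFFFF.

```java
final class RollingWindow {
    // Keeps the low 18 bits, i.e. the 9 letters that stay in the window.
    static final int MASK_18_BITS = 0x3FFFF;

    // Maps one nucleotide to its 2-bit code: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Drops the oldest letter, shifts left by one letter (2 bits),
    // and appends the code of the incoming letter.
    static int slide(int value, char next) {
        return ((value & MASK_18_BITS) << 2) | code(next);
    }

    public static void main(String[] args) {
        int window = 1 << 18; // CAAAAAAAAA: 01 followed by eighteen 0 bits
        System.out.println(slide(window, 'C')); // AAAAAAAAAC, prints 1
    }
}
```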
Time complexity
Traversing the string takes O(n), and each hash table operation takes O(1), so the total time complexity is O(n).
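import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<String>();
        if (s == null || s.length() < 10) return result;
        // Key: 20-bit code of a 10-letter window; value: number of occurrences.
        Map<Integer, Integer> map = new HashMap<Integer, Integer>();
        int val = 0;
        // Encode the first window.
        for (int i = 0; i < 10; i++) {
            val = (val << 2) | toInt(s.charAt(i));
        }
        map.put(val, 1);
        // Slide the window: keep the low 18 bits (drops the oldest letter),
        // shift left by 2, and append the code of the new letter.
        for (int i = 10; i < s.length(); i++) {
            val = ((val & 0x3ffff) << 2) | toInt(s.charAt(i));
            if (map.containsKey(val)) map.put(val, map.get(val) + 1);
            else map.put(val, 1);
        }
        // Decode every window that occurred more than once.
        for (Map.Entry<Integer, Integer> e : map.entrySet()) {
            if (e.getValue() > 1) result.add(toDNA(e.getKey()));
        }
        return result;
    }

    // Maps a nucleotide to its 2-bit code.
    private int toInt(char c) {
        if (c == 'A') return 0;
        else if (c == 'C') return 1;
        else if (c == 'G') return 2;
        else return 3; // T
    }

    // Decodes a 20-bit value back into its 10-letter string,
    // reading 2 bits at a time starting from the lowest bits.
    private String toDNA(int i) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < 10; j++) {
            int tmp = i % 4;
            i = i / 4;
            char c = 'T';
            if (tmp == 0) c = 'A';
            else if (tmp == 1) c = 'C';
            else if (tmp == 2) c = 'G';
            sb.insert(0, c);
        }
        return sb.toString();
    }
}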
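When testing the bit-encoded solution, a plain substring-based version can serve as a cross-check. The sketch below is not the method described in this post: it stores the 10-letter substrings themselves in hash sets, which is simpler but uses more memory per window. The class name NaiveRepeatedDna and the method findRepeated are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class NaiveRepeatedDna {
    // Returns every 10-letter substring that occurs more than once,
    // in the order in which each one is first seen a second time.
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // add() returns false when the window was already present.
            if (!seen.add(window)) repeated.add(window);
        }
        return new ArrayList<>(repeated);
    }

    public static void main(String[] args) {
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
        // prints [AAAAACCCCC, CCCCCAAAAA]
    }
}
```

The HashMap-based Solution does not guarantee any particular order of results, so compare the two outputs as sets rather than as lists.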