LeetCode: Repeated DNA Sequences

Source: Internet. Published by: 手机阿里云系统. Editor: 程序博客网. Time: 2024/05/16 00:44

Reposted from http://blog.csdn.net/xudli/article/details/43666725

Total Accepted: 1161 Total Submissions: 6887

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].

[Analysis]

A HashMap-based approach will exceed the space limit.

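Because there are only 4 letters, we can build our own hash key: each incoming character takes two bits. Once the key grows past 20 bits, i.e. past 10 characters, keep only the low 20 bits.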
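The two-bit encoding described above can be sketched on its own, separate from the duplicate detection. This is a minimal illustration, not the original author's code: the class name DnaKeyDemo and the helpers code and windowKeys are hypothetical, and the mapping A=0, C=1, G=2, T=3 is assumed to match the solution in this post.

```java
import java.util.ArrayList;
import java.util.List;

final class DnaKeyDemo {
    // Hypothetical helper: 2-bit code per nucleotide (A=0, C=1, G=2, T=3).
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Returns the 20-bit key of every 10-letter window of s, left to right.
    static List<Integer> windowKeys(String s) {
        final int mask = (1 << 20) - 1;   // keeps the low 20 bits = last 10 letters
        List<Integer> keys = new ArrayList<>();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in two new bits, then drop anything older than 10 letters.
            hash = ((hash << 2) + code(s.charAt(i))) & mask;
            if (i >= 9) {
                keys.add(hash);           // a full window ends at index i
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // "AAAAACCCCC" -> 341, "AAAACCCCCA" -> 1364
        System.out.println(windowKeys("AAAAACCCCCA"));
    }
}
```

Two windows hold the same 10 letters exactly when their keys are equal, so a set of Integer keys can stand in for a set of substrings.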

[Notes]

1. Operator precedence in (hash<<2) + map.get(c): + binds tighter than <<, so the << must be parenthesized.
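To see why the parentheses matter, here is a small sketch comparing the two spellings. The class and method names are hypothetical. In Java the additive operators bind tighter than the shift operators, so without parentheses the expression shifts by (2 + code) instead of adding code after the shift.

```java
final class ShiftPrecedenceDemo {
    // Intended form: shift first, then add the 2-bit code.
    static int withParens(int hash, int code) {
        return (hash << 2) + code;
    }

    // Without parentheses this parses as hash << (2 + code).
    static int withoutParens(int hash, int code) {
        return hash << 2 + code;
    }

    public static void main(String[] args) {
        System.out.println(withParens(1, 3));    // 7
        System.out.println(withoutParens(1, 3)); // 32
    }
}
```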

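import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        // A 10-letter sequence can only repeat if there are at least 11 characters.
        if (s == null || s.length() < 11) return res;

        // Two-bit code for each nucleotide.
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);

        Set<Integer> set = new HashSet<Integer>();     // keys seen at least once
        Set<Integer> unique = new HashSet<Integer>();  // keys already added to res

        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Shift in the next character; the << must be parenthesized.
            hash = (hash << 2) + map.get(c);
            if (i >= 9) {
                // Keep only the low 20 bits, i.e. the last 10 characters.
                hash &= (1 << 20) - 1;
                if (set.contains(hash) && !unique.contains(hash)) {
                    res.add(s.substring(i - 9, i + 1));
                    unique.add(hash);
                } else {
                    set.add(hash);
                }
            }
        }
        return res;
    }
}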

When I wrote this in Python it timed out. Using list.count() for the check is probably the problem, since each call scans the whole list.

0 0