Repeated DNA Sequences

来源:互联网 发布:淘宝给差评被报复 编辑:程序博客网 时间:2024/05/24 03:15

Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",Return:["AAAAACCCCC", "CCCCCAAAAA"].
Java代码:

package com.leetcode;import java.util.ArrayList;import java.util.HashMap;import java.util.HashSet;import java.util.List;import java.util.Map;import java.util.Set;public class Test_73 { private static final Map<Character, Integer> A = new HashMap<>();    static { A.put('A',0); A.put('C',1); A.put('G',2); A.put('T',3); }    private final static int A_SIZE_POW_9 = (int) Math.pow(A.size(), 9);    //注意到DNA中只有四种符号,可做编码操作    /*     * 例如:AAACCC编码之后为000111将四进制数转化为十进制数:4^0*1 + 4^1*1 +4^2*1=21     * */    public static List<String> findRepeatedDnaSequences(String s) {        Set<String> res = new HashSet<>();        Set<Integer> hashes = new HashSet<>();        for (int i = 0, rhash = 0; i < s.length(); i++) {            if (i > 9) rhash -= A_SIZE_POW_9 * A.get(s.charAt(i-10));            rhash = A.size() * rhash + A.get(s.charAt(i));            if (i > 8 && !hashes.add(rhash)) res.add(s.substring(i-9,i+1));        }        return new ArrayList<>(res);    }public static void main(String[] arqs) {String test = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";List<String> result = findRepeatedDnaSequences(test);for (String s : result) {System.out.println(s);}}}


0 0
原创粉丝点击