LeetCode Repeated DNA Sequences

Source: Internet | Category: Android programming languages | Editor: 程序博客网 | Date: 2024/06/05 23:44

Description:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].

Solution:

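The most direct approach is to scan the string and store every 10-letter substring in a TreeMap keyed by String. That got Memory Limit Exceeded. (I have to grumble at LeetCode here: how is anyone supposed to plan a solution when the input bounds aren't given!?)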
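For reference, the straightforward String-keyed idea looks roughly like the sketch below. It is not the author's exact code: it uses a HashMap rather than a TreeMap, and the names `NaiveRepeatedDna` and `findRepeated` are hypothetical. Every window becomes its own 10-character String object, which is where the memory goes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the naive approach: one String key per 10-letter window.
class NaiveRepeatedDna {
    static List<String> findRepeated(String s) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10); // a new String per window
            int c = counts.merge(window, 1, Integer::sum);
            if (c == 2) {                            // report each repeat exactly once
                result.add(window);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```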

So we need a way to cut memory usage.

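A DNA string only ever contains four letters, A, C, G and T, so each one can be mapped to 0, 1, 2, 3. Some people instead use the lowest three bits of each letter's character code in binary, which also happen to be distinct for A, C, G and T.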
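A small sketch of both mappings mentioned above. The names `NucleotideCodes`, `twoBit` and `lowThreeBits` are hypothetical. Note that the three-bit variant spends 3 bits per letter, so a 10-letter window needs 30 bits (mask 0x3FFFFFFF), which still fits in an int.

```java
// Hypothetical helper showing two ways to turn a nucleotide letter into a small code.
class NucleotideCodes {
    // 2-bit code: A=0, C=1, G=2, T=3 (the mapping used in this post)
    static int twoBit(char ch) {
        switch (ch) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + ch);
        }
    }

    // Alternative: the low three bits of the character code are already distinct
    // ('A'=65 -> 1, 'C'=67 -> 3, 'G'=71 -> 7, 'T'=84 -> 4), so no lookup table is needed.
    static int lowThreeBits(char ch) {
        return ch & 7;
    }

    public static void main(String[] args) {
        for (char ch : "ACGT".toCharArray()) {
            System.out.println(ch + " -> " + twoBit(ch) + " / " + lowThreeBits(ch));
        }
    }
}
```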

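That means each nucleotide fits in 2 bits: 0, 1, 2, 3 are 00, 01, 10, 11 in binary. A 10-letter sequence therefore takes 20 bits, and the largest possible value is twenty 1 bits, which is 0xFFFFF in hexadecimal. ANDing with 0xFFFFF after each 2-bit shift keeps exactly the last ten letters.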
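To make the bit layout concrete, here is a sketch of the rolling encoding on its own (hypothetical class `RollingEncoding`, using A=0, C=1, G=2, T=3 via `"ACGT".indexOf`, and assuming the input contains only those four letters). For "AAAAACCCCC" the five A's contribute 00 each and the five C's 01 each, so the code is 01 01 01 01 01 in the low bits, i.e. 0x155 = 341; ten T's give 0xFFFFF.

```java
// Hypothetical sketch: 2 bits per letter, 20 bits per 10-letter window.
class RollingEncoding {
    static final int MASK = 0xFFFFF; // twenty 1 bits = 2^20 - 1 = 1048575

    // Assumes ch is one of A, C, G, T (indexOf returns -1 otherwise).
    static int code(char ch) {
        return "ACGT".indexOf(ch); // A=0, C=1, G=2, T=3
    }

    // Encode one 10-letter sequence directly: first letter ends up in the high bits.
    static int encode(String tenLetters) {
        int v = 0;
        for (int i = 0; i < 10; i++) {
            v = (v << 2) | code(tenLetters.charAt(i));
        }
        return v;
    }

    // Codes of all windows s[i..i+9], updated by shifting in one letter at a time.
    static int[] windowCodes(String s) {
        int n = s.length() - 9;
        if (n <= 0) return new int[0];
        int[] out = new int[n];
        int v = 0;
        for (int i = 0; i < s.length(); i++) {
            // shift left by 2, append the new letter, drop everything above bit 19
            v = ((v << 2) | code(s.charAt(i))) & MASK;
            if (i >= 9) out[i - 9] = v;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(encode("AAAAACCCCC")); // prints 341
    }
}
```

Because of the mask, the letter that falls out of the window (the T in "TAAAAAAAAAA", for example) disappears completely, so sliding by one position costs a shift, an OR and an AND.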

import java.util.*;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> list = new ArrayList<String>();
        // key: 20-bit code of a 10-letter window, value: number of occurrences
        TreeMap<Integer, Integer> map = new TreeMap<Integer, Integer>();
        if (s.length() < 10)
            return list;
        int temp = 0, num;
        // load the first 9 letters, 2 bits each
        for (int i = 0; i < 9; i++) {
            temp = temp << 2 | convert(s.charAt(i));
        }
        // shift in one letter per step; the mask keeps only the last 10 letters
        for (int i = 9; i < s.length(); i++) {
            temp = (temp << 2 | convert(s.charAt(i))) & 0xFFFFF;
            if (map.containsKey(temp)) {
                num = map.get(temp);
                map.put(temp, num + 1);
            } else
                map.put(temp, 1);
        }
        String neo;
        Iterator<Integer> ite = map.keySet().iterator();
        while (ite.hasNext()) {
            temp = ite.next();
            num = map.get(temp);
            if (num == 1)
                continue;
            // decode the 20-bit code back into 10 letters, last letter first
            neo = "";
            for (int i = 0; i < 10; i++) {
                neo = (char) convert(temp % 4) + neo;
                temp >>= 2;
            }
            list.add(neo);
        }
        return list;
    }

    // letter -> code ('A','C','G','T' -> 0..3) and code -> letter (0..3 -> 'A','C','G','T');
    // the two ranges never clash because the letters are 65..84
    int convert(int ch) {
        switch (ch) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        case 0: return 'A';
        case 1: return 'C';
        case 2: return 'G';
        case 3: return 'T';
        }
        return 0;
    }
}
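One possible further memory saving, not from the original post: since every code lies in [0, 0xFFFFF], the TreeMap can be replaced by two java.util.BitSet objects of 2^20 bits (128 KiB each), one marking codes seen at least once and one marking codes already reported. Adding the substring at the moment of its second occurrence also removes the decoding step. The class name `BitsetRepeatedDna` is hypothetical, and the sketch assumes the input contains only A, C, G and T.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical variant: same 20-bit rolling code, two bitsets instead of a TreeMap.
class BitsetRepeatedDna {
    static List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<>();
        if (s.length() < 10) return result;
        BitSet seen = new BitSet(1 << 20);     // code occurred at least once
        BitSet reported = new BitSet(1 << 20); // code already added to result
        int code = 0;
        for (int i = 0; i < s.length(); i++) {
            // A=0, C=1, G=2, T=3; mask keeps the last 10 letters
            code = ((code << 2) | "ACGT".indexOf(s.charAt(i))) & 0xFFFFF;
            if (i < 9) continue;               // first window not complete yet
            if (!seen.get(code)) {
                seen.set(code);
            } else if (!reported.get(code)) {
                reported.set(code);
                result.add(s.substring(i - 9, i + 1)); // the window ending at i
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

This version lists sequences in the order their second occurrence is found, whereas the TreeMap version returns them sorted by code; the memory cost is fixed regardless of input length.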
