LeetCode #187: Repeated DNA Sequences


Problem Statement

(Source) All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return: ["AAAAACCCCC", "CCCCCAAAAA"].
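As a quick cross-check of the example, here is a minimal sketch that counts every 10-letter window with collections.Counter. The function name repeated_windows and the parameter k are hypothetical, chosen for illustration; they are not part of the problem statement.

```python
from collections import Counter


def repeated_windows(s, k=10):
    """Return, in sorted order, every length-k substring occurring more than once.

    Hypothetical helper for checking the example; not the final solution.
    """
    # range() is empty when len(s) < k, so short inputs yield no windows.
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    return sorted(sub for sub, c in counts.items() if c > 1)


print(repeated_windows("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"))
# prints ['AAAAACCCCC', 'CCCCCAAAAA']
```

"AAAAACCCCC" starts at indices 0 and 10, and "CCCCCAAAAA" starts at indices 5 and 16, which is why both are reported.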

Solution

Naive first: record every 10-letter substring in a dictionary and collect the ones seen again in a set:

class Solution(object):
    def findRepeatedDnaSequences(self, s):
        """
        :type s: str
        :rtype: List[str]
        """
        if not s:
            return []
        n = len(s)
        # Fewer than 11 letters cannot hold two 10-letter windows.
        if n < 11:
            return []
        res = set()   # substrings seen more than once
        dic = {}      # substrings seen so far
        for i in range(n - 9):
            sub = s[i : i + 10]
            if sub not in dic:
                dic[sub] = 1
            else:
                res.add(sub)
        return list(res)

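Tweak it a little bit: count occurrences and append a substring the moment its count reaches 2, so the result list needs no set for deduplication: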

class Solution(object):
    def findRepeatedDnaSequences(self, s):
        """
        :type s: str
        :rtype: List[str]
        """
        if not s:
            return []
        n = len(s)
        if n < 11:
            return []
        res = []
        dic = {}      # substring -> number of occurrences so far
        for i in range(n - 9):
            sub = s[i : i + 10]
            dic[sub] = dic.get(sub, 0) + 1
            # Append exactly once, on the second occurrence.
            if dic[sub] == 2:
                res.append(sub)
        return res

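Final solution using bit manipulation to save space by converting strings to integers. Each nucleotide is encoded in 2 bits, so a 10-letter window fits in a 20-bit integer (hence the 0xFFFFF mask), and the dictionary keys are small integers rather than 10-character strings: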

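class Solution(object):
    def findRepeatedDnaSequences(self, s):
        """
        :type s: str
        :rtype: List[str]
        """
        n = len(s)
        if n <= 10:
            return []
        res = []
        y = 0  # 20-bit code of the current 10-letter window
        m = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
        counter = dict()
        # range works in both Python 2 and 3 (xrange is Python 2 only).
        for i in range(n):
            # Shift in the new letter; the mask drops the one that left the window.
            y = (y * 4 + m[s[i]]) & 0xFFFFF
            if i < 9:
                continue  # the first full window ends at index 9
            counter[y] = counter.get(y, 0) + 1
            if counter[y] == 2:
                res.append(s[i - 9 : i + 1])
        return res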
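To see why the masked update in the final solution tracks the current window, here is a small sketch that compares the rolling code against a code computed from scratch for each window. The names CODE, MASK, encode and rolling_codes are hypothetical helpers introduced for illustration; the 2-bit mapping is the one used in the solution above.

```python
CODE = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
MASK = (1 << 20) - 1  # 10 letters * 2 bits each = 20 bits = 0xFFFFF


def encode(window):
    """Pack a string of up to 10 nucleotides into an integer, 2 bits per letter."""
    value = 0
    for ch in window:
        value = (value << 2) | CODE[ch]
    return value


def rolling_codes(s):
    """Yield (start_index, code) for every 10-letter window of s."""
    y = 0
    for i, ch in enumerate(s):
        # Same update as y * 4 + m[s[i]], written with shift and OR.
        y = ((y << 2) | CODE[ch]) & MASK
        if i >= 9:
            yield i - 9, y


s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
# Each rolling code should equal the code computed directly from that window.
assert all(code == encode(s[i:i + 10]) for i, code in rolling_codes(s))
```

Because the mapping from a 10-letter window to its 20-bit code is one-to-one, two windows share a code only if they are the same string, so counting codes is equivalent to counting substrings.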