leetcode(211). Add and Search Word - Data structure design

Source: Internet | Posted by: 人工智能贴吧 | Editor: 程序博客网 | Date: 2024/05/17 09:34

problem

Design a data structure that supports the following two operations:

void addWord(word)
bool search(word)

search(word) can search a literal word or a regular expression string
containing only letters a-z or '.'. A '.' can represent any one letter.

For example:

addWord("bad")
addWord("dad")
addWord("mad")
search("pad") -> false
search("bad") -> true
search(".ad") -> true
search("b..") -> true

Note: You may assume that all words consist of lowercase letters a-z.

solution

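This problem is an extension of the Trie (prefix tree): it adds wildcard matching to the prefix-tree lookup, so we only need to make a small change to the Trie's search.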

# Beats 80% of submissions
def search1(word, d):
    # d is the current trie node: a dict mapping a letter to a child dict,
    # plus an optional '$' key marking the end of a stored word.
    if word == '':
        return '$' in d
    if word[0] != '.':
        if word[0] in d:
            return search1(word[1:], d[word[0]])
        else:
            return False
    else:
        # '.' matches any letter, so try every child node.
        for v in d.values():
            if v is True:
                # Skip the end-of-word marker '$'
                continue
            if search1(word[1:], v):
                return True
        return False


class WordDictionary(object):
    def __init__(self):
        self.root = {}

    def addWord(self, word):
        p = self.root
        for i in word:
            p = p.setdefault(i, {})
        p['$'] = True

    def search(self, word):
        return search1(word, self.root)
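To see what the nested-dict trie looks like and why the wildcard branch has to skip the '$' marker, here is a minimal sketch. The helper name add_word is hypothetical (it mirrors the addWord method above), and the words 'ba' and 'bad' are chosen only so that one node holds both a marker and a child.

```python
def add_word(root, word):
    """Insert word into a nested-dict trie (same layout as addWord above)."""
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    node['$'] = True  # end-of-word marker


root = {}
for w in ('ba', 'bad'):
    add_word(root, w)

# The node reached by 'b' -> 'a' holds both the marker and a child.
node_a = root['b']['a']

# Iterating over values() yields the marker True as well as child dicts,
# so a wildcard step must filter it out before recursing.
children = [v for v in node_a.values() if v is not True]
```

After this runs, root is {'b': {'a': {'$': True, 'd': {'$': True}}}} and children contains only the child node for 'd'.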

discussion

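Among the submissions on leetcode I also saw the following approach: it stores words of the same length together and compares the query against them in search. For small inputs this approach beats the Trie.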

# Beats 91% of submissions
import collections


class WordDictionary(object):
    def __init__(self):
        """
        Initialize your data structure here.
        """
        # Maps word length -> list of words with that length.
        self.word_dict = collections.defaultdict(list)

    def addWord(self, word):
        """
        Adds a word into the data structure.
        :type word: str
        :rtype: void
        """
        if word:
            self.word_dict[len(word)].append(word)

    def search(self, word):
        """
        Returns if the word is in the data structure. A word could contain the dot character '.' to represent any one letter.
        :type word: str
        :rtype: bool
        """
        if not word:
            return False
        if '.' not in word:
            return word in self.word_dict[len(word)]
        for v in self.word_dict[len(word)]:
            for i, ch in enumerate(word):
                if v[i] != ch and ch != '.':
                    break
            else:
                # No mismatch found: v matches the pattern.
                return True
        return False

After changing defaultdict(list) to defaultdict(set) (which also means replacing append with add in addWord), it beat 96% of submissions.
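The set-based variant could look roughly like the sketch below. It is a reconstruction under assumptions, not the submission itself: the use of .get (to avoid creating empty buckets on lookup) and of any/all/zip for the wildcard comparison are choices made here for brevity.

```python
import collections


class WordDictionary(object):
    def __init__(self):
        # Maps word length -> set of words with that length.
        self.word_dict = collections.defaultdict(set)

    def addWord(self, word):
        if word:
            # A set uses add() where the list version used append().
            self.word_dict[len(word)].add(word)

    def search(self, word):
        if not word:
            return False
        # .get avoids inserting an empty bucket for unseen lengths.
        candidates = self.word_dict.get(len(word), ())
        if '.' not in word:
            # Exact lookup: a hash probe instead of a list scan.
            return word in candidates
        # Wildcard lookup: compare position by position.
        return any(
            all(p == '.' or p == c for p, c in zip(word, cand))
            for cand in candidates
        )


wd = WordDictionary()
for w in ('bad', 'dad', 'mad'):
    wd.addWord(w)
```

The likely source of the speedup is the exact-match branch, where membership in a set takes constant time on average while membership in a list requires a scan. Wildcard queries still compare against every word in the bucket.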

summary

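This is a typical application of the Trie, yet the second solution even outperformed it. The reason is that the Trie cannot check word length during lookup, so it performs some wasted searches. The second solution takes the way a person would compare two words and turns it into something the computer can apply: although '.' matches any character, two words of different lengths can never match. The drawbacks of this method are: 1. it wastes memory when many words share a prefix, and 2. it repeats the matching work for those shared prefixes.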
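The length argument can be made concrete with a small experiment. The sketch below is illustrative only: build_trie and count_trie_visits are hypothetical helpers, and the query '....' is picked because no stored word has length 4.

```python
def build_trie(words):
    """Build a nested-dict trie with '$' as the end-of-word marker."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node['$'] = True
    return root


def count_trie_visits(word, node):
    """Return (found, visited): the match result and how many nodes were visited."""
    visited = 1  # count the current node
    if word == '':
        return '$' in node, visited
    if word[0] != '.':
        children = [node[word[0]]] if word[0] in node else []
    else:
        children = [child for key, child in node.items() if key != '$']
    for child in children:
        found, n = count_trie_visits(word[1:], child)
        visited += n
        if found:
            return True, visited
    return False, visited


words = ['bad', 'dad', 'mad']
found, visited = count_trie_visits('....', build_trie(words))

# The length-bucket approach would look only at words of length 4.
candidates = [w for w in words if len(w) == 4]
```

Here the trie search walks all 10 nodes (the root plus 9 letter nodes) before returning False, while the length bucket is empty and the second solution answers without comparing anything.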