Data Structures: The Trie

Source: Internet. Editor: 程序博客网. Time: 2024/04/28 02:57



In computer science, a trie, also called a digital tree and sometimes a radix tree or prefix tree (because it can be searched by prefixes), is an ordered tree data structure used to store a dynamic set or associative array whose keys are usually strings.

Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node share a common prefix, namely the string associated with that node, and the root is associated with the empty string. Values are not necessarily associated with every node. Rather, values tend to be associated only with leaves and with some inner nodes that correspond to keys of interest. For the space-optimized presentation of a prefix tree, see compact prefix tree.


Figure: A trie for the keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn".

In the example shown, keys are listed in the nodes and values below them. Each complete English word has an arbitrary integer value associated with it. A trie can be seen as a tree-shaped deterministic finite automaton. Each finite language is generated by a trie automaton, and each trie can be compressed into a deterministic acyclic finite state automaton.


Though tries are usually keyed by character strings, they need not be. The same algorithms can be adapted to serve similar functions on ordered lists of any construct, e.g. permutations of a list of digits or shapes. In particular, a bitwise trie is keyed on the individual bits making up any fixed-length binary datum, such as an integer or memory address.

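As a rough illustration of a bitwise trie, the sketch below stores fixed-width unsigned integers by walking their bits from the most significant to the least significant. The names `BitTrie`, `WIDTH`, `insert`, and `contains` are made up for this example, and the 8-bit width is an arbitrary assumption.

```python
WIDTH = 8  # assumed fixed key width in bits (illustrative choice)


class BitTrie:
    """Trie keyed on the individual bits of fixed-width unsigned integers."""

    def __init__(self):
        # Each node is a two-slot list: index 0 for bit 0, index 1 for bit 1.
        self.root = [None, None]

    def insert(self, key):
        node = self.root
        # Walk from the most significant bit down to the least significant.
        for shift in range(WIDTH - 1, -1, -1):
            bit = (key >> shift) & 1
            if node[bit] is None:
                node[bit] = [None, None]
            node = node[bit]

    def contains(self, key):
        node = self.root
        for shift in range(WIDTH - 1, -1, -1):
            bit = (key >> shift) & 1
            if node[bit] is None:
                return False
            node = node[bit]
        # All keys have the same length, so reaching full depth means a hit.
        return True
```

Because every key has exactly `WIDTH` bits, no terminal marker is needed: a complete path exists only if that exact key was inserted.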

Applications

As a replacement for other data structures

As discussed below, a trie has a number of advantages over binary search trees.[6] A trie can also be used to replace a hash table, over which it has the following advantages:

  • Looking up data in a trie is faster in the worst case than in an imperfect hash table: it takes O(m) time, where m is the length of the search string. An imperfect hash table can have key collisions, that is, the hash function can map different keys to the same position in the table. The worst-case lookup time in an imperfect hash table is O(N), where N is the number of stored keys, although the typical case is O(1) plus the O(m) time spent evaluating the hash.
  • There are no collisions of different keys in a trie.
  • Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value.
  • There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
  • A trie can provide an alphabetical ordering of the entries by key.

Tries do have some drawbacks as well:

  • Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random-access time is high compared to main memory.[7]
  • Some keys, such as floating point numbers, can lead to long chains and prefixes that are not particularly meaningful. Nevertheless, a bitwise trie can handle standard IEEE single and double format floating point numbers.
  • Some tries can require more space than a hash table, as memory may be allocated for each character in the search string, rather than a single chunk of memory for the whole entry, as in most hash tables.

Dictionary representation

A common application of a trie is storing a predictive text or autocomplete dictionary, such as found on a mobile telephone. Such applications take advantage of a trie's ability to quickly search for, insert, and delete entries; however, if storing dictionary words is all that is required (i.e., storage of information auxiliary to each word is not required), a minimal deterministic acyclic finite state automaton (DAFSA) would use less space than a trie. This is because a DAFSA can compress identical branches from the trie which correspond to the same suffixes (or parts) of different words being stored.

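The autocomplete use case can be sketched as a prefix query: walk down to the node for the typed prefix, then collect every stored word beneath it. This is only a minimal illustration; the `Node` class and the `insert` and `complete` functions are hypothetical names introduced here, and children are visited in sorted order so that the suggestions come out alphabetically.

```python
class Node:
    def __init__(self):
        self.children = {}     # maps a character to the child Node
        self.terminal = False  # True if the path to this node spells a stored word


def insert(root, word):
    node = root
    for char in word:
        if char not in node.children:
            node.children[char] = Node()
        node = node.children[char]
    node.terminal = True


def complete(root, prefix):
    """Return all stored words that start with prefix, in sorted order."""
    node = root
    for char in prefix:
        if char not in node.children:
            return []  # no stored word has this prefix
        node = node.children[char]

    results = []

    def walk(current, suffix):
        # Pre-order: report the word at this node before descending.
        if current.terminal:
            results.append(prefix + suffix)
        for char in sorted(current.children):
            walk(current.children[char], suffix + char)

    walk(node, "")
    return results
```

With the words "to", "tea", "ted", "ten", "i", "in", and "inn" inserted, `complete(root, "te")` returns `["tea", "ted", "ten"]`.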

Tries are also well suited for implementing approximate matching algorithms,[8] including those used in spell checking and hyphenation[4] software.

Algorithms

Lookup and membership are easily described. The listing below implements a recursive trie node as a Haskell data type. It stores an optional value and a map of child tries, indexed by the next character:

    import Data.Map

    data Trie a = Trie { value    :: Maybe a
                       , children :: Map Char (Trie a) }

We can look up a value in the trie as follows:

    find :: String -> Trie a -> Maybe a
    find []     t = value t
    find (k:ks) t = do
      ct <- Data.Map.lookup k (children t)
      find ks ct

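In an imperative style, and assuming an appropriate data type is in place, we can describe the same algorithm in Python. Note that children maps each character to the corresponding child node, and that a "terminal" node is one that marks the end of a valid word. The function returns the node reached by following the key, or None if no such path exists; to test membership, the caller also checks that the returned node is terminal.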

    def find(node, key):
        for char in key:
            if char in node.children:
                node = node.children[char]
            else:
                return None
        return node
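To make the membership test concrete, here is a minimal sketch of a node type that `find` could operate on, together with a wrapper that checks the terminal flag. The `Node` class, its `terminal` attribute, and the `contains` helper are assumptions made for this example; the original text does not specify them.

```python
class Node:
    def __init__(self):
        self.children = {}     # maps a character to the child Node
        self.terminal = False  # True if this node ends a stored word


def find(node, key):
    # Follow one child per character; give up as soon as a link is missing.
    for char in key:
        if char in node.children:
            node = node.children[char]
        else:
            return None
    return node


def contains(root, key):
    # A key is a member only if its path exists and ends at a terminal node.
    node = find(root, key)
    return node is not None and node.terminal
```

In a trie holding "to" and "tea", `find(root, "te")` returns an inner node, but `contains(root, "te")` is false because that node is not terminal.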

Insertion proceeds by walking the trie according to the string to be inserted, then appending new nodes for the suffix of the string that is not yet contained in the trie. In imperative pseudocode:

    algorithm insert(root : node, s : string, value : any):
        node = root
        i    = 0
        n    = length(s)

        while i < n:
            if node.child(s[i]) != nil:
                node = node.child(s[i])
                i = i + 1
            else:
                break

        (* append new nodes, if necessary *)
        while i < n:
            node.child(s[i]) = new node
            node = node.child(s[i])
            i = i + 1

        node.value = value
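The pseudocode above can be rendered in Python roughly as follows. This is a sketch under a few assumptions: the `Node` class and the `lookup` helper are introduced here for illustration, children are kept in a dictionary, and `None` is used to mean "no value", so `None` itself cannot be stored as a value.

```python
class Node:
    def __init__(self):
        self.children = {}  # maps a character to the child Node
        self.value = None   # None means no value is stored at this node


def insert(root, s, value):
    node = root
    i = 0
    n = len(s)
    # Follow existing nodes for as long as the prefix is already present.
    while i < n and s[i] in node.children:
        node = node.children[s[i]]
        i += 1
    # Append new nodes for the remaining suffix, if any.
    while i < n:
        child = Node()
        node.children[s[i]] = child
        node = child
        i += 1
    node.value = value


def lookup(root, s):
    # Return the value stored under s, or None if s is not a key.
    node = root
    for char in s:
        node = node.children.get(char)
        if node is None:
            return None
    return node.value
```

Inserting "tea" and then "ten" reuses the nodes for "te" and adds only one new node for the final "n".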

Sorting

Lexicographic sorting of a set of keys can be accomplished with a simple trie-based algorithm as follows:

  • Insert all keys in a trie.
  • Output all keys in the trie by means of pre-order traversal, visiting the children of each node in increasing character order, which results in output that is in lexicographically increasing order. Pre-order traversal is a kind of depth-first traversal.

This algorithm is a form of radix sort.
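The two steps can be sketched in Python as shown below. The `Node` class and the `trie_sort` function are illustrative names, not part of the original text. The sketch treats the input as a set, so duplicate keys appear once in the output, and it orders characters by code point.

```python
class Node:
    def __init__(self):
        self.children = {}     # maps a character to the child Node
        self.terminal = False  # True if the path to this node is a key


def trie_sort(keys):
    # Step 1: insert all keys into a trie.
    root = Node()
    for key in keys:
        node = root
        for char in key:
            if char not in node.children:
                node.children[char] = Node()
            node = node.children[char]
        node.terminal = True

    # Step 2: pre-order traversal, children in increasing character order.
    output = []

    def preorder(node, prefix):
        if node.terminal:
            output.append(prefix)
        for char in sorted(node.children):
            preorder(node.children[char], prefix + char)

    preorder(root, "")
    return output
```

The traversal is recursive, so very long keys could exceed Python's default recursion limit; an explicit stack would avoid that.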

A trie forms the fundamental data structure of Burstsort, which (in 2007) was the fastest known string sorting algorithm.[10] However, faster string sorting algorithms are now known.[11]


Full text search

A special kind of trie, called a suffix tree, can be used to index all suffixes in a text in order to carry out fast full text searches.

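The idea of indexing all suffixes can be illustrated with a naive, uncompressed suffix trie built from nested dictionaries. This is a simplification: a real suffix tree compresses chains of single-child nodes and can be built in linear time, whereas this sketch uses quadratic time and space. The function names `build_suffix_trie` and `contains_substring` are made up for the example.

```python
def build_suffix_trie(text):
    # Insert every suffix text[start:] into a trie of nested dicts.
    root = {}
    for start in range(len(text)):
        node = root
        for char in text[start:]:
            node = node.setdefault(char, {})
    return root


def contains_substring(root, pattern):
    # Every substring of the text is a prefix of some suffix, so a
    # pattern occurs in the text exactly when its path exists in the trie.
    node = root
    for char in pattern:
        if char not in node:
            return False
        node = node[char]
    return True
```

Once the index is built, each query takes time proportional to the pattern length, independent of the length of the text.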


