5. Data Structures

Source: Internet. Published under: caffe training googlenet. Editor: 程序博客网. Time: 2024/05/22 00:37


This chapter describes in more detail some things you have already learned about, and adds some new things as well.

5.1. More on Lists

The list data type has some more methods. Here are all of the methods of list objects:

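list.append(x): Add an item to the end of the list. Equivalent to a[len(a):] = [x].

list.extend(L): Extend the list by appending all the items in the given list L (any iterable is accepted). Equivalent to a[len(a):] = L.

list.insert(i, x): Insert an item at a given position. The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is equivalent to a.append(x).

list.remove(x): Remove the first item from the list whose value is x. It raises a ValueError if there is no such item.

list.pop([i]): Remove the item at the given position in the list, and return it. If no index is specified, a.pop() removes and returns the last item in the list. (The square brackets around the i in the method signature denote that the parameter is optional, not that you should type square brackets at that position. You will see this notation frequently in the Python Library Reference.)

list.clear(): Remove all items from the list. Equivalent to del a[:].

list.index(x): Return the index in the list of the first item whose value is x. It raises a ValueError if there is no such item.

list.count(x): Return the number of times x appears in the list.

list.sort(key=None, reverse=False): Sort the items of the list in place (the arguments can be used for sort customization; see sorted() for their explanation).

list.reverse(): Reverse the elements of the list in place.

list.copy(): Return a shallow copy of the list. Equivalent to a[:].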

An example that uses most of the list methods:

>>> a = [66.25, 333, 333, 1, 1234.5]
>>> print(a.count(333), a.count(66.25), a.count('x'))
2 1 0
>>> a.insert(2, -1)
>>> a.append(333)
>>> a
[66.25, 333, -1, 333, 1, 1234.5, 333]
>>> a.index(333)
1
>>> a.remove(333)
>>> a
[66.25, -1, 333, 1, 1234.5, 333]
>>> a.reverse()
>>> a
[333, 1234.5, 1, 333, -1, 66.25]
>>> a.sort()
>>> a
[-1, 1, 66.25, 333, 333, 1234.5]
>>> a.pop()
1234.5
>>> a
[-1, 1, 66.25, 333, 333]
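The transcript above does not use extend(), copy() or clear(). The sketch below exercises all three. The variable names are only illustrative. It also shows what "shallow" means for copy(): the new outer list still shares its inner objects with the original.

```python
a = [1, 2]
a.extend(range(3, 5))        # extend() accepts any iterable, not just a list
print(a)                     # [1, 2, 3, 4]

b = a.copy()                 # shallow copy, same as a[:]
b.append(5)                  # changing the copy leaves the original alone
print(a, b)                  # [1, 2, 3, 4] [1, 2, 3, 4, 5]

nested = [[0], [1]]
shallow = nested.copy()
shallow[0].append(99)        # the inner lists are shared, not copied
print(nested)                # [[0, 99], [1]]

a.clear()                    # same as del a[:]
print(a)                     # []
```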

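You might have noticed that methods like insert, remove or sort, which only modify the list, have no return value printed; they return the default None. [1] This is a design principle for all mutable data structures in Python.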
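The following sketch illustrates that design choice. It contrasts the in-place list.sort(), which returns None, with the built-in sorted(), which leaves its argument alone and returns a new list.

```python
data = [3, 1, 2]
result = data.sort()          # sorts data in place...
print(result)                 # None: ...and returns nothing useful
print(data)                   # [1, 2, 3]

original = [3, 1, 2]
new_list = sorted(original)   # sorted() builds and returns a new list
print(new_list, original)     # [1, 2, 3] [3, 1, 2]
```

Because sort() returns None, an expression such as [3, 1, 2].sort().reverse() fails with an AttributeError rather than chaining.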

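5.1.1. Using Lists as Stacks

The list methods make it very easy to use a list as a stack, where the last element added is the first element retrieved ("last-in, first-out"). To add an item to the top of the stack, use append(). To retrieve an item from the top of the stack, use pop() without an explicit index. For example: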

>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
>>> stack.pop()
6
>>> stack.pop()
5
>>> stack
[3, 4]
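A classic use of a list as a stack is checking that brackets are properly nested. The sketch below is an illustration only; is_balanced is a hypothetical helper name, not a standard function. Each opening bracket is pushed with append(), and each closing bracket must match whatever pop() returns from the top of the stack.

```python
def is_balanced(text):
    """Return True if every bracket in text is closed in the right order."""
    pairs = {')': '(', ']': '[', '}': '{'}   # closing bracket -> opening bracket
    stack = []
    for ch in text:
        if ch in '([{':
            stack.append(ch)                  # push an opening bracket
        elif ch in pairs:
            # nothing left to match, or the top of the stack is the wrong kind
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                          # leftovers mean unclosed brackets

print(is_balanced('f(a[1], {b: 2})'))   # True
print(is_balanced('(]'))                # False
print(is_balanced('(('))                # False
```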

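5.1.2. Using Lists as Queues

It is also possible to use a list as a queue, where the first element added is the first element retrieved ("first-in, first-out"); however, lists are not efficient for this purpose. While appends and pops from the end of a list are fast, inserts or pops from the beginning of a list are slow (because all of the other elements have to be shifted by one).

To implement a queue, use collections.deque, which was designed to have fast appends and pops from both ends. For example: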

>>> from collections import deque
>>> queue = deque(["Eric", "John", "Michael"])
>>> queue.append("Terry")           # Terry arrives
>>> queue.append("Graham")          # Graham arrives
>>> queue.popleft()                 # The first to arrive now leaves
'Eric'
>>> queue.popleft()                 # The second to arrive now leaves
'John'
>>> queue                           # Remaining queue in order of arrival
deque(['Michael', 'Terry', 'Graham'])
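Breadth-first search is a typical job for a FIFO queue. The sketch below assumes a small hypothetical graph stored as a dictionary of adjacency lists, and bfs_order is an illustrative name. It also uses a set to track visited nodes; sets and dictionaries are covered in sections 5.4 and 5.5. New nodes join the back of the queue with append() and are taken off the front with popleft().

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in breadth-first order, using a deque as a FIFO queue."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()              # fast removal from the left end
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)     # enqueue at the right end
    return order

# A small hypothetical graph given as an adjacency dictionary
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D', 'E'], 'D': [], 'E': []}
print(bfs_order(graph, 'A'))                # ['A', 'B', 'C', 'D', 'E']
```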

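5.1.3. List Comprehensions

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operation applied to each member of another sequence, or to create a subsequence of those elements that satisfy a certain condition.

For example, assume we want to create a list of squares, like: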

>>> squares = []
>>> for x in range(10):
...     squares.append(x**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

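Note that this creates (or overwrites) a variable named x that still exists after the loop completes. We can calculate the list of squares without any side effects using:

squares = list(map(lambda x: x**2, range(10)))

or, equivalently:

squares = [x**2 for x in range(10)]

which is more concise and readable.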
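The sketch below checks the side-effect claim directly. It relies on Python 3 semantics, where the loop variable of a comprehension is local to the comprehension; in Python 2, list comprehensions did leak their variable. The name k is arbitrary.

```python
squares = []
for x in range(10):
    squares.append(x**2)
leaked = x                        # the for-loop variable outlives the loop

squares2 = [k**2 for k in range(10)]
try:
    k                             # the comprehension variable is not visible here
    k_leaked = True
except NameError:
    k_leaked = False

print(leaked, k_leaked)           # 9 False
```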

A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it. For example, this listcomp combines the elements of two lists if they are not equal:

>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

and it's equivalent to:

>>> combs = []
>>> for x in [1,2,3]:
...     for y in [3,1,4]:
...         if x != y:
...             combs.append((x, y))
...
>>> combs
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

Note how the order of the for and if statements is the same in both these snippets.

If the expression is a tuple (e.g. the (x, y) in the previous example), it must be parenthesized.

>>> vec = [-4, -2, 0, 2, 4]
>>> # create a new list with the values doubled
>>> [x*2 for x in vec]
[-8, -4, 0, 4, 8]
>>> # filter the list to exclude negative numbers
>>> [x for x in vec if x >= 0]
[0, 2, 4]
>>> # apply a function to all the elements
>>> [abs(x) for x in vec]
[4, 2, 0, 2, 4]
>>> # call a method on each element
>>> freshfruit = ['  banana', '  loganberry ', 'passion fruit  ']
>>> [weapon.strip() for weapon in freshfruit]
['banana', 'loganberry', 'passion fruit']
>>> # create a list of 2-tuples like (number, square)
>>> [(x, x**2) for x in range(6)]
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
>>> # the tuple must be parenthesized, otherwise an error is raised
>>> [x, x**2 for x in range(6)]
  File "<stdin>", line 1
    [x, x**2 for x in range(6)]
             ^
SyntaxError: invalid syntax
>>> # flatten a list using a listcomp with two 'for'
>>> vec = [[1,2,3], [4,5,6], [7,8,9]]
>>> [num for elem in vec for num in elem]
[1, 2, 3, 4, 5, 6, 7, 8, 9]

List comprehensions can contain complex expressions and nested functions:

>>> from math import pi
>>> [str(round(pi, i)) for i in range(1, 6)]
['3.1', '3.14', '3.142', '3.1416', '3.14159']

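5.1.4. Nested List Comprehensions

The initial expression in a list comprehension can be any arbitrary expression, including another list comprehension.

Consider the following example of a 3x4 matrix implemented as a list of 3 lists of length 4: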

>>> matrix = [
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12],
... ]

The following list comprehension will transpose rows and columns:

>>> [[row[i] for row in matrix] for i in range(4)]
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

As we saw in the previous section, the nested listcomp is evaluated in the context of the for that follows it, so this example is equivalent to:

>>> transposed = []
>>> for i in range(4):
...     transposed.append([row[i] for row in matrix])
...
>>> transposed
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

which, in turn, is the same as:

>>> transposed = []
>>> for i in range(4):
...     # the following 3 lines implement the nested listcomp
...     transposed_row = []
...     for row in matrix:
...         transposed_row.append(row[i])
...     transposed.append(transposed_row)
...
>>> transposed
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

In the real world, you should prefer built-in functions to complex flow statements. The zip() function would do a great job for this use case:

>>> list(zip(*matrix))
[(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]

See Unpacking Argument Lists for details on the asterisk in this line.
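As a quick check, the sketch below confirms that the zip() version matches the nested listcomp once each tuple is converted back to a list. It also replaces the hard-coded range(4) with the length of the first row. That adjustment assumes every row has the same length.

```python
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
# zip(*matrix) yields tuples; convert each one to get a list of lists
via_zip = [list(column) for column in zip(*matrix)]
# the nested listcomp, sized from the first row instead of a literal 4
via_listcomp = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
print(via_zip == via_listcomp)   # True
```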

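5.2. The `del` statement

There is a way to remove an item from a list given its index instead of its value: the del statement. This differs from the pop() method, which returns a value. The del statement can also be used to remove slices from a list or clear the entire list (which we did earlier by assigning an empty list to the slice). For example: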

>>> a = [-1, 1, 66.25, 333, 333, 1234.5]
>>> del a[0]
>>> a
[1, 66.25, 333, 333, 1234.5]
>>> del a[2:4]
>>> a
[1, 66.25, 1234.5]
>>> del a[:]
>>> a
[]

del can also be used to delete entire variables:

>>> del a

Referencing the name a hereafter is an error (at least until another value is assigned to it). We'll find other uses for del later.
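The sketch below shows two more details. A slice with a step can be deleted too, and after del a, a plain reference to the name raises NameError.

```python
a = [0, 1, 2, 3, 4, 5, 6]
del a[::2]                 # removes the items at indexes 0, 2, 4 and 6
remaining = list(a)
print(remaining)           # [1, 3, 5]

del a                      # unbind the name itself
try:
    a                      # referencing the name is now an error
    still_bound = True
except NameError:
    still_bound = False
print(still_bound)         # False
```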

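5.3. Tuples and Sequences

We saw that lists and strings have many common properties, such as indexing and slicing operations. They are two examples of sequence data types (see Sequence Types — list, tuple, range). Since Python is an evolving language, other sequence data types may be added. There is also another standard sequence data type: the tuple.

A tuple consists of a number of values separated by commas, for instance: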

>>> t = 12345, 54321, 'hello!'
>>> t[0]
12345
>>> t
(12345, 54321, 'hello!')
>>> # Tuples may be nested:
... u = t, (1, 2, 3, 4, 5)
>>> u
((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))
>>> # Tuples are immutable:
... t[0] = 88888
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> # but they can contain mutable objects:
... v = ([1, 2, 3], [3, 2, 1])
>>> v
([1, 2, 3], [3, 2, 1])

As you see, on output tuples are always enclosed in parentheses, so that nested tuples are interpreted correctly; they may be input with or without surrounding parentheses, although often parentheses are necessary anyway (if the tuple is part of a larger expression). It is not possible to assign to the individual items of a tuple; however, it is possible to create tuples that contain mutable objects, such as lists.

Though tuples may seem similar to lists, they are often used in different situations and for different purposes. Tuples are immutable, and usually contain a heterogeneous sequence of elements that are accessed via unpacking (see later in this section) or indexing (or even by attribute in the case of namedtuples). Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list.
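The remark about namedtuples can be illustrated with collections.namedtuple. The Point type and its field names below are hypothetical, chosen only for the example. The result is still a tuple, so indexing and unpacking keep working.

```python
from collections import namedtuple

# A hypothetical record type; the field names are for illustration only
Point = namedtuple('Point', ['x', 'y'])
p = Point(3, 4)
print(p.x, p[1])          # 3 4: attribute access and indexing both work
x, y = p                  # unpacking works like any other tuple
print(x + y)              # 7
try:
    p.x = 10              # fields cannot be reassigned: still immutable
except AttributeError:
    print('cannot assign to a field')
```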

A special problem is the construction of tuples containing 0 or 1 items: the syntax has some extra quirks to accommodate these. Empty tuples are constructed by an empty pair of parentheses; a tuple with one item is constructed by following a value with a comma (it is not sufficient to enclose a single value in parentheses). Ugly, but effective. For example:

>>> empty = ()
>>> singleton = 'hello',    # <-- note trailing comma
>>> len(empty)
0
>>> len(singleton)
1
>>> singleton
('hello',)

The statement t = 12345, 54321, 'hello!' is an example of tuple packing: the values 12345, 54321 and 'hello!' are packed together in a tuple. The reverse operation is also possible:

>>> x, y, z = t

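This is called, appropriately enough, sequence unpacking, and it works for any sequence on the right-hand side. Sequence unpacking requires that there are as many variables on the left side of the equals sign as there are elements in the sequence. Note that multiple assignment is really just a combination of tuple packing and sequence unpacking.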
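The sketch below shows two consequences of packing and unpacking. Swapping two variables needs no temporary variable, and a mismatch between the number of targets and values raises ValueError.

```python
a, b = 1, 2
a, b = b, a               # pack (b, a) into a tuple, then unpack into a, b
print(a, b)               # 2 1

try:
    x, y = (1, 2, 3)      # three values but only two targets
    mismatch_ok = True
except ValueError as exc:
    mismatch_ok = False
    print('ValueError:', exc)
```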

5.4. Sets

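Python also includes a data type for sets. A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like union, intersection, difference, and symmetric difference.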

Curly braces or the set() function can be used to create sets. Note: to create an empty set you have to use set(), not {}; the latter creates an empty dictionary, a data structure that we discuss in the next section.

Here is a brief demonstration:

>>> basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
>>> print(basket)                      # show that duplicates have been removed
{'orange', 'banana', 'pear', 'apple'}
>>> 'orange' in basket                 # fast membership testing
True
>>> 'crabgrass' in basket
False
>>> # Demonstrate set operations on unique letters from two words
...
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a                                  # unique letters in a
{'a', 'r', 'b', 'c', 'd'}
>>> a - b                              # letters in a but not in b
{'r', 'd', 'b'}
>>> a | b                              # letters in either a or b
{'a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'}
>>> a & b                              # letters in both a and b
{'a', 'c'}
>>> a ^ b                              # letters in a or b but not both
{'r', 'd', 'b', 'm', 'z', 'l'}

Similarly to list comprehensions, set comprehensions are also supported:

>>> a = {x for x in 'abracadabra' if x not in 'abc'}
>>> a
{'r', 'd'}
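Because a set is unordered, set(basket) does not keep the original order of the items. A common pattern uses a set only for fast membership tests while a list keeps the first-seen order. The sketch below shows it with unique_in_order, a hypothetical helper name.

```python
def unique_in_order(items):
    """Drop duplicates while keeping the order in which items first appear."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:      # fast membership test on the set
            seen.add(item)
            result.append(item)   # the list records the order
    return result

basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
print(unique_in_order(basket))    # ['apple', 'orange', 'pear', 'banana']
```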

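5.5. Dictionaries

Another useful data type built into Python is the dictionary (see Mapping Types — dict). Dictionaries are sometimes found in other languages as "associative memories" or "associative arrays". Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key. You can't use lists as keys, since lists can be modified in place using index assignments, slice assignments, or methods like append() and extend().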

It is best to think of a dictionary as a set of key: value pairs, with the requirement that the keys are unique (within one dictionary). Since Python 3.7, dictionaries also preserve the order in which keys were inserted. A pair of braces creates an empty dictionary: {}. Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary; this is also the way dictionaries are written on output.

The main operations on a dictionary are storing a value with some key and extracting the value given the key. It is also possible to delete a key:value pair with del. If you store using a key that is already in use, the old value associated with that key is forgotten. Extracting a value using a non-existent key raises a KeyError.

Performing list(d.keys()) on a dictionary returns a list of all the keys used in the dictionary, in insertion order (if you want it sorted, just use sorted(d.keys()) instead). [2] To check whether a single key is in the dictionary, use the in keyword.

Here is a small example using a dictionary:

>>> tel = {'jack': 4098, 'sape': 4139}
>>> tel['guido'] = 4127
>>> tel
{'jack': 4098, 'sape': 4139, 'guido': 4127}
>>> tel['jack']
4098
>>> del tel['sape']
>>> tel['irv'] = 4127
>>> tel
{'jack': 4098, 'guido': 4127, 'irv': 4127}
>>> list(tel.keys())
['jack', 'guido', 'irv']
>>> sorted(tel.keys())
['guido', 'irv', 'jack']
>>> 'guido' in tel
True
>>> 'jack' not in tel
False

The dict() constructor builds dictionaries directly from sequences of key-value pairs:

>>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
{'sape': 4139, 'guido': 4127, 'jack': 4098}

In addition, dict comprehensions can be used to create dictionaries from arbitrary key and value expressions:

>>> {x: x**2 for x in (2, 4, 6)}
{2: 4, 4: 16, 6: 36}

When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments:

>>> dict(sape=4139, guido=4127, jack=4098)
{'sape': 4139, 'guido': 4127, 'jack': 4098}
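Since reading a missing key with d[key] is an error, dict.get() is often handier: it returns None, or a supplied default, instead. The sketch below uses it to count words. The sample sentence is made up, and the printed order assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
tel = {'jack': 4098, 'sape': 4139}
print(tel.get('guido'))           # None: get() does not raise on a missing key
print(tel.get('guido', 0))        # 0: or it returns the supplied default
try:
    tel['guido']                  # plain indexing does raise
except KeyError:
    print('KeyError')

# Count word occurrences, starting each new word at a default of 0
counts = {}
for word in 'the cat and the hat'.split():
    counts[word] = counts.get(word, 0) + 1
print(counts)                     # {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
```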

5.6. Looping Techniques

When looping through dictionaries, the key and corresponding value can be retrieved at the same time using the items() method.

>>> knights = {'gallahad': 'the pure', 'robin': 'the brave'}
>>> for k, v in knights.items():
...     print(k, v)
...
gallahad the pure
robin the brave

When looping through a sequence, the position index and corresponding value can be retrieved at the same time using the enumerate() function.

>>> for i, v in enumerate(['tic', 'tac', 'toe']):
...     print(i, v)
...
0 tic
1 tac
2 toe

To loop over two or more sequences at the same time, the entries can be paired with the zip() function.

>>> questions = ['name', 'quest', 'favorite color']
>>> answers = ['lancelot', 'the holy grail', 'blue']
>>> for q, a in zip(questions, answers):
...     print('What is your {0}?  It is {1}.'.format(q, a))
...
What is your name?  It is lancelot.
What is your quest?  It is the holy grail.
What is your favorite color?  It is blue.
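Note that zip() stops as soon as the shortest input is exhausted, silently dropping unmatched items. The sketch below removes one answer to show this. It then uses itertools.zip_longest() to pad the shorter input instead; the fill value '?' is an arbitrary choice.

```python
from itertools import zip_longest

questions = ['name', 'quest', 'favorite color']
answers = ['lancelot', 'the holy grail']       # one answer is missing

pairs = list(zip(questions, answers))
print(pairs)        # [('name', 'lancelot'), ('quest', 'the holy grail')]

padded = list(zip_longest(questions, answers, fillvalue='?'))
print(padded[-1])   # ('favorite color', '?')
```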

To loop over a sequence in reverse, first specify the sequence in a forward direction and then call the reversed() function.

>>> for i in reversed(range(1, 10, 2)):
...     print(i)
...
9
7
5
3
1

To loop over a sequence in sorted order, use the sorted() function which returns a new sorted list while leaving the source unaltered.

>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> for f in sorted(set(basket)):
...     print(f)
...
apple
banana
orange
pear

If you want to change a list while you are looping over it, it is often simpler and safer to create a new list instead.

>>> import math
>>> raw_data = [56.2, float('NaN'), 51.7, 55.3, 52.5, float('NaN'), 47.8]
>>> filtered_data = []
>>> for value in raw_data:
...     if not math.isnan(value):
...         filtered_data.append(value)
...
>>> filtered_data
[56.2, 51.7, 55.3, 52.5, 47.8]
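The sketch below shows why changing a list while iterating over it is risky. Removing an item shifts the later ones left, so the loop skips the element that moved into the current position. Building a new list avoids the problem.

```python
buggy = [1, 2, 2, 3]
for n in buggy:
    if n == 2:
        buggy.remove(n)       # shrinking the list shifts later items left...
print(buggy)                  # [1, 2, 3]: ...so the second 2 was skipped

source = [1, 2, 2, 3]
fixed = [n for n in source if n != 2]   # build a new list instead
print(fixed)                  # [1, 3]
```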

5.7. More on Conditions

The conditions used in while and if statements can contain any operators, not just comparisons.

The comparison operators in and not in check whether a value occurs (does not occur) in a sequence. The operators is and is not compare whether two objects are really the same object; this only matters for mutable objects like lists. All comparison operators have the same priority, which is lower than that of all numerical operators.

Comparisons can be chained. For example, a < b == c tests whether a is less than b and moreover b equals c.

Comparisons may be combined using the Boolean operators and and or, and the outcome of a comparison (or of any other Boolean expression) may be negated with not. These have lower priorities than comparison operators; between them, not has the highest priority and or the lowest, so that A and not B or C is equivalent to (A and (not B)) or C. As always, parentheses can be used to express the desired composition.

The Boolean operators and and or are so-called short-circuit operators: their arguments are evaluated from left to right, and evaluation stops as soon as the outcome is determined. For example, if A and C are true but B is false, A and B and C does not evaluate the expression C. When used as a general value and not as a Boolean, the return value of a short-circuit operator is the last evaluated argument.
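The following sketch demonstrates short-circuit evaluation. The check() helper is hypothetical: it only records that it was called and then returns its argument, which makes it visible which operands were evaluated.

```python
calls = []

def check(name, value):
    """Hypothetical helper: record that this operand ran, then return value."""
    calls.append(name)
    return value

result = check('A', True) and check('B', False) and check('C', True)
print(result, calls)      # False ['A', 'B']: C was never evaluated

# Used as a general value, the last evaluated argument is returned
print('' or 'default')    # default
print(0 and 1 / 0)        # 0: the division is never evaluated
```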

It is possible to assign the result of a comparison or other Boolean expression to a variable. For example,

>>> string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'
>>> non_null = string1 or string2 or string3
>>> non_null
'Trondheim'

Note that in Python, unlike C, a plain = cannot be used to assign inside an expression; since Python 3.8 this has to be done explicitly with the walrus operator :=. This avoids a common class of problems encountered in C programs: typing = in an expression when == was intended.

5.8. Comparing Sequences and Other Types

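Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted. If two items to be compared are themselves sequences of the same type, the lexicographical comparison is carried out recursively. If all items of two sequences compare equal, the sequences are considered equal. If one sequence is an initial sub-sequence of the other, the shorter sequence is the smaller one. Lexicographical ordering for strings uses the Unicode code point number to order individual characters. Some examples of comparisons between sequences of the same type: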

(1, 2, 3)              < (1, 2, 4)
[1, 2, 3]              < [1, 2, 4]
'ABC' < 'C' < 'Pascal' < 'Python'
(1, 2, 3, 4)           < (1, 2, 4)
(1, 2)                 < (1, 2, -1)
(1, 2, 3)             == (1.0, 2.0, 3.0)
(1, 2, ('aa', 'ab'))   < (1, 2, ('abc', 'a'), 4)

Note that comparing objects of different types with < or > is legal provided that the objects have appropriate comparison methods. For example, mixed numeric types are compared according to their numeric value, so 0 equals 0.0, and so on. Otherwise, rather than providing an arbitrary ordering, the interpreter will raise a TypeError exception.
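The sketch below illustrates the rules in this paragraph. Mixed numeric types compare by value. Ordering an int against a str, by contrast, has no defined meaning and raises TypeError.

```python
print(0 == 0.0, 1 < 1.5)               # True True: numbers compare by value
print((1, 2, 3) == (1.0, 2.0, 3.0))    # True: item by item, by value
try:
    1 < 'a'                            # int and str have no common ordering
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print('TypeError:', exc)
```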

Footnotes

[1] Other languages may return the mutated object, which allows method chaining, such as d->insert("a")->remove("b")->sort();.

[2] Calling d.keys() will return a dictionary view object. It supports operations like membership test and iteration, but its contents are not independent of the original dictionary; it is only a view.

0 0

5. 数据结构

5. 数据结构

5.1. 列表的更多特性

5.1.1. 列表作为栈使用

5.1.2. 列表作为队列使用

5.1.3. 列表的解析生成式

5.1.4. 嵌套的列表的解析生成式

5.2. del 语句

5.3. 元组和序列

5.4. Sets

5.5. 字典

5.6. Looping Techniques

5.7. More on Conditions

5.8. Comparing Sequences and Other Types

5.2. `del` 语句