Chapter 3. Collection Data Types

Source: Internet | Published by: 京东大数据平台架构 (JD Big Data Platform Architecture) | Editor: 程序博客网 | Date: 2024/05/16 10:52

I. Sequence Types

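Python provides five built-in sequence types: bytearray, bytes, list, str, and tuple. The first two are used in Chapter 7 (file handling). Other sequence types are provided by the standard library, for example collections.namedtuple. This section focuses on tuples, named tuples, and lists.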
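As a quick sketch (the sample values are arbitrary), all five built-in sequence types share the same basic sequence behavior: len(), indexing, slicing, and the in operator. Note that indexing bytes or bytearray yields an int, not a one-character string.

```python
# One sample value of each built-in sequence type (values chosen for illustration)
samples = [
    bytearray(b"abc"),
    b"abc",
    ["a", "b", "c"],
    "abc",
    ("a", "b", "c"),
]

for seq in samples:
    # Every sequence supports len(), indexing, slicing, and membership tests
    print(type(seq).__name__, len(seq), seq[0], seq[1:], seq[-1] in seq)
```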

1. Tuples

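Like strings, tuples are immutable. To change a tuple's contents, convert it to a list with list(), modify the list, and convert it back with tuple() if needed. Called with no arguments, tuple() returns an empty tuple.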

(See "Copying Collections" in Section IV for shallow and deep copying.)
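A minimal sketch of the round trip described above: item assignment on a tuple fails, so the tuple is converted to a list, modified, and converted back. The tree names are illustrative.

```python
t = ("Cedar", "Yew", "Fir")

# Tuples are immutable: item assignment raises TypeError
try:
    t[0] = "Oak"
except TypeError as err:
    print("cannot modify a tuple:", err)

# To "modify" a tuple, convert it to a list, change the list, and convert back
items = list(t)
items[0] = "Oak"
t = tuple(items)
print(t)          # ('Oak', 'Yew', 'Fir')

# tuple() with no arguments returns an empty tuple
print(tuple())    # ()
```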

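t.count(x) returns the number of times object x occurs in tuple t.

t.index(x) returns the index position of the leftmost occurrence of x in tuple t, or raises a ValueError exception if x is not in t.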
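A short sketch of both tuple methods on an illustrative tuple, including the ValueError raised when the item is absent:

```python
t = ("a", "b", "a", "c", "a")

print(t.count("a"))   # 3
print(t.index("a"))   # 0 (leftmost occurrence)
print(t.index("c"))   # 3

# index() raises ValueError when the item is absent
try:
    t.index("z")
except ValueError:
    print("'z' is not in the tuple")
```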

Examples:

>>> hair = "black","brown", "blonde", "red"

>>> hair[:2], "gray",hair[2:]

(('black', 'brown'), 'gray', ('blonde','red'))

>>> hair[:2] + ("gray",) + hair[2:]  # concatenate tuples into a single tuple containing all the items

('black', 'brown', 'gray', 'blonde', 'red')

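Examples of the book's coding style (parentheses around a tuple are omitted when it appears on the left of a binary operator or on the right of a unary statement):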

a, b = (1, 2)            # left of binary operator
del a, b                 # right of unary statement

def f(x):
    return x, x ** 2     # right of unary statement

for x, y in ((1, 1), (2, 4), (3, 9)):   # left of binary operator
    print(x, y)

Example (nested tuples):

>>> things = (1, -7.5,("pea", (5, "Xyz"), "queue"))

>>> things[2][1][1][2]

'z'

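Items in a nested tuple can be of any type. Deep nesting quickly becomes confusing; one way to keep it readable is to give the index positions names: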

>>> MANUFACTURER, MODEL, SEATING =(0, 1, 2)

>>> MINIMUM, MAXIMUM = (0, 1)

>>> aircraft =("Airbus", "A320-200", (100, 220))

>>> aircraft[SEATING][MAXIMUM]

220

2. Named Tuples

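A named tuple behaves just like a plain tuple, but its items can also be referred to by name, much like the attributes of a Python object.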
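A sketch using collections.namedtuple to rework the aircraft example above, so that named fields replace the hand-made index constants. The type names Aircraft and Seating and their field names are illustrative choices, not from the text.

```python
import collections

# Create named tuple types; the type and field names are illustrative
Aircraft = collections.namedtuple("Aircraft", "manufacturer model seating")
Seating = collections.namedtuple("Seating", "minimum maximum")

aircraft = Aircraft("Airbus", "A320-200", Seating(100, 220))

print(aircraft.seating.maximum)   # 220, accessed by name
print(aircraft[2][1])             # 220, accessed by index like a plain tuple
```

Because a named tuple is still a tuple, it stays immutable and works anywhere a plain tuple is expected.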

3. Lists

Unlike strings and tuples, lists are mutable: we can insert, replace, and delete items. Like tuples, lists can also be nested, iterated over, and sliced.

Table 3.1. List Methods

L.append(x): Appends item x to the end of list L

L.count(x): Returns the number of times item x occurs in list L

L.extend(m) or L += m: Appends all of iterable m's items to the end of list L; the operator += does the same thing

L.index(x, start, end): Returns the index position of the leftmost occurrence of item x in list L (or in the start:end slice of L); otherwise, raises a ValueError exception

L.insert(i, x): Inserts item x into list L at index position int i

L.pop(): Returns and removes the rightmost item of list L

L.pop(i): Returns and removes the item at index position int i in L

L.remove(x): Removes the leftmost occurrence of item x from list L, or raises a ValueError exception if x is not found

L.reverse(): Reverses list L in-place

L.sort(...): Sorts list L in-place; this method accepts the same key and reverse optional arguments as the built-in sorted()
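A sketch that walks one illustrative list through the methods in Table 3.1; each comment shows the list's state after that line.

```python
L = [5, 3, 8]

L.append(3)          # [5, 3, 8, 3]
L.insert(1, 10)      # [5, 10, 3, 8, 3]
L.extend([1, 2])     # [5, 10, 3, 8, 3, 1, 2]
print(L.count(3))    # 2
print(L.index(3))    # 2 (leftmost occurrence)
L.remove(3)          # removes the leftmost 3: [5, 10, 8, 3, 1, 2]
last = L.pop()       # last == 2, L == [5, 10, 8, 3, 1]
first = L.pop(0)     # first == 5, L == [10, 8, 3, 1]
L.reverse()          # [1, 3, 8, 10]
L.sort(reverse=True) # [10, 8, 3, 1]
print(L)
```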

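——Unpacking operator (*)

In an assignment, a starred target collects all the items not assigned to the other targets into a list: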

>>> first, *rest = [9, 2, -4, 8,7]

>>> first, rest

(9, [2, -4, 8, 7])

>>> first, *mid, last ="Charles Philip Arthur George Windsor".split()

>>> first, mid, last

('Charles', ['Philip', 'Arthur', 'George'],'Windsor')

>>> *directories, executable = "/usr/local/bin/gvim".split("/")

>>> directories, executable

(['', 'usr', 'local', 'bin'], 'gvim')

——Adding items

Given woods = ["Cedar", "Yew", "Fir"], either of the following two operations gives the same result:

woods += ["Kauri", "Larch"]

woods.extend(["Kauri", "Larch"])

Either way, woods becomes ['Cedar', 'Yew', 'Fir', 'Kauri', 'Larch'].

——Modifying items
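A minimal sketch of modifying list items, by index and by slice assignment; the list contents are illustrative. A slice can be replaced by a sequence of a different length.

```python
woods = ["Cedar", "Yew", "Fir", "Kauri", "Larch"]

woods[1] = "Oak"                          # replace one item by index
woods[2:4] = ["Pine", "Spruce", "Ash"]    # replace a slice; lengths may differ
print(woods)   # ['Cedar', 'Oak', 'Pine', 'Spruce', 'Ash', 'Larch']
```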

——Deleting items
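A minimal sketch of the usual ways to delete list items: del by index or slice, remove() by value, and pop() to delete and return an item. The list contents are illustrative.

```python
woods = ["Cedar", "Yew", "Fir", "Kauri", "Larch"]

del woods[0]           # delete by index: ['Yew', 'Fir', 'Kauri', 'Larch']
woods.remove("Kauri")  # delete by value: ['Yew', 'Fir', 'Larch']
last = woods.pop()     # delete and return the last item: 'Larch', leaving ['Yew', 'Fir']
del woods[:1]          # delete a slice: ['Fir']
print(woods, last)
```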

4. List Comprehensions
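List comprehensions take the same form as the set comprehensions described in Section II, but with square brackets: [expression for item in iterable] or [expression for item in iterable if condition]. A small sketch, with the equivalent explicit loop for comparison:

```python
# Squares of 0..9
squares = [x * x for x in range(10)]

# Only the squares of even numbers, using the optional if condition
even_squares = [x * x for x in range(10) if x % 2 == 0]

# Equivalent explicit loop for even_squares
loop_result = []
for x in range(10):
    if x % 2 == 0:
        loop_result.append(x * x)

print(even_squares)   # [0, 4, 16, 36, 64]
```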

***

II. Set Types

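Sets support the membership operator in, the len() function, the set.isdisjoint() method, comparison operators, and the bitwise operators (used to compute unions, intersections, and so on). Python provides two built-in set types: the mutable set and the immutable frozenset.

Only hashable objects can be added to a set. A hashable object has a __hash__() special method whose return value never changes during the object's lifetime, and can be compared for equality using the __eq__() special method.

The built-in immutable data types float, frozenset, int, str, and tuple are hashable and can be added to sets (a tuple only if all of its items are hashable), whereas the built-in mutable data types dict, list, and set are not hashable and cannot be added to sets.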
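A short sketch of the hashability rule: immutable built-ins go into a set without complaint, while mutable ones (and tuples that contain them) raise TypeError.

```python
# Immutable built-ins are hashable, so they can go into a set
ok = {1, 2.5, "sun", ("x", 11), frozenset({8, 4, 7})}

# Mutable built-ins are not hashable: adding one raises TypeError
for unhashable in ([1, 2], {"a": 1}, {1, 2}):
    try:
        ok.add(unhashable)
    except TypeError as err:
        print(type(unhashable).__name__, "->", err)

# A tuple is hashable only if all of its items are hashable
try:
    hash(("x", [1, 2]))
except TypeError:
    print("a tuple containing a list is not hashable")
```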

——Sets

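A set is mutable, so items can be added and removed, but it is unordered, so it has no notion of index position: items cannot be accessed by index or sliced.

S = {7, "veil", 0, -29, ("x", 11), "sun", frozenset({8, 4, 7}), 913}  (note the curly braces)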
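A sketch built on the set literal above. One pitfall worth showing: empty curly braces create a dictionary, so an empty set has to be written as set().

```python
S = {7, "veil", 0, -29, ("x", 11), "sun", frozenset({8, 4, 7}), 913}

print(len(S), "sun" in S)    # 8 True

empty_dict = {}      # empty braces create a dict, not a set
empty_set = set()    # use set() for an empty set
print(type(empty_dict).__name__, type(empty_set).__name__)   # dict set

# Duplicates are discarded automatically
print({1, 2, 2, 3} == {1, 2, 3})   # True
```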

Table 3.2. Set Methods and Operators

s.add(x): Adds item x to set s if it is not already in s

s.clear(): Removes all the items from set s

s.copy(): Returns a shallow copy of set s

s.difference(t) or s - t: Returns a new set that has every item that is in set s that is not in set t

s.difference_update(t) or s -= t: Removes every item that is in set t from set s

s.discard(x): Removes item x from set s if it is in s; see also set.remove()

s.intersection(t) or s & t: Returns a new set that has each item that is in both set s and set t

s.intersection_update(t) or s &= t: Makes set s contain the intersection of itself and set t

s.isdisjoint(t): Returns True if sets s and t have no items in common

s.issubset(t) or s <= t: Returns True if set s is equal to or a subset of set t; use s < t to test whether s is a proper subset of t

s.issuperset(t) or s >= t: Returns True if set s is equal to or a superset of set t; use s > t to test whether s is a proper superset of t

s.pop(): Returns and removes an arbitrary item from set s, or raises a KeyError exception if s is empty

s.remove(x): Removes item x from set s, or raises a KeyError exception if x is not in s; see also set.discard()

s.symmetric_difference(t) or s ^ t: Returns a new set that has every item that is in set s and every item that is in set t, but excluding items that are in both sets

s.symmetric_difference_update(t) or s ^= t: Makes set s contain the symmetric difference of itself and set t

s.union(t) or s | t: Returns a new set that has all the items in set s and all the items in set t that are not in set s

s.update(t) or s |= t: Adds every item in set t that is not in set s, to set s

The methods that do not modify the set (copy(), difference(), intersection(), isdisjoint(), issubset(), issuperset(), symmetric_difference(), and union()), together with their operators, can also be used with frozensets.
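A sketch exercising the main set operators from Table 3.2 on two small illustrative sets, including a non-mutating operator applied to a frozenset:

```python
s = {1, 2, 3, 4}
t = {3, 4, 5}

print(s | t)    # union: {1, 2, 3, 4, 5}
print(s & t)    # intersection: {3, 4}
print(s - t)    # difference: {1, 2}
print(s ^ t)    # symmetric difference: {1, 2, 5}
print({3, 4} <= s, {3, 4} < s, s.isdisjoint({9}))   # True True True

f = frozenset(s)
print(f & t)    # non-mutating operators also work on frozensets

s |= t          # in-place union (same as s.update(t))
s.discard(99)   # no error even though 99 is absent
```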

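One common use of sets is fast membership testing:

import sys

if len(sys.argv) == 1 or sys.argv[1] in {"-h", "--help"}:
    pass  # placeholder: print usage information here

Another common use is to make sure duplicate data is processed only once:

for ip in set(ips):
    process_ip(ip)

Yet another common use is to get rid of unwanted items:

filenames = set(filenames)
for makefile in {"MAKEFILE", "Makefile", "makefile"}:
    filenames.discard(makefile)

An equivalent single statement: filenames = set(filenames) - {"MAKEFILE", "Makefile", "makefile"}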
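A quick check, on a made-up list of file names, that the discard() loop and the single set difference give the same result:

```python
# Sample file names (illustrative)
names = ["main.py", "Makefile", "README", "makefile"]

# Approach 1: discard unwanted names one by one
filenames = set(names)
for makefile in {"MAKEFILE", "Makefile", "makefile"}:
    filenames.discard(makefile)   # discard() does not raise if the name is absent

# Approach 2: a single set difference
filenames2 = set(names) - {"MAKEFILE", "Makefile", "makefile"}

print(filenames == filenames2)   # True
```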

——Set Comprehensions

{expression for item in iterable}

{expression for item in iterable if condition}
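A small sketch of both set comprehension forms on an illustrative list of file names. Because the result is a set, the duplicate name appears only once.

```python
files = ["index.html", "notes.txt", "about.HTM", "style.css", "index.html"]

# With a condition: keep only HTML files (case-insensitive extension check)
html = {x for x in files if x.lower().endswith((".htm", ".html"))}

# Without a condition: the set of distinct name lengths
lengths = {len(x) for x in files}
```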

III. Mapping Types

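The two mapping types covered here are the built-in dict type and the standard library's collections.defaultdict. Only hashable objects can be used as dictionary keys, so immutable data types such as float, frozenset, int, str, and tuple can be keys, but mutable types such as dict, list, and set cannot.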

Dictionaries

Examples of the syntax for creating dictionaries (all five produce the same dictionary):

d1 = dict({"id": 1948, "name": "Washer", "size": 3})
d2 = dict(id=1948, name="Washer", size=3)
d3 = dict([("id", 1948), ("name", "Washer"), ("size", 3)])
d4 = dict(zip(("id", "name", "size"), (1948, "Washer", 3)))
d5 = {"id": 1948, "name": "Washer", "size": 3}

Table 3.3. Dictionary Methods

d.clear(): Removes all items from dict d

d.copy(): Returns a shallow copy of dict d

d.fromkeys(s, v): Returns a dict whose keys are the items in iterable s and whose values are None, or v if v is given

d.get(k): Returns key k's associated value, or None if k isn't in dict d

d.get(k, v): Returns key k's associated value, or v if k isn't in dict d

d.items(): Returns a view of all the (key, value) pairs in dict d

d.keys(): Returns a view of all the keys in dict d

d.pop(k): Returns key k's associated value and removes the item whose key is k, or raises a KeyError exception if k isn't in d

d.pop(k, v): Returns key k's associated value and removes the item whose key is k, or returns v if k isn't in dict d

d.popitem(): Returns and removes a (key, value) pair from dict d (an arbitrary one in older Python versions, the most recently inserted one since Python 3.7), or raises a KeyError exception if d is empty

d.setdefault(k, v): The same as the dict.get() method, except that if the key is not in dict d, a new item is inserted with the key k and with a value of None, or of v if v is given

d.update(a): Adds every (key, value) pair from a that isn't in dict d to d, and for every key that is in both d and a, replaces the corresponding value in d with the one in a; a can be a dictionary, an iterable of (key, value) pairs, or keyword arguments

d.values(): Returns a view of all the values in dict d
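A sketch of the most commonly used methods from Table 3.3 on an illustrative dictionary. The final printed order relies on dictionaries preserving insertion order (guaranteed since Python 3.7).

```python
d = {"id": 1948, "name": "Washer"}

print(d.get("size"))        # None: missing key, no default
print(d.get("size", 0))     # 0: missing key, default supplied
d.setdefault("size", 3)     # inserts "size": 3 because the key is missing
d.setdefault("size", 99)    # the key exists now, so nothing changes
d.update(name="Dryer", color="white")   # replace one value, add one pair
removed = d.pop("id")       # 1948; "id" is removed
missing = d.pop("id", None) # None instead of a KeyError
keys = dict.fromkeys(["a", "b"], 0)   # {'a': 0, 'b': 0}

print(d)   # {'name': 'Dryer', 'size': 3, 'color': 'white'}
```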

Iterating over a dictionary:

for item in d.items():
    print(item[0], item[1])

for key, value in d.items():
    print(key, value)

Dictionary Comprehensions
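Dictionary comprehensions follow the same pattern as set comprehensions, but produce key: value pairs: {key_expression: value_expression for item in iterable}, optionally followed by if condition. A short sketch with made-up data:

```python
words = ["apple", "fig", "banana"]

# Map each word to its length
lengths = {w: len(w) for w in words}

# With a condition: keep only words longer than three letters
long_words = {w: n for w, n in lengths.items() if n > 3}

# Invert a dictionary (assumes the values are unique)
inverted = {n: w for w, n in lengths.items()}
```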

Default Dictionaries

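Default dictionaries support the same operators and methods as plain dictionaries; the only difference is in how they handle missing keys. Compare these two code snippets:

With words as a plain dictionary:

words[word] = words.get(word, 0) + 1

With words as a default dictionary:

words = collections.defaultdict(int)
words[word] += 1

With the default dictionary, accessing a missing key inserts that key with the value returned by int(), which is 0, so no get() call is needed.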
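A runnable word-count sketch that puts both snippets side by side on an illustrative sentence and checks that they produce the same counts:

```python
import collections

text = "the cat and the hat and the bat"

# Plain dictionary: get() supplies 0 for words not seen yet
plain = {}
for word in text.split():
    plain[word] = plain.get(word, 0) + 1

# Default dictionary: a missing key is created with int(), i.e. 0
words = collections.defaultdict(int)
for word in text.split():
    words[word] += 1

print(plain == words)   # True
print(words["the"])     # 3
```

One side effect to keep in mind: merely reading words["dog"] from the default dictionary would insert "dog" with a count of 0.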

IV. Iterating and Copying Collections

——Iterators and Iterable Operations and Functions

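An iterable data type is one that has an __iter__() method, which returns an iterator.

An iterator is an object that provides a __next__() method, which returns each successive item in turn and raises a StopIteration exception when there are no more items.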
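A minimal sketch of the iterator protocol with two hypothetical classes: Countdown is the iterable (it has __iter__()), and CountdownIterator is the iterator (it has __next__() and raises StopIteration at the end).

```python
class Countdown:
    """Iterable that yields start, start - 1, ..., 1 (illustrative class)."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Return a fresh iterator each time, so the iterable can be reused
        return CountdownIterator(self.start)


class CountdownIterator:
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Iterators are themselves iterable: they return themselves
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration   # signals the end of the iteration
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))   # [3, 2, 1]
```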

Table 3.4. Common Iterable Operators and Functions

s + t: Returns a sequence that is the concatenation of sequences s and t

s * n: Returns a sequence that is int n concatenations of sequence s

x in i: Returns True if item x is in iterable i; use not in to reverse the test

all(i): Returns True if every item in iterable i evaluates to True

any(i): Returns True if any item in iterable i evaluates to True

enumerate(i, start): Normally used in for ... in loops to provide a sequence of (index, item) tuples with indexes starting at 0 or start; see text

len(x): Returns the "length" of x. If x is a collection it is the number of items; if x is a string it is the number of characters.

max(i, key): Returns the biggest item in iterable i, or the item with the biggest key(item) value if a key function is given

min(i, key): Returns the smallest item in iterable i, or the item with the smallest key(item) value if a key function is given

range(start, stop, step): Returns a range object, a lazy iterable of integers. With one argument (stop), it goes from 0 to stop - 1; with two arguments (start, stop), it goes from start to stop - 1; with three arguments, it goes from start toward stop (excluding stop) in steps of step.

reversed(i): Returns an iterator that returns the items from sequence i in reverse order

sorted(i, key, reverse): Returns a list of the items from iterable i in sorted order; key is used to provide DSU (Decorate, Sort, Undecorate) sorting. If reverse is True, the sorting is done in reverse order.

sum(i, start): Returns the sum of the items in iterable i plus start (which defaults to 0); i may not contain strings

zip(i1, ..., iN): Returns an iterator of tuples using the iterables i1 to iN, stopping when the shortest one is exhausted; see text
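A quick sketch exercising several entries of Table 3.4 on an arbitrary list of numbers:

```python
nums = [3, -7, 0, 12]

print(all(nums), any(nums))            # False True (0 is falsy)
print(max(nums), min(nums, key=abs))   # 12 0
print(sum(nums), sum(nums, 100))       # 8 108
print(list(enumerate("ab", start=1)))  # [(1, 'a'), (2, 'b')]
print(list(zip("abc", [1, 2])))        # [('a', 1), ('b', 2)]; stops at the shortest
print(list(range(10, 0, -3)))          # [10, 7, 4, 1]
print(sorted(nums, reverse=True))      # [12, 3, 0, -7]
print([1, 2] * 2 + [9])                # [1, 2, 1, 2, 9]
```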

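When we write a for item in iterable loop, Python in effect calls iter(iterable) to get an iterator, then calls next() on it repeatedly until StopIteration is raised. The following two snippets are equivalent:

product = 1
for i in [1, 2, 4, 8]:
    product *= i
print(product)  # prints: 64

product = 1
i = iter([1, 2, 4, 8])
while True:
    try:
        product *= next(i)
    except StopIteration:
        break
print(product)  # prints: 64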

——Using enumerate()

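enumerate() takes an iterable and returns an enumerate object, which is itself an iterator. Each iteration yields a 2-tuple whose first item is the iteration number (counting from 0 by default, or from start if given) and whose second item is the next item from the iterable that enumerate() was called on. The grepword.py program uses it to report line numbers:

import sys

if len(sys.argv) < 3:
    print("usage: grepword.py word infile1 [infile2 [... infileN]]")
    sys.exit()

word = sys.argv[1]
for filename in sys.argv[2:]:
    for lino, line in enumerate(open(filename), start=1):
        if word in line:
            # print the file name, line number, and at most 40 characters of the line
            print("{0}:{1}:{2:.40}".format(filename, lino,
                                           line.rstrip()))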
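A small standalone sketch of enumerate() on an in-memory list (instead of a file), showing the default start of 0, start=1, and the fact that the enumerate object is an iterator:

```python
lines = ["alpha", "beta", "gamma"]

# enumerate() yields (count, item) tuples; start defaults to 0
print(list(enumerate(lines)))   # [(0, 'alpha'), (1, 'beta'), (2, 'gamma')]

# The enumerate object is an iterator, so next() works on it directly
e = enumerate(lines)
print(next(e))                  # (0, 'alpha')

# start=1 gives human-friendly line numbers, as in grepword.py
for lino, line in enumerate(lines, start=1):
    print("{0}:{1}".format(lino, line))
```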

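An iterable can be unpacked into separate positional arguments with the * operator; this works with any iterable, including a range. In the following example, calculate() is a function that takes four arguments, and all three calls pass the same arguments: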

calculate(1, 2, 3, 4)

t = (1, 2, 3, 4)

calculate(*t)

calculate(*range(1, 5))
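To make the three calls runnable, here is a sketch with a hypothetical calculate(); its body is invented purely for illustration, and any four-argument function would do.

```python
def calculate(a, b, c, d):
    """Hypothetical four-argument function used only for this example."""
    return a + b * c - d

t = (1, 2, 3, 4)

# All three calls pass the same four arguments
r1 = calculate(1, 2, 3, 4)
r2 = calculate(*t)               # unpack a tuple
r3 = calculate(*range(1, 5))     # unpack a range: 1, 2, 3, 4
print(r1, r2, r3)                # 3 3 3
```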

——sorted() and reversed()

Two other iteration-related functions: sorted() returns a new sorted list (a copy), and reversed() returns a reverse iterator:

>>> list(range(6))

[0, 1, 2, 3, 4, 5]

>>> list(reversed(range(6)))

[5, 4, 3, 2, 1, 0]

sorted() is a little more involved. Some examples:

>>> x = []

>>> for t in zip(range(-10, 0, 1), range(0, 10, 2), range(1, 10, 2)):
...     x += t
...

>>> x

[-10, 0, 1, -9, 2, 3, -8, 4, 5, -7, 6, 7, -6, 8, 9]

>>> sorted(x)

[-10, -9, -8, -7, -6, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> sorted(x, reverse=True)

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -6, -7, -8, -9, -10]

>>> sorted(x, key=abs)

[0, 1, 2, 3, 4, 5, 6, -6, -7, 7, -8, 8, -9, 9, -10]

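The following two pieces of code are essentially equivalent:

x = sorted(x, key=str.lower)

temp = []
for item in x:
    temp.append((item.lower(), item))   # decorate
x = []
for key, value in sorted(temp):         # sort
    x.append(value)                     # undecorate

The only difference is how ties are broken: with key=str.lower, strings whose lowercase forms are equal keep their original relative order, whereas the decorated version breaks such ties by comparing the original strings.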

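Python's sort algorithm is an adaptive, stable mergesort (Timsort, in CPython). It compares items using the < operator. When a collection contains other collections (for example, a list of tuples), sorting still works: the nested collections are themselves compared with <, item by item.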
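A sketch of the two properties just described, using illustrative data: tuples are compared item by item, and equal keys keep their original order (stability). It also shows that mixing types that cannot be compared with < makes sorting fail.

```python
# Tuples are compared item by item with <: first items, then second items on ties
pairs = [(2, "b"), (1, "z"), (2, "a"), (1, "y")]
print(sorted(pairs))   # [(1, 'y'), (1, 'z'), (2, 'a'), (2, 'b')]

# Stability: items with equal keys keep their original relative order
by_number = sorted(pairs, key=lambda p: p[0])
print(by_number)       # [(1, 'z'), (1, 'y'), (2, 'b'), (2, 'a')]

# Items that cannot be compared with < make sorting fail
try:
    sorted([1, "a"])
except TypeError:
    print("int and str cannot be compared with <")
```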

——Copying Collections

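There are two kinds of copying: shallow copying and deep copying.

Shallow copying. First, note that plain assignment copies nothing: it binds a second name to the same list, so a change made through one name is visible through the other: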

>>> songs = ["Because", "Boys", "Carol"]

>>> beatles = songs

>>> beatles, songs

(['Because', 'Boys', 'Carol'], ['Because', 'Boys', 'Carol'])

>>> beatles[2] = "Cayenne"

>>> beatles, songs

(['Because', 'Boys', 'Cayenne'], ['Because', 'Boys', 'Cayenne'])

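A slice such as x[:] makes a shallow copy: the outer list is new, but its items are references to the same objects, so nested objects such as the inner list are still shared between x and y:

>>> x = [53, 68, ["A", "B", "C"]]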

>>> y = x[:] # shallow copy

>>> x, y

([53, 68, ['A', 'B', 'C']], [53, 68, ['A', 'B', 'C']])

>>> y[1] = 40

>>> x[2][0] = 'Q'

>>> x, y

([53, 68, ['Q', 'B', 'C']], [53, 40, ['Q', 'B', 'C']])

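Compare this with a deep copy made by copy.deepcopy(), which also copies the nested objects, so changing x[2] no longer affects y: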

>>> import copy

>>> x = [53, 68, ["A", "B", "C"]]

>>> y = copy.deepcopy(x)

>>> y[1] = 40

>>> x[2][0] = 'Q'

>>> x, y

([53, 68, ['Q', 'B', 'C']], [53, 40, ['A', 'B', 'C']])

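More on shallow copies:

For dictionaries and sets, use dict.copy() and set.copy().

The copy module's copy() function also returns a shallow copy of an object.

Another way to copy a built-in collection is to pass it as the argument to the function with the same name as its type, for example: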

copy_of_dict_d = dict(d)

copy_of_list_L = list(L)

copy_of_set_s = set(s)
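A sketch confirming that all of these techniques produce shallow copies: the outer container is new, but nested objects are shared. The sample d, L, and s are illustrative.

```python
import copy

d = {"a": [1, 2]}
L = [[1, 2], 3]
s = {1, 2, 3}

for c in (d.copy(), dict(d), copy.copy(d)):
    # Each copy is a new dict object...
    assert c is not d
    # ...but the nested list is shared (shallow copy)
    assert c["a"] is d["a"]

copy_of_list_L = list(L)
copy_of_list_L[0].append(99)   # mutates the inner list shared with L
print(L)   # [[1, 2, 99], 3]

copy_of_set_s = set(s)
copy_of_set_s.add(4)           # the outer set is independent
print(s)   # {1, 2, 3}
```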

0 0