Python Study Notes 11

Source: Internet  Editor: 程序博客网  Time: 2024/05/22 00:48

chapter_11

Iterators and Comprehensions, Part 1

Common iteration tools

for loops
List comprehensions
in membership tests
The map built-in function

File iterators

>>> f = open('script1.py')
>>> f.readline()
'import sys\n'
>>> f.readline()
'print(sys.path)\n'
>>> f.readline()
'x = 2\n'
>>> f.readline()
'print(2 ** 33)\n'
>>> f.readline()
''

File objects also have a method named __next__ with almost the same effect: each call returns the next line of the file. The one notable difference is that at end of file, __next__ raises the built-in StopIteration exception instead of returning an empty string.

>>> f = open('script1.py')
>>> f.__next__()
'import sys\n'
>>> f.__next__()
'print(sys.path)\n'
>>> f.__next__()
'x = 2\n'
>>> f.__next__()
'print(2 ** 33)\n'
>>> f.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

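Iteration protocol: an object with a __next__ method advances to the next result and raises StopIteration at the end of the series of results. Any object that fits this description is an iterator, and any object from which the iter built-in function can obtain one is considered iterable.

Any iterable object can be traversed with a for loop or another iteration tool, because all iteration tools work internally by calling __next__ on each iteration and catching the StopIteration exception to determine when to exit.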
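The protocol described above can be sketched with a small user-defined class. The class name Countdown and its attribute are hypothetical and chosen only for illustration; the point is that a for loop or a comprehension can drive any object that provides __iter__ and __next__.

```python
class Countdown:
    """Hypothetical iterator that yields start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator conventionally returns itself from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of the results
        value = self.current
        self.current -= 1
        return value


# The comprehension calls __next__ repeatedly and stops on StopIteration.
collected = [n for n in Countdown(3)]
print(collected)  # [3, 2, 1]
```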

>>> for line in open('script1.py'):  # uses the file iterator
...     print(line.upper(), end='')
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)

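# readlines also works, but it is not the best choice: it loads the whole file at once, so on a large file it can be slow and use a lot of memory
>>> for line in open('script1.py').readlines():
...     print(line.upper(), end='')
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)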

# reading the file line by line with a while loop, which tends to be somewhat slower than the file iterator
>>> f = open('script1.py')
>>> while True:
...     line = f.readline()
...     if not line: break
...     print(line.upper(), end='')
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)
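As a self-contained variant of the file-iterator loop, the sketch below writes the sample lines to a temporary file (an assumption made so that the example does not depend on script1.py existing) and reads them back with a with statement, which closes the file when the block ends.

```python
import os
import tempfile

# Create a small throwaway file that stands in for script1.py.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write('import sys\nprint(sys.path)\nx = 2\nprint(2 ** 33)\n')
    path = tmp.name

try:
    upper_lines = []
    with open(path) as f:   # the with statement closes the file on exit
        for line in f:      # the file iterator reads one line at a time
            upper_lines.append(line.upper())
finally:
    os.remove(path)         # clean up the temporary file

print(''.join(upper_lines), end='')
```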

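Manual iteration: iter and next

The next function:

To support manual iteration code (with less typing), Python 3.0 also provides a built-in function, next, which automatically calls an object's __next__ method. Given an iterator X, the call next(X) is equivalent to X.__next__(), but noticeably simpler.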

>>> f = open('script1.py')
>>> f.__next__()
'import sys\n'
>>> f.__next__()
'print(sys.path)\n'
>>> f = open('script1.py')
>>> next(f)
'import sys\n'
>>> next(f)
'print(sys.path)\n'
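One detail the session above does not show: the built-in next also accepts an optional second argument, a default that is returned instead of raising StopIteration once the iterator is exhausted. A minimal sketch, using a list iterator in place of the file:

```python
numbers = iter([1, 2])

first = next(numbers, None)    # 1
second = next(numbers, None)   # 2
third = next(numbers, None)    # None: exhausted, so the default is returned

print(first, second, third)
```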

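Technically, when a for loop starts, it passes the iterable object to the iter built-in function to obtain an iterator; the object returned has the required __next__ method.

The iter function:

>>> L = [1, 2, 3]
>>> I = iter(L)  # build an iterator
>>> I.__next__()  # call the iterator's __next__ method
1
>>> I.__next__()
2
>>> I.__next__()
3
>>> I.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration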

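Note

A file object is its own iterator. That is, a file has its own __next__ method, and calling iter on it returns the file itself.

>>> f = open('script1.py')
>>> iter(f) is f  # iter returns the file object itself
True
>>> f.__next__()
'import sys\n'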

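Lists and many other built-in objects are not their own iterators, because they support multiple iterations: each call to iter creates a new iterator.

>>> L = [1, 2, 3]
>>> iter(L) is L  # iter creates a new iterator object
False
>>> L.__next__()  # a list is iterable but is not an iterator, so it has no __next__ method
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute '__next__'
>>>
>>> I = iter(L)  # create an iterator
>>> I.__next__()
1
>>> next(I)
2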
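The iterable/iterator distinction can also be checked programmatically. The sketch below uses the abstract base classes Iterable and Iterator from the standard collections.abc module, which the notes themselves do not mention, and shows that two iterators over the same list keep separate positions.

```python
from collections.abc import Iterable, Iterator

L = [1, 2, 3]
I = iter(L)

list_is_iterable = isinstance(L, Iterable)   # True: a list can be passed to iter()
list_is_iterator = isinstance(L, Iterator)   # False: a list has no __next__
iter_is_iterator = isinstance(I, Iterator)   # True

# Each call to iter(L) gives an independent iterator with its own position.
I1, I2 = iter(L), iter(L)
next(I1)                          # advance I1 past the first item
positions = (next(I1), next(I2))  # (2, 1)
print(list_is_iterable, list_is_iterator, iter_is_iterator, positions)
```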

Automatic and manual iteration

Automatic iteration

>>> L = [1, 2, 3]
>>> for X in L:
...     print(X ** 2, end=' ')
...
1 4 9

Manual iteration

>>> L = [1, 2, 3]
>>> I = iter(L)
>>> while True:
...     try:
...         X = next(I)
...     except StopIteration:
...         break
...     print(X ** 2, end=' ')
...
1 4 9

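Dictionary iterators

The traditional way to traverse a dictionary (the key order shown in these sessions comes from an older Python version; Python 3.7 and later preserve insertion order)

>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> for key in D.keys():
...     print(key, D[key])
...
c 3
b 2
a 1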

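In recent Python versions, dictionaries have an iterator that automatically returns one key at a time in iteration contexts:

>>> I = iter(D)  # build a dictionary iterator
>>> next(I)
'c'
>>> next(I)
'b'
>>> next(I)
'a'
>>> next(I)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration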

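Because a for loop uses the iteration protocol (build an iterator, then call __next__() to fetch values one at a time), we can traverse a dictionary directly with a for loop:

>>> D
{'c': 3, 'b': 2, 'a': 1}
>>> for key in D:
...     print(key, D[key])
...
c 3
b 2
a 1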
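When both the key and the value are needed, the standard dict method items() offers an alternative to indexing with D[key] inside the loop. The notes do not cover it, so treat the following as a supplementary sketch; the resulting order assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
D = {'a': 1, 'b': 2, 'c': 3}

# Iterating the dictionary directly yields keys; values are looked up by index.
pairs_by_key = [(key, D[key]) for key in D]

# items() yields (key, value) pairs, so no second lookup is needed.
pairs_by_items = [(key, value) for key, value in D.items()]

print(pairs_by_key == pairs_by_items)  # True
```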

List comprehensions

Two ways to traverse a list

>>> L = [1, 2, 3, 4, 5]
>>> for i in range(len(L)):
...     L[i] += 10
...
>>> L
[11, 12, 13, 14, 15]

>>> L = [x + 10 for x in L]
>>> L
[21, 22, 23, 24, 25]

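List comprehensions often run faster than an equivalent manual for loop (sometimes roughly twice as fast, though the gap depends on the Python version and the workload), because their iteration runs at C speed inside the interpreter rather than as manual Python code. This is a major performance advantage of list comprehensions, especially for larger data sets.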
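The speed claim is easy to measure on your own machine with the standard timeit module. The function names and the data size below are arbitrary choices for illustration, and the timings printed will vary by Python version and hardware, so no particular ratio is assumed.

```python
import timeit


def with_loop(data):
    """Build the result with an explicit for loop and append."""
    result = []
    for x in data:
        result.append(x + 10)
    return result


def with_comprehension(data):
    """Build the same result with a list comprehension."""
    return [x + 10 for x in data]


data = list(range(10000))
same_result = with_loop(data) == with_comprehension(data)

# Time 100 runs of each version; lower is faster.
loop_time = timeit.timeit(lambda: with_loop(data), number=100)
comp_time = timeit.timeit(lambda: with_comprehension(data), number=100)
print(f'loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s')
```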

Using list comprehensions on files

File objects have a readlines method that loads the whole file at once into a list of line strings:

>>> f = open('script1.py')
>>> lines = f.readlines()
>>> lines
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n']

Strip the newline at the end of each line

>>> lines = [line.rstrip() for line in lines]
>>> lines
['import sys', 'print(sys.path)', 'x = 2', 'print(2 ** 33)']

All in one step

>>> lines = [line.rstrip() for line in open('script1.py')]
>>> lines
['import sys', 'print(sys.path)', 'x = 2', 'print(2 ** 33)']

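The comprehension above is an efficient way to process a file, because most of the work is done inside the Python interpreter, which is usually faster than the equivalent statements. For larger files in particular, the speed advantage of a list comprehension can be significant.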

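Extended list comprehension syntax

The for loop nested in the expression can have an associated if clause that filters out items for which the test is not true:

>>> lines = [line.rstrip() for line in open('script1.py') if line[0] == 'p']  # keep only the lines that start with p
>>> lines
['print(sys.path)', 'print(2 ** 33)']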

Equivalent statements

>>> res = []
>>> for line in open('script1.py'):
...     if line[0] == 'p':
...         res.append(line.rstrip())
...
>>> res
['print(sys.path)', 'print(2 ** 33)']
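The same filter can be written with str.startswith, which reads a little more clearly than indexing line[0]. In this sketch an io.StringIO object stands in for open('script1.py') (an assumption made so that the example runs without the file); it is iterated line by line just like a real file.

```python
import io

# io.StringIO stands in for open('script1.py') so the example is self-contained.
source = 'import sys\nprint(sys.path)\nx = 2\nprint(2 ** 33)\n'

with io.StringIO(source) as f:
    # Keep only the lines that start with 'p', with trailing whitespace removed.
    p_lines = [line.rstrip() for line in f if line.startswith('p')]

print(p_lines)  # ['print(sys.path)', 'print(2 ** 33)']
```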

Nested for clauses in a comprehension

>>> [x + y for x in 'abc' for y in 'lmn']
['al', 'am', 'an', 'bl', 'bm', 'bn', 'cl', 'cm', 'cn']

Equivalent statements

>>> res = []
>>> for x in 'abc':
...     for y in 'lmn':
...         res.append(x + y)
...
>>> res
['al', 'am', 'an', 'bl', 'bm', 'bn', 'cl', 'cm', 'cn']
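For comparison, the standard library function itertools.product generates the same pairs as the two nested for clauses. It is not part of the original notes and is shown here only as a cross-check of the nesting order (the rightmost loop varies fastest).

```python
from itertools import product

# Two nested for clauses: x is the outer loop, y the inner loop.
nested = [x + y for x in 'abc' for y in 'lmn']

# product('abc', 'lmn') yields the same (x, y) pairs in the same order.
via_product = [x + y for x, y in product('abc', 'lmn')]

print(nested == via_product)  # True
```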

Other iteration contexts

>>> map(str.upper, open('script1.py'))
<map object at 0xb743848c>
>>> list(map(str.upper, open('script1.py')))
['IMPORT SYS\n', 'PRINT(SYS.PATH)\n', 'X = 2\n', 'PRINT(2 ** 33)\n']

>>> sorted(open('script1.py'))  # iterate over the file and sort its lines
['import sys\n', 'print(2 ** 33)\n', 'print(sys.path)\n', 'x = 2\n']

>>> list(zip(open('script1.py'), open('script1.py')))
[('import sys\n', 'import sys\n'), ('print(sys.path)\n', 'print(sys.path)\n'), ('x = 2\n', 'x = 2\n'), ('print(2 ** 33)\n', 'print(2 ** 33)\n')]

>>> list(enumerate(open('script1.py')))
[(0, 'import sys\n'), (1, 'print(sys.path)\n'), (2, 'x = 2\n'), (3, 'print(2 ** 33)\n')]

The filter built-in function returns each item of the iterable for which the passed-in function returns a true value.

>>> list(filter(bool, open('script1.py')))  # keep the items for which the function is true
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n']
>>> filter(bool, ['spam', '', 'ni'])
<filter object at 0xb743858c>
>>> list(filter(bool, ['spam', '', 'ni']))
['spam', 'ni']

>>> import functools, operator
>>> operator.add('a', 'b')  # concatenates the two objects
'ab'
>>> functools.reduce(operator.add, open('script1.py'))
'import sys\nprint(sys.path)\nx = 2\nprint(2 ** 33)\n'
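Several other built-ins also consume any iterable through the same protocol. The sketch below applies max, sum, any, str.join, and sequence unpacking to a file-like object; io.StringIO and the helper function lines are stand-ins introduced here so that the example runs without script1.py.

```python
import io

source = 'import sys\nprint(sys.path)\nx = 2\nprint(2 ** 33)\n'


def lines():
    """Return a fresh file-like iterator over the sample text (hypothetical helper)."""
    return io.StringIO(source)


longest = max(lines(), key=len)                      # longest line
total_chars = sum(len(line) for line in lines())     # total character count
has_import = any(line.startswith('import') for line in lines())
joined = ''.join(lines())                            # same text as the reduce call builds
a, b, c, d = lines()                                 # sequence unpacking iterates too

print(repr(longest), total_chars, has_import)
```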

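Like map, the built-ins zip, enumerate, and filter return an iterable object.
Such an iterator is exhausted once its results have been traversed. A single one-shot iterator cannot provide multiple iterators that hold different positions.

>>> M = map(abs, (-1, 0, 1))  # build an iterator
>>> M
<map object at 0xb74386cc>
>>> next(M)
1
>>> next(M)
0
>>> next(M)
1
>>> next(M)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
>>> for x in M: print(x)  # the iterator is already exhausted, so nothing is printed; a new one must be built
...
>>> M = map(abs, (-1, 0, 1))  # build a new iterator
>>> for x in M: print(x)
...
1
0
1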
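If the results are needed more than once, one common approach is to collect them in a list, which can be traversed repeatedly. A minimal sketch of the difference:

```python
M = map(abs, (-1, 0, 1))
first_pass = list(M)    # [1, 0, 1]
second_pass = list(M)   # []: the map iterator is already exhausted

# Storing the results in a list allows any number of traversals.
values = list(map(abs, (-1, 0, 1)))
total = sum(values)      # first traversal
largest = max(values)    # second traversal still sees every item

print(first_pass, second_pass, total, largest)
```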

Single versus multiple iterators

range supports multiple iterators

# supports multiple iterators
>>> R = range(3)
>>> next(R)  # a range is not an iterator
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'range' object is not an iterator
>>> I1 = iter(R)  # build iterator 1
>>> next(I1)
0
>>> next(I1)
1
>>> I2 = iter(R)  # build iterator 2
>>> next(I2)
0
>>> next(I1)
2

zip, map, and filter do not support multiple iterators

>>> Z = zip((1, 2, 3), (10, 11, 12))
>>> I1 = iter(Z)  # an attempt to open two iterators; both names refer to the same object
>>> I2 = iter(Z)
>>> next(I1)
(1, 10)
>>> next(I1)
(2, 11)
>>> next(I2)
(3, 12)

>>> M = map(abs, (-1, 0, 1))
>>> I1 = iter(M); I2 = iter(M)
>>> print(next(I1), next(I1), next(I1))
1 0 1
>>> print(next(I2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
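A user-defined class can support multiple iterators the way range does by returning a new iterator object from every __iter__ call instead of returning itself. The class name Squares is hypothetical and serves only to illustrate the idea.

```python
class Squares:
    """Hypothetical iterable yielding the squares 1, 4, ..., stop ** 2."""

    def __init__(self, stop):
        self.stop = stop

    def __iter__(self):
        # Build a brand-new map iterator on every call, so each caller
        # gets its own independent position.
        return map(lambda n: n ** 2, range(1, self.stop + 1))


S = Squares(3)
I1 = iter(S)
I2 = iter(S)
first = (next(I1), next(I1))   # (1, 4)
other = next(I2)               # 1: I2 keeps its own position
print(first, other, list(S))
```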

Dictionary iterators

>>> D = dict(a=1, b=2, c=3)
>>> D
{'c': 3, 'b': 2, 'a': 1}
>>>
>>> K = D.keys()
>>> K
dict_keys(['c', 'b', 'a'])
>>> next(K)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict_keys' object is not an iterator
>>> I = iter(K)  # build an iterator that produces the dictionary's keys
>>> next(I)
'c'
>>> next(I)
'b'
>>> for k in D.keys(): print(k, end=' ')
...
c b a
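Because a keys view is an iterable rather than an iterator, it behaves like range in this respect: each iter call returns a separate iterator, and the view can be traversed more than once. The sketch below assumes Python 3.7 or later, where the keys come back in insertion order.

```python
D = dict(a=1, b=2, c=3)
K = D.keys()

# Two iterators over the same view keep independent positions.
I1 = iter(K)
I2 = iter(K)
pair = (next(I1), next(I1), next(I2))   # ('a', 'b', 'a') on Python 3.7+

# The view itself is not used up by a traversal.
first_pass = list(K)
second_pass = list(K)

print(pair, first_pass == second_pass)
```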

0 0