Python Iterators

Source: Internet; published by: weiapibridge.js; editor: 程序博客网; time: 2024/05/13 16:41

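In the Python documentation, implementing an interface is usually described as conforming to a protocol. Thanks to dynamic typing and duck typing, many of the elaborate patterns that statically typed languages require quietly disappear.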

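1 Iterators
The iterator protocol requires only two methods: __iter__() and next() (spelled __next__() in Python 3). The former returns the iterator object; the latter returns items one at a time until it raises StopIteration to signal the end.
The simplest approach is the built-in iter() function, which returns an iterator wrapping a common container type. But sequence types can already be consumed by a for loop, so why bother?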
>>> class Data(object):
...     def __init__(self):
...         self._data = []
...
...     def add(self, x):
...         self._data.append(x)
...
...     def data(self):
...         return iter(self._data)
>>> d = Data()
>>> d.add(1)
>>> d.add(2)
>>> d.add(3)
>>> for x in d.data(): print x
1
2
3
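Returning an iterator instead of the self._data list keeps outside code from modifying the object's state through the return value. You might be tempted to return a tuple instead, but that copies the entire list and wastes memory.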
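iter() is convenient, but it gives you no way to stop an iteration partway through; for that you have to implement the iterator object yourself. As a design principle, the iterator is usually separated from the data object. An iterator has to maintain state, and several iterators may be traversing the same data at once; none of that should burden the data object and needlessly increase its complexity.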
>>> class Data(object):
...     def __init__(self, *args):
...         self._data = list(args)
...
...     def __iter__(self):
...         return DataIter(self)
>>> class DataIter(object):
...     def __init__(self, data):
...         self._index = 0
...         self._data = data._data
...
...     def next(self):
...         if self._index >= len(self._data): raise StopIteration()
...         d = self._data[self._index]
...         self._index += 1
...         return d
>>> d = Data(1, 2, 3)
>>> for x in d: print x
1
2
3
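Data is purely a data container: its __iter__ only has to return an iterator object, and DataIter supplies the next method.
Besides the for loop, an iterator can also be driven directly with next().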
>>> d = Data(1, 2, 3)
>>> it = iter(d)
>>> it
<__main__.DataIter object at 0x10dafe850>
>>> next(it)
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)

StopIteration
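The for loop can be written out by hand to show how it relies on the protocol. This is only a sketch: manual_for is a hypothetical helper name, and the code uses Python 3 syntax, whereas the sessions in this article are Python 2.

```python
def manual_for(iterable, action):
    """Roughly what `for x in iterable: action(x)` does behind the scenes."""
    it = iter(iterable)          # asks the object for an iterator via __iter__()
    while True:
        try:
            x = next(it)         # calls it.__next__() (it.next() in Python 2)
        except StopIteration:
            break                # exhaustion ends the loop silently
        action(x)


seen = []
manual_for([1, 2, 3], seen.append)
print(seen)
```

The StopIteration that appears in the interactive session is the same signal the loop catches here.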

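2 Generators

An index-based iterator is somewhat ugly. A cleaner approach is to use yield, which returns a generator object that already implements the iterator protocol.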
>>> class Data(object):
...     def __init__(self, *args):
...         self._data = list(args)
...
...     def __iter__(self):
...         for x in self._data:
...             yield x
>>> d = Data(1, 2, 3)
>>> for x in d: print x
1
2
3
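Compiler magic repackages any method (or function) that contains yield so that calling it returns a generator object. As a result, there is no need to maintain a separate iterator type.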
>>> d.__iter__()
<generator object __iter__ at 0x10db01280>
>>> iter(d).next()
1
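One consequence worth seeing in code is that calling a generator function does not run its body; the body runs only as values are requested. The sketch below records this with a hypothetical log list and a hypothetical numbers() function, in Python 3 syntax.

```python
log = []


def numbers():
    log.append("started")       # runs only when the first value is requested
    yield 1
    log.append("resumed")       # runs when the second value is requested
    yield 2


gen = numbers()                 # returns a generator; the body has not run yet
log_after_call = list(log)

first = next(gen)               # runs the body up to the first yield
log_after_first = list(log)

rest = list(gen)                # drains the remaining values
is_own_iterator = iter(gen) is gen   # a generator is its own iterator
```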
Coroutines
How does yield pull off this magic? The answer lies in how coroutines work. Consider the following example.
>>> def coroutine():
...     print "coroutine start..."
...     result = None
...     while True:
...         s = yield result
...         result = s.split(",")
>>> c = coroutine()        # The function returns a coroutine object.
>>> c.send(None)           # Start the coroutine with send(None) or next().
coroutine start...
>>> c.send("a,b")          # Send a message to the coroutine so that it resumes.
['a', 'b']
>>> c.send("c,d")
['c', 'd']
>>> c.close()              # Close the coroutine so that it exits, or use c.throw() to raise an exception in it.
>>> c.send("e,f")          # A closed coroutine cannot receive messages.
StopIteration
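How the coroutine executes:
• After the coroutine object is created, it must be started with send(None) or next().
• On reaching yield result, the coroutine gives up control and waits for a message.
• The caller sends a message with send("a,b"); the coroutine resumes, stores the received data in s, and carries on.
• When the loop reaches yield again, the coroutine hands back the result it just computed and gives up control once more.
• This continues until the coroutine is closed or an exception is raised in it.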
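Because the initial send(None) or next() is easy to forget, a small decorator is sometimes used to perform it automatically. The sketch below assumes a hypothetical helper named autostart and a hypothetical splitter coroutine; neither is part of the standard library. It uses Python 3 syntax.

```python
import functools


def autostart(func):
    """Hypothetical helper: create the coroutine and advance it to its first yield."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)                # same effect as gen.send(None)
        return gen
    return wrapper


@autostart
def splitter():
    result = None
    while True:
        s = yield result         # pause here until send() delivers a value
        result = s.split(",")


c = splitter()                   # already started, ready for send()
r1 = c.send("a,b")
r2 = c.send("c,d")
c.close()
```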
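close() raises a GeneratorExit exception inside the coroutine so that it exits normally. throw() can raise any type of exception, which the coroutine must catch itself if it is to keep running.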
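The difference between the two calls can be sketched as follows. The worker coroutine and the events list are hypothetical names introduced for illustration, and the code uses Python 3 syntax.

```python
events = []


def worker():
    try:
        while True:
            try:
                item = yield
                events.append(("got", item))
            except ValueError as exc:
                # Injected by throw(); handle it here and keep running.
                events.append(("recovered", str(exc)))
    except GeneratorExit:
        # Injected by close(); do any cleanup, then let it propagate.
        events.append(("closed", None))
        raise


w = worker()
next(w)                              # start the coroutine
w.send(1)
w.throw(ValueError("bad input"))     # raised at the paused yield
w.send(2)                            # the coroutine survived the exception
w.close()
```

A coroutine that does not catch the exception passed to throw() simply terminates, and the exception propagates back to the caller.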
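Although generators and yield make it easy to implement coroutines, they are still a long way from true high concurrency. Consider a mature third-party library such as gevent or eventlet, or use greenlet directly.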

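3 Patterns
Put iterators to good use and they will keep surprising you.
Producer-consumer model
Using yield as a coroutine, we can write a producer-consumer model without multiple threads.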
>>> def consumer():
...     while True:
...         d = yield
...         if not d: break
...         print "consumer:", d
>>> c = consumer()         # Create the consumer.
>>> c.send(None)           # Start the consumer.
>>> c.send(1)              # Produce data and hand it to the consumer.
consumer: 1
>>> c.send(2)
consumer: 2
>>> c.send(3)
consumer: 3
>>> c.send(None)           # Production is over; tell the consumer to finish.
StopIteration
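The producer side can be wrapped in a function of its own. The sketch below is one possible arrangement, in Python 3 syntax; producer and received are hypothetical names, and the consumer tests `d is None` rather than `not d` so that falsy items such as 0 are not mistaken for the end-of-stream signal (a deliberate change from the session above).

```python
received = []


def consumer():
    while True:
        d = yield
        if d is None:            # None is the agreed end-of-stream sentinel
            break
        received.append(d)


def producer(target, items):
    next(target)                 # start the consumer
    for item in items:
        target.send(item)        # hand each item to the consumer
    try:
        target.send(None)        # signal the end of production
    except StopIteration:
        pass                     # the consumer has finished


producer(consumer(), [1, 2, 3])
print(received)
```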
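Improving callbacks
Callbacks are a common way to implement asynchronous operations, but once the code base grows they stop looking so pleasant. Logic that belongs together gets split across two functions, and maintenance becomes a problem. With yield, asynchronous calls can be written entirely in a blocking style.
Below is the callback version, in which the framework calls logic and, after completing some operation or receiving a signal, returns the asynchronous result through callback.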
>>> def framework(logic, callback):
...     s = logic()
...     print "[FX] logic: ", s
...     print "[FX] do something..."
...     callback("async:" + s)
>>> def logic():
...     s = "mylogic"
...     return s
>>> def callback(s):
...     print s
>>> framework(logic, callback)
[FX] logic: mylogic
[FX] do something...
async:mylogic
Now look at the blocking-style version, improved with yield.
>>> def framework(logic):
...     try:
...         it = logic()
...         s = next(it)
...         print "[FX] logic: ", s
...         print "[FX] do something"
...         it.send("async:" + s)
...     except StopIteration:
...         pass
>>> def logic():
...     s = "mylogic"
...     r = yield s
...     print r
>>> framework(logic)
[FX] logic: mylogic
[FX] do something
async:mylogic
Although framework becomes a little more complex, logic stays in one piece. The maintenance benefits of blocking-style code speak for themselves.
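The same idea extends to logic that needs several asynchronous results in a row. The following is a sketch under assumptions, not the author's framework: the loop-based driver, the output list, and the "step1"/"step2" requests are all hypothetical, and the code uses Python 3 syntax.

```python
output = []


def framework(logic):
    """Drive `logic`, answering every yielded request with an "async" result."""
    it = logic()
    try:
        request = next(it)                       # run to the first yield
        while True:
            output.append("[FX] request: " + request)
            request = it.send("async:" + request)  # resume with the result
    except StopIteration:
        pass                                     # logic ran to completion


def logic():
    r1 = yield "step1"       # reads like a blocking call
    output.append(r1)
    r2 = yield "step2"
    output.append(r2)


framework(logic)
```

Both steps stay inside one function, in the order they happen, which is the point of the blocking style.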

4 Treasures
The standard library's itertools module is a treasure trove that should not be overlooked.
chain
Chains several iterators together, one after another.
>>> from itertools import *
>>> it = chain(xrange(3), "abc")
>>> list(it)
[0, 1, 2, 'a', 'b', 'c']
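When the iterables to join are themselves held in a container, chain.from_iterable can flatten them. A small sketch in Python 3 syntax, with a hypothetical nested list:

```python
from itertools import chain

nested = [[0, 1], [2], [], [3, 4]]

# from_iterable takes one iterable of iterables and walks through each in turn.
flat = list(chain.from_iterable(nested))

# Unpacking into chain() gives the same result but evaluates `nested` eagerly.
same = list(chain(*nested))
```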
combinations
Returns every combination of the given length, with elements kept in input order.
>>> it = combinations("abcd", 2)
>>> list(it)
[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
>>> it = combinations(xrange(4), 2)
>>> list(it)
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
combinations_with_replacement additionally returns combinations that pair an element with itself.
>>> it = combinations_with_replacement("abcd", 2)
>>> list(it)
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'b'), ('b', 'c'), ('b', 'd'),
('c', 'c'), ('c', 'd'), ('d', 'd')]
compress
Filters the elements of an iterator using a table of selectors.
>>> it = compress("abcde", [1, 0, 1, 1, 0])
>>> list(it)
['a', 'c', 'd']
The selector list can be any list of boolean values (more generally, of truthy and falsy values).
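As an illustration of non-boolean selectors, the sketch below filters with a list of numbers and then with selectors computed on the fly. The data and scores lists are made up for the example; the code uses Python 3 syntax.

```python
from itertools import compress

data = ["a", "b", "c", "d", "e"]
scores = [3, 0, 7, 1, 0]

# Any truthy selector keeps the matching element; 0 drops it.
nonzero = list(compress(data, scores))

# Selectors can also be produced lazily from a condition.
high = list(compress(data, (s > 2 for s in scores)))
```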
count
Starts at the given value and keeps counting "forever".
>>> for x in count(10, step = 2):
... print x
... if x > 17: break
10
12
14
16
18
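An explicit break is not the only way to bound count(). As a sketch, islice and takewhile from the same module (both described later in this section) can cut the infinite stream instead. Python 3 syntax.

```python
from itertools import count, islice, takewhile

# Take a fixed number of items from the infinite counter.
first_five = list(islice(count(10, 2), 5))

# Or stop as soon as a condition fails.
below_18 = list(takewhile(lambda x: x < 18, count(10, 2)))
```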
cycle
When the iteration ends, starts over from the beginning.
>>> for i, x in enumerate(cycle("abc")):
... print x
... if i > 7: break
a
b
c
a
b
c
a
b
c
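A typical use of cycle is round-robin assignment, where zip() stops the endless cycle as soon as the finite side runs out. The tasks and workers lists below are hypothetical; the code uses Python 3 syntax.

```python
from itertools import cycle

tasks = ["t1", "t2", "t3", "t4", "t5"]
workers = ["a", "b"]

# zip() ends with the shorter input, so the infinite cycle is safe here.
assignment = list(zip(tasks, cycle(workers)))
```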
dropwhile
Skips the leading elements that satisfy the predicate.
>>> it = dropwhile(lambda i: i < 4, [2, 1, 4, 1, 3])
>>> list(it)
[4, 1, 3]
takewhile, conversely, keeps only the leading elements that satisfy the predicate.
>>> it = takewhile(lambda i: i < 4, [2, 1, 4, 1, 3])
>>> list(it)
[2, 1]
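Both functions test the predicate only at the head of the input: once it fails for the first time, later matching elements are no longer affected. The sketch below shows this with a made-up list of lines, in Python 3 syntax.

```python
from itertools import dropwhile, takewhile

lines = ["# header", "# more", "data 1", "# not skipped", "data 2"]


def is_comment(s):
    return s.startswith("#")


# Drops only the leading comments; the later comment line survives.
body = list(dropwhile(is_comment, lines))

# Keeps only the leading comments; stops at the first non-comment.
header = list(takewhile(is_comment, lines))
```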
groupby
Groups consecutive runs of identical elements.
>>> [list(k) for k, g in groupby('AAAABBBCCDAABBCCDD')]
[['A'], ['B'], ['C'], ['D'], ['A'], ['B'], ['C'], ['D']]
>>> [list(g) for k, g in groupby('AAAABBBCCDAABBCCDD')]
[['A', 'A', 'A', 'A'], ['B', 'B', 'B'], ['C', 'C'], ['D'], ['A', 'A'], ['B', 'B'], ['C',
'C'], ['D', 'D']]
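Because groupby only merges neighbouring elements, the same key can appear more than once unless the input is sorted by that key first. A sketch with a hypothetical word list and a first-letter key, in Python 3 syntax:

```python
from itertools import groupby

words = ["apple", "bob", "avocado", "cherry", "banana"]


def first_letter(w):
    return w[0]


# Unsorted input: keys repeat because equal keys are not adjacent.
unsorted_keys = [k for k, g in groupby(words, key=first_letter)]

# Sorting first brings equal keys together, giving one group per key.
grouped = {k: list(g) for k, g in groupby(sorted(words), key=first_letter)}
```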
ifilter
Like the built-in filter(): keeps only the elements that satisfy the predicate.
>>> it = ifilter(lambda x: x % 2, xrange(10))
>>> list(it)
[1, 3, 5, 7, 9]
ifilterfalse does the opposite and keeps the elements that do not satisfy the predicate.
>>> it = ifilterfalse(lambda x: x % 2, xrange(10))
>>> list(it)
[0, 2, 4, 6, 8]
imap
Like the built-in map(), but returns an iterator instead of a list.
>>> it = imap(lambda x, y: x + y, (2,3,10), (5,2,3))
>>> list(it)
[7, 5, 13]
islice
Takes elements from an iterator in slice fashion.
>>> it = islice(xrange(10), 3)
>>> list(it)
[0, 1, 2]
>>> it = islice(xrange(10), 3, 5)
>>> list(it)
[3, 4]
>>> it = islice(xrange(10), 3, 9, 2)
>>> list(it)
[3, 5, 7]
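islice is most useful on iterators that cannot be indexed at all, such as generators. Note that it consumes the underlying iterator, so a second slice continues where the first one stopped. The squares() generator below is a hypothetical example, in Python 3 syntax.

```python
from itertools import islice


def squares():
    """Endless stream of square numbers; gen[0:3] would not work on this."""
    n = 0
    while True:
        yield n * n
        n += 1


gen = squares()
head = list(islice(gen, 3))       # takes the first three values
next_two = list(islice(gen, 2))   # continues from where the first slice ended
```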
izip
Like the built-in zip(); surplus elements are discarded.
>>> it = izip("abc", [1, 2])
>>> list(it)
[('a', 1), ('b', 2)]
To keep the surplus elements, use izip_longest, which takes a fill-value argument.
>>> it = izip_longest("abc", [1, 2], fillvalue = 0)
>>> list(it)
[('a', 1), ('b', 2), ('c', 0)]
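The i-prefixed functions in this section (ifilter, ifilterfalse, imap, izip, izip_longest) exist only in Python 2. In Python 3 the built-in filter(), map() and zip() are already lazy, and the other two are named filterfalse and zip_longest. One possible compatibility shim, offered as a sketch rather than a recommended practice:

```python
try:
    # Python 2 names, as used in this article.
    from itertools import ifilter, ifilterfalse, imap, izip, izip_longest
except ImportError:
    # Python 3: the built-ins are lazy, and two functions were renamed.
    from itertools import filterfalse as ifilterfalse
    from itertools import zip_longest as izip_longest
    ifilter, imap, izip = filter, map, zip

odd = list(ifilter(lambda x: x % 2, range(10)))
even = list(ifilterfalse(lambda x: x % 2, range(10)))
sums = list(imap(lambda x, y: x + y, (2, 3, 10), (5, 2, 3)))
short = list(izip("abc", [1, 2]))
padded = list(izip_longest("abc", [1, 2], fillvalue=0))
```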
permutations
Unlike combinations, which pairs each element only with those after it, permutations pairs every element with all the others, so order matters.
>>> it = permutations("abc", 2)
>>> list(it)
[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
>>> it = combinations("abc", 2)
>>> list(it)
[('a', 'b'), ('a', 'c'), ('b', 'c')]
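The relationship between the two can be checked by counting: for n items taken r at a time there are n!/(n-r)! permutations and n!/(r!(n-r)!) combinations. The sketch below, in Python 3 syntax with made-up inputs, verifies this and shows that the combinations are exactly the permutations whose elements appear in sorted order.

```python
from itertools import combinations, permutations
from math import factorial

items = "abcd"
r = 2
n = len(items)

perms = list(permutations(items, r))
combs = list(combinations(items, r))

expected_perms = factorial(n) // factorial(n - r)     # n! / (n-r)!
expected_combs = expected_perms // factorial(r)       # n! / (r! (n-r)!)

# Keeping only the permutations already in sorted order leaves the combinations.
sorted_perms = [p for p in perms if list(p) == sorted(p)]
```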
product
Pairs every element with each element of the following iterables (the Cartesian product).
>>> it = product("abc", [0, 1])
>>> list(it)
[('a', 0), ('a', 1), ('b', 0), ('b', 1), ('c', 0), ('c', 1)]
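product behaves like nested for loops, with the rightmost iterable varying fastest. A sketch in Python 3 syntax; the repeat keyword shown at the end multiplies one iterable with itself.

```python
from itertools import product

pairs = list(product("abc", [0, 1]))

# The equivalent nested loops, written as a comprehension.
nested = [(x, y) for x in "abc" for y in [0, 1]]

# repeat=2 is shorthand for product([0, 1], [0, 1]).
bits = list(product([0, 1], repeat=2))
```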
repeat
Repeats an object n times.
>>> it = repeat("a", 3)
>>> list(it)
['a', 'a', 'a']
starmap
Applies the function to each group of arguments in turn.
>>> it = starmap(lambda x, y: x + y, [(1, 2), (10, 20)])
>>> list(it)
[3, 30]
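The difference from map() lies in how the arguments arrive: starmap expects them already grouped into tuples, while map() takes one iterable per parameter. A sketch in Python 3 syntax using operator.add in place of the lambda:

```python
from itertools import starmap
from operator import add

pairs = [(1, 2), (10, 20)]

# starmap unpacks each tuple: add(1, 2), add(10, 20).
via_starmap = list(starmap(add, pairs))

# map() wants the arguments as separate, parallel iterables.
via_map = list(map(add, [1, 10], [2, 20]))

# zip(*pairs) converts the grouped form into the parallel form.
via_zip = list(map(add, *zip(*pairs)))
```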
tee
Duplicates an iterator, splitting it into several independent iterators.
>>> for it in tee(xrange(5), 3):
... print list(it)
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
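The copies returned by tee advance independently of one another, which matters when the source can be read only once. A sketch in Python 3 syntax; after calling tee, the original iterator should not be used directly, since advancing it would make the copies miss items.

```python
from itertools import tee

source = iter(range(5))     # a one-shot iterator
a, b = tee(source)          # two copies by default

first_a = next(a)           # advancing a does not affect b
all_b = list(b)
rest_a = list(a)
```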

0 0