Comparing Single-Threaded and Multithreaded Batch Downloads in Python

Source: Internet  Editor: 程序博客网  Time: 2024/05/22 08:06

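I have just started learning Python and wanted to try writing a crawler that downloads files myself. I read Core Python Programming, combined ideas from a lot of crawler code found online, and then gave it a try. BTW: I am using Python 3.6.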

This is the single-threaded version, downloading 30 URLs:

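from urllib.request import urlretrieve
import time

start = time.time()
# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r'E:\Python\py\web\hh.txt', 'r') as f:  # file holding the URLs, one per line
    urls = [line.strip() for line in f if line.strip()]  # drop newlines and blank lines
# The original named files with random.randint(0, 30), which can repeat and
# overwrite earlier downloads; numbering by position keeps every file distinct
for n, url in enumerate(urls):
    urlretrieve(url, '%d.png' % n)
end = time.time()
print(end - start)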

The time printed (in seconds) was: 4.2432427406311035

With the same URL file, I then implemented the download with multiple threads and a queue:

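from urllib.request import urlretrieve
import queue
import threading
import time

class download(threading.Thread):
    def __init__(self, que):
        threading.Thread.__init__(self)
        self.que = que

    def run(self):
        while True:
            # get_nowait() avoids the race in "check empty(), then get()": another
            # thread could take the last item in between, and get() would block forever
            try:
                n, host = self.que.get_nowait()
            except queue.Empty:
                break
            urlretrieve(host, '%d.png' % n)

def Down():
    with open(r'E:\Python\py\web\hh.txt', 'r') as f:
        a = [line.strip() for line in f if line.strip()]
    que = queue.Queue()
    threads = []
    # Queue (index, url) pairs so each download gets its own file name; the
    # original random.randint(0, 30) names could collide and overwrite files
    for item in enumerate(a):
        que.put(item)
    for i in range(20):  # 20 worker threads share the one queue
        d = download(que)
        threads.append(d)
    for i in threads:
        i.start()
    for i in threads:
        i.join()  # wait for every worker to finish

if __name__ == '__main__':
    start = time.time()
    Down()
    end = time.time()
    print(end - start)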

The final output was: 3.6262073516845703. So in this run the multithreaded version was a bit faster, by about 0.6 seconds (roughly 15%).
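One likely reason threads help here, though it was not measured in this post, is that a download spends most of its time waiting on the network, and Python threads can overlap that waiting. The sketch below illustrates the idea without touching the network: fake_download is a hypothetical stand-in that just sleeps, and the 0.05 second delay and the count of 10 are arbitrary assumptions. How much real downloads gain depends on the server and the available bandwidth.

```python
import threading
import time

def fake_download(delay=0.05):
    """Hypothetical stand-in for one download: it only waits, like network I/O."""
    time.sleep(delay)

def run_sequential(count):
    """Run the fake downloads one after another and return the elapsed time."""
    start = time.perf_counter()
    for _ in range(count):
        fake_download()
    return time.perf_counter() - start

def run_threaded(count):
    """Run each fake download in its own thread and return the elapsed time."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_download) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

sequential = run_sequential(10)
threaded = run_threaded(10)
print('sequential: %.2fs, threaded: %.2fs' % (sequential, threaded))
```

The sequential run has to wait for every sleep in turn, while the threaded run waits for them all at once, so its elapsed time should come out much lower.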

Finally, I also wanted to try a thread pool, but the Anaconda3 distribution I installed does not include the threadpool module, so I will try that later.
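For what it is worth, a thread pool does not need the third-party threadpool module: the standard library has had concurrent.futures.ThreadPoolExecutor since Python 3.2. What follows is a minimal sketch of that alternative, not the threadpool module itself. The function name download_all, the fetch parameter (there only so the download call can be swapped out), and the default of 20 workers are my own assumptions, chosen to mirror the code above.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlretrieve

def download_all(urls, fetch=urlretrieve, max_workers=20):
    """Download every URL using a pool of worker threads; return the file names.

    download_all and its parameters are illustrative names, not from the post.
    """
    def task(item):
        n, url = item
        filename = '%d.png' % n  # numbered by position so names never collide
        fetch(url, filename)
        return filename

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() hands items to idle workers and yields results in input order
        return list(pool.map(task, enumerate(urls)))
```

With the URLs read from hh.txt into a list called urls, as in the scripts above, the call would be download_all(urls). The pool takes over the queue handling and the start/join bookkeeping that the hand-written version does itself.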

This is my first blog post, and I am a little excited just thinking about it.

0 0