Learning Python Multithreading (Continued)

Source: Internet  Editor: 程序博客网  Time: 2024/06/10 13:18

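b. When there are many pages to visit, say 1000, we cannot start 1000 threads at once, because the machine may not be able to handle the load. Instead we can set up a thread pool that starts only 40 threads, waits for those 40 to finish, and then starts the next ones.
For example, suppose there are 10 threads (one per URL) and the pool size is set to 4.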

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Python 2.7 script (print statements, urllib2)
import threading
import time
import urllib2


def surf_net(url):
    start_time = time.time()
    print 'surf start', start_time
    req = urllib2.Request(url)
    time.sleep(2)  # artificial delay so the two approaches can be compared
    try:
        urllib2.urlopen(req)
    except urllib2.URLError as e:
        print e.reason
    end_time = time.time()
    # NOTE: urlopen() below sends a second request. It is outside the try block,
    # and its duration is not part of the per-URL time printed here, although it
    # does count toward the total run time.
    print url, urllib2.urlopen(req).code, end_time - start_time


url_list = ['https://www.taobao.com', 'https://www.baidu.com', 'https://www.jd.com',
            'http://mail.163.com', 'http://www.csdn.net', 'http://www.weibo.com',
            'http://www.youku.com', 'http://www.dianping.com/', 'https://mp.weixin.qq.com', 'https://xiumi.us/#/']

##################### one by one step start ###################
one_by_one_start = time.time()
for each_url in url_list:
    # print each_url
    surf_net(each_url)
one_by_one_end = time.time()
print 'one by one run time is:', one_by_one_end - one_by_one_start
##################### one by one step end ###################

# Create one thread per URL, but do not start any of them yet.
threads = []
start_time = time.time()
for index in range(len(url_list)):
    one_thread = threading.Thread(target=surf_net, args=(url_list[index],))
    threads.append(one_thread)

thread_num = 4  # pool size: at most 4 threads run at the same time
while 1:
    count = min(thread_num, len(threads))
    print 'count', count  # 4, 4, 2
    res = []
    # Take the next batch of threads (pop() takes them from the end of the list).
    for index in range(count):
        x = threads.pop()
        res.append(x)
    for thread_index in res:
        thread_index.start()
    # Wait for the whole batch to finish before starting the next one.
    for j in res:
        j.join()
    if not threads:
        break
end_time = time.time()
print 'start time to end time =', end_time - start_time
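The batching logic in the script can be isolated into a small helper. The following is a minimal sketch, not the author's code: it is written for Python 3, `run_in_batches` and `fake_fetch` are hypothetical names, and a short `time.sleep` stands in for the real `urllib2` request so that it runs without network access. Unlike the script, it takes threads from the front of the list rather than popping them from the end.

```python
import threading
import time


def run_in_batches(tasks, batch_size):
    """Run zero-argument callables in batches of at most batch_size threads.

    Returns the size of each batch, in the order the batches ran.
    """
    pending = [threading.Thread(target=task) for task in tasks]
    batch_sizes = []
    while pending:
        batch = pending[:batch_size]
        pending = pending[batch_size:]
        for thread in batch:
            thread.start()
        # Wait for every thread in this batch before starting the next batch.
        for thread in batch:
            thread.join()
        batch_sizes.append(len(batch))
    return batch_sizes


def fake_fetch():
    # Assumption: a short sleep stands in for a blocking network request.
    time.sleep(0.01)


sizes = run_in_batches([fake_fetch] * 10, batch_size=4)
print(sizes)  # [4, 4, 2]
```

With 10 tasks and a batch size of 4, the batches contain 4, 4 and 2 threads, the same pattern noted in the script's `count` comment.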

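Because the 10 chosen sites all respond quickly, each call to surf_net sleeps for 2 seconds so that the single-threaded and multithreaded runs can be compared.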

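The output is as follows (in the threaded part, lines run together because several threads print at the same time):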

surf start 1501241536.07
https://www.taobao.com 200 5.18160605431
surf start 1501241541.41
https://www.baidu.com 200 2.17890906334
surf start 1501241543.74
https://www.jd.com 200 2.21625900269
surf start 1501241546.14
http://mail.163.com 200 2.08968806267
surf start 1501241548.3
http://www.csdn.net 200 2.08174395561
surf start 1501241550.45
http://www.weibo.com 200 2.21271705627
surf start 1501241552.81
http://www.youku.com 200 2.1059448719
surf start 1501241555.0
http://www.dianping.com/ 200 12.4947040081
surf start 1501241577.89
https://mp.weixin.qq.com 200 2.1225938797
surf start 1501241580.13
https://xiumi.us/#/ 200 2.07485389709
one by one run time is: 46.2059190273
count 4
surf start 1501241582.28
surf start surf start 1501241582.281501241582.28
surf start 1501241582.28
http://www.youku.com https://mp.weixin.qq.com 200 7.10403490067
https://xiumi.us/#/ 200 7.15073108673
200 7.22556805611
http://www.dianping.com/ 200 12.6222820282
count 4
surf start 1501241605.45
surf startsurf start 1501241605.45 1501241605.45
surf start 1501241605.45
http://mail.163.com http://www.csdn.net http://www.weibo.com 200 2.07268214226
200 2.08485388756
https://www.jd.com 200 2.08593082428
200 2.19985890388
count 2
surf start 1501241607.84 surf start 1501241607.84
https://www.baidu.com 200 2.16068816185
https://www.taobao.com 200 7.46481204033
start time to end time = 38.1950109005

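The single-threaded run took about 46 seconds, while the multithreaded run with a pool of 4 took about 38 seconds. With a larger pool, for example 5 threads, the output is: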

surf start 1501241800.73
https://www.taobao.com 200 5.46020698547
surf start 1501241806.33
https://www.baidu.com 200 2.166918993
surf start 1501241808.65
https://www.jd.com 200 3.22009301186
surf start 1501241812.06
http://mail.163.com 200 2.10153102875
surf start 1501241814.22
http://www.csdn.net 200 2.10018992424
surf start 1501241816.39
http://www.weibo.com 200 7.5566380024
surf start 1501241826.47
http://www.youku.com 200 2.12182092667
surf start 1501241828.67
http://www.dianping.com/ 200 12.511051178
surf start 1501241851.53
https://mp.weixin.qq.com 200 2.22085404396
surf start 1501241853.87
https://xiumi.us/#/ 200 2.12279200554
one by one run time is: 55.3288900852
count 5
surf start 1501241856.06
surf start 1501241856.06
surf start 1501241856.06
surf start 1501241856.06 surf start 1501241856.06
https://xiumi.us/#/ http://www.youku.com https://mp.weixin.qq.com 200 7.07921099663
200 7.08860993385
200 7.13851284981
http://www.weibo.com http://www.dianping.com/ 200 7.34806513786
200 12.405025959
count 5
surf start surf start 1501241878.871501241878.87
surf start 1501241878.87
surf start 1501241878.87
surf start 1501241878.87
http://www.csdn.net http://mail.163.com https://www.baidu.com https://www.jd.com 200 2.14060592651
200 2.10476207733
200 2.20087099075
200 2.21591711044
https://www.taobao.com 200 5.41400504112
start time to end time = 28.6687788963

In this run the single-threaded pass took 55.3 seconds, and with a pool of 5 concurrent threads the run took 28.7 seconds.

If the pool size is set to 10, that is thread_num = 10, the output is:

/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/nfzhlkzn/Documents/StudyCode/StudyThreading/demo5.py
surf start 1501242133.5
https://www.taobao.com 200 2.2023191452
surf start 1501242135.87
https://www.baidu.com 200 2.17822694778
surf start 1501242138.2
https://www.jd.com 200 2.20729494095
surf start 1501242140.59
http://mail.163.com 200 2.11021089554
surf start 1501242142.75
http://www.csdn.net 200 2.09399604797
surf start 1501242144.93
http://www.weibo.com 200 2.21243000031
surf start 1501242147.31
http://www.youku.com 200 2.10412812233
surf start 1501242149.49
http://www.dianping.com/ 200 12.4396719933
surf start 1501242172.45
https://mp.weixin.qq.com 200 2.11350488663
surf start 1501242174.66
https://xiumi.us/#/ 200 2.11214208603
one by one run time is: 44.2924759388
count 10
surf start 1501242177.79
surf start 1501242177.8
surf start 1501242177.8
surf start 1501242177.8
surf start 1501242177.8 surf start 1501242177.8
surf start 1501242177.8
surf start 1501242177.8
surf start 1501242177.8
surf start 1501242177.8
http://www.youku.com https://mp.weixin.qq.com https://xiumi.us/#/ http://mail.163.com http://www.csdn.net https://www.baidu.com 200 2.43743491173
200 7.09744596481
200 7.10696387291
200 2.10279512405
http://www.weibo.com 200 2.14740610123
https://www.taobao.com https://www.jd.com 200 10.1884939671
200 7.18714308739
200 10.1404249668
200 10.2309341431
http://www.dianping.com/ 200 15.3515269756
start time to end time = 25.8884620667
Process finished with exit code 0

In this run the single-threaded pass took 44.3 seconds, and with 10 concurrent threads the run took 25.9 seconds.

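Analysis: because the network is unstable, the total time a single thread needs to visit the 10 sites differs from run to run. Even so, every run shows the multithreaded version finishing well ahead of the single-threaded one. A larger pool does not automatically bring a proportionally larger saving, though; there is an optimal pool size. Finding it is fairly involved and is outside the scope of this post.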
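One reason a larger pool gives diminishing returns in this batch-style scheme is that each batch lasts as long as its slowest thread. The sketch below is a rough model under stated assumptions, not a measurement: `batched_estimate` is a hypothetical helper, and the latencies are illustrative round numbers (nine 2-second requests and one 12-second request, loosely mirroring www.dianping.com in the runs above). It ignores thread overhead and network contention, so it does not predict an optimal pool size.

```python
def batched_estimate(latencies, batch_size):
    """Estimate wall time when tasks run in fixed batches.

    Each batch waits for its slowest task, so its cost is max(batch).
    """
    total = 0
    for start in range(0, len(latencies), batch_size):
        total += max(latencies[start:start + batch_size])
    return total


# Illustrative latencies in seconds (assumption, not measured data).
latencies = [2, 2, 2, 2, 2, 2, 2, 12, 2, 2]

for size in (1, 4, 5, 10):
    print(size, batched_estimate(latencies, size))
# 1 30
# 4 16
# 5 14
# 10 12
```

In this model, going from 1 to 4 threads cuts the estimate from 30 to 16 seconds, but going from 4 to 10 only reaches 12 seconds, because no pool size can finish sooner than the single slowest request.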

Conclusion: for I/O-bound workloads, Python multithreading can improve running efficiency.
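The same effect can be seen without network access. The following is a minimal sketch for Python 3 that uses the standard library's `concurrent.futures.ThreadPoolExecutor` instead of the manual batching used in this post; `fake_io`, `timed` and `run_pooled` are hypothetical names, and `time.sleep` is assumed to stand in for a blocking request (like real blocking I/O, it releases the GIL while waiting).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Simulate a blocking I/O call (assumption: sleep replaces a request)."""
    time.sleep(delay)
    return delay


def timed(func):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


delays = [0.05] * 8


def run_pooled():
    # At most 4 worker threads; a free worker picks up the next delay at once.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fake_io, delays))


sequential_result, sequential_time = timed(lambda: [fake_io(d) for d in delays])
pooled_result, pooled_time = timed(run_pooled)

print("sequential: %.2fs, pooled: %.2fs" % (sequential_time, pooled_time))
```

Unlike the batch loop, the executor does not wait for a whole batch to finish: as soon as one worker is free it takes the next task, so one slow task does not hold up the other workers.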
