Python multithreading summary (an experiment in checking whether URLs are online with multiple threads)


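This post runs an experiment that checks whether a set of URLs is online, comparing a multithreaded version against a plain sequential one. The code and its output follow.

Part 1: without multithreading. The code is as follows: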

# encoding: utf-8
import threading  # only needed by the commented-out threaded variant in main()
import urllib2


def online(url=''):
    """Check whether a URL is online."""
    req = urllib2.Request(url)
    try:
        response = urllib2.urlopen(req)
        if response.code == 200:
            print response.geturl(), ' this url is online'
        else:
            print 'not'
    except urllib2.URLError as e:
        # Note: HTTPError (404, 500, ...) is a URLError subclass that also has
        # a `reason` attribute, so HTTP errors take this first branch as well.
        if hasattr(e, 'reason'):
            print url, ' We failed to reach a server.'
            print 'Reason: ', e.reason
        elif hasattr(e, 'code'):
            print url, ' The server couldn\'t fulfill the request.'
            print 'Error code: ', e.code


def main():
    url_list = [
        'http://www.baidu.com', 'http://www.hitwh.edu.cn', 'http://www.13.com',
        'http://www.ifeng.com', 'http://www.sina.com',
        'http://www.wewin.com.gr/2', 'http://www.ifeng.com', 'http://www.sina.com',
        'http://www.zeeif.com/int/',
        'http://www.zeeif.com/websc/verification/',
        'http://mjgds.org/classrooms/wp-content/plugins/10421312312/19890907.html',
        'http://login-resolution-center-case-475ec2aec1br.propesage-algerie.com/ID',
        'http://paypel-login-resolution-center.propesage-algerie.com/ID/',
        'http://radiotransilvania.ro/clujarena/rena.php',
        'http://kuleteknik.net/wp-includes/lol3.html',
        'http://kuleteknik.net/wp-includes/lol2.html',
    ]

    for url in url_list:
        # Threaded variant, disabled in this sequential listing:
        # t = threading.Thread(target=online, args=(url,))
        # t.start()
        online(url)  # check the URLs one after another


if __name__ == '__main__':
    main()

The output is as follows:

http://www.baidu.com  this url is online
http://www.hitwh.edu.cn  this url is online
http://www.13.com  We failed to reach a server.
Reason:  [Errno 11001] getaddrinfo failed
http://www.ifeng.com  this url is online
http://www.sina.com.cn/  this url is online
http://www.wewin.com.gr/2  We failed to reach a server.
Reason:  Unauthorized
http://www.ifeng.com  this url is online
http://www.sina.com.cn/  this url is online
http://www.zeeif.com/int/  We failed to reach a server.
Reason:  Not Found
http://www.zeeif.com/websc/verification/  We failed to reach a server.
Reason:  Not Found
http://mjgds.org/classrooms/wp-content/plugins/10421312312/19890907.html  We failed to reach a server.
Reason:  Internal Server Error
http://login-resolution-center-case-475ec2aec1br.propesage-algerie.com/ID  this url is online
http://paypel-login-resolution-center.propesage-algerie.com/ID/  this url is online
http://radiotransilvania.ro/clujarena/rena.php  We failed to reach a server.
Reason:  Not Found
http://kuleteknik.net/wp-includes/lol3.html  this url is online
http://kuleteknik.net/wp-includes/lol2.html  this url is online
[Finished in 5.2s]

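Explanation: the run took 5.2 seconds. With more URLs to check, and especially with more offline URLs among them, it would take even longer.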
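In the output above, HTTP errors such as Not Found, Unauthorized and Internal Server Error are printed under "We failed to reach a server". The likely cause is that HTTPError is a subclass of URLError and also carries a `reason` attribute, so the first `hasattr` branch always wins. Below is a minimal sketch of one way to separate the two cases. It is written for Python 3 (`urllib.request` and `urllib.error`) rather than the Python 2 `urllib2` used in this post. The function name `classify` and its `open_url` and `timeout` parameters are assumptions made for illustration, not part of the original code. The timeout is included because unreachable URLs are what slow the sequential run down.

```python
# Python 3 sketch; the code in this post targets Python 2 (urllib2).
import urllib.error
import urllib.request


def classify(url, open_url=urllib.request.urlopen, timeout=5):
    """Return a (status, detail) tuple describing how `url` answered.

    `open_url` and `timeout` are hypothetical parameters added for this
    sketch; `open_url` lets a caller swap in a different opener.
    """
    try:
        response = open_url(url, timeout=timeout)
    except urllib.error.HTTPError as e:
        # The server was reached but answered with an error status (404, 500, ...).
        # HTTPError is a subclass of URLError, so it has to be caught first.
        return ("http_error", e.code)
    except urllib.error.URLError as e:
        # No HTTP answer at all, e.g. a DNS failure or a refused connection.
        return ("unreachable", str(e.reason))
    try:
        return ("online", response.status)
    finally:
        response.close()  # release the connection
```

Under these assumptions, a call such as `classify('http://www.zeeif.com/int/')` would be expected to report an HTTP error with its status code instead of "failed to reach a server", though the actual result depends on the state of the site at the time of the call.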

Part 2: with multithreading. The code is as follows:

# encoding: utf-8
import threading
import urllib2


def online(url=''):
    """Check whether a URL is online."""
    req = urllib2.Request(url)
    try:
        response = urllib2.urlopen(req)
        if response.code == 200:
            print response.geturl(), ' this url is online'
        else:
            print 'not'
    except urllib2.URLError as e:
        # Note: HTTPError (404, 500, ...) is a URLError subclass that also has
        # a `reason` attribute, so HTTP errors take this first branch as well.
        if hasattr(e, 'reason'):
            print url, ' We failed to reach a server.'
            print 'Reason: ', e.reason
        elif hasattr(e, 'code'):
            print url, ' The server couldn\'t fulfill the request.'
            print 'Error code: ', e.code


def main():
    url_list = [
        'http://www.baidu.com', 'http://www.hitwh.edu.cn', 'http://www.13.com',
        'http://www.ifeng.com', 'http://www.sina.com',
        'http://www.wewin.com.gr/2', 'http://www.ifeng.com', 'http://www.sina.com',
        'http://www.zeeif.com/int/',
        'http://www.zeeif.com/websc/verification/',
        'http://mjgds.org/classrooms/wp-content/plugins/10421312312/19890907.html',
        'http://login-resolution-center-case-475ec2aec1br.propesage-algerie.com/ID',
        'http://paypel-login-resolution-center.propesage-algerie.com/ID/',
        'http://radiotransilvania.ro/clujarena/rena.php',
        'http://kuleteknik.net/wp-includes/lol3.html',
        'http://kuleteknik.net/wp-includes/lol2.html',
    ]

    for url in url_list:
        # Start one thread per URL. The threads are non-daemon, so the
        # program keeps running until every check has finished.
        t = threading.Thread(target=online, args=(url,))
        t.start()
        # online(url)  # sequential variant, disabled here


if __name__ == '__main__':
    main()

The output is as follows:

http://www.baidu.com  this url is online
http://www.ifeng.com  this url is online
http://www.13.com  We failed to reach a server.
Reason:  [Errno 11001] getaddrinfo failed
http://www.ifeng.com  this url is online
http://paypel-login-resolution-center.propesage-algerie.com/ID/  this url is online
http://www.hitwh.edu.cn  this url is online
http://login-resolution-center-case-475ec2aec1br.propesage-algerie.com/ID  this url is online
http://www.sina.com.cn/  this url is online
http://www.sina.com.cn/  this url is online
http://mjgds.org/classrooms/wp-content/plugins/10421312312/19890907.html  We failed to reach a server.
Reason:  Internal Server Error
http://www.zeeif.com/websc/verification/  We failed to reach a server.
Reason:  Not Found
http://www.zeeif.com/int/  We failed to reach a server.
Reason:  Not Found
http://kuleteknik.net/wp-includes/lol2.html  this url is online
http://kuleteknik.net/wp-includes/lol3.html  this url is online
http://www.wewin.com.gr/2  We failed to reach a server.
Reason:  Unauthorized
http://radiotransilvania.ro/clujarena/rena.php  We failed to reach a server.
Reason:  Not Found
[Finished in 1.7s]

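Explanation: each URL check runs in its own thread, and the whole run took only 1.7 s. The results now arrive in completion order rather than list order.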
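The speedup comes from overlapping the waiting time: while one thread waits on a server, the others can proceed. The sketch below illustrates this without touching the network, under stated assumptions. It is written for Python 3. `fake_check` is a hypothetical stand-in that sleeps for a fixed delay instead of opening a URL, and the `example.invalid` URLs are placeholders. It also joins the threads so that the measured time covers every check.

```python
import threading
import time


def fake_check(url, results, delay=0.05):
    """Stand-in for a network check: wait, then record the URL."""
    time.sleep(delay)       # simulates waiting on a remote server
    results.append(url)     # list.append is thread-safe in CPython


def run_sequential(urls):
    """Check the URLs one after another; return (results, elapsed seconds)."""
    results = []
    start = time.perf_counter()
    for url in urls:
        fake_check(url, results)
    return results, time.perf_counter() - start


def run_threaded(urls):
    """Check each URL in its own thread; return (results, elapsed seconds)."""
    results = []
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_check, args=(url, results))
               for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()            # wait so the elapsed time covers every check
    return results, time.perf_counter() - start


urls = ["http://example.invalid/%d" % i for i in range(10)]  # placeholder URLs
seq_results, seq_time = run_sequential(urls)
thr_results, thr_time = run_threaded(urls)
print("sequential: %.2fs, threaded: %.2fs" % (seq_time, thr_time))
```

The sequential run pays the delay once per URL, while the threaded run pays it roughly once in total, which mirrors the gap between the two timings reported in this post.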

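Summary:

1. When there are many URLs to check, say on the order of millions, the advantage of multithreading becomes very large, because each check spends most of its time waiting on the network.

2. The multithreaded code creates one thread per URL. With too many URLs this approach clearly does not work, so the checking code needs to be optimized.

3. When the URLs are kept in a database, writing the results back to the database efficiently is also an important problem.

4. I do not think the function above is entirely correct, because of redirects: a URL may not exist, yet after a redirect it is reported as existing. This is something to improve later. Readers who know a better approach are welcome to leave me a comment so we can improve together, and if I find a solution I will publish it on this blog.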
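Points 2 and 4 can be sketched together, with the caveat that this is only one possible direction and not the author's eventual solution. The sketch is written for Python 3. It caps the number of threads with `concurrent.futures.ThreadPoolExecutor` and reports whether the final URL differs from the requested one, as happened with http://www.sina.com and http://www.sina.com.cn/ in the output above. The names `check` and `check_all` and the parameters `open_url`, `timeout` and `max_workers` are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


def check(url, open_url=urllib.request.urlopen, timeout=5):
    """Return (url, final_url, redirected, status) for one URL.

    `open_url` and `timeout` are hypothetical parameters for this sketch.
    """
    try:
        with open_url(url, timeout=timeout) as response:
            final_url = response.geturl()  # URL after any redirects were followed
            # A plain string comparison is a simplification; it treats any
            # difference between the two URLs as a redirect.
            return (url, final_url, final_url != url, response.status)
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status.
        return (url, None, False, e.code)
    except urllib.error.URLError:
        # No HTTP answer at all (DNS failure, refused connection, ...).
        return (url, None, False, None)


def check_all(urls, max_workers=8, **kwargs):
    """Check `urls` with at most `max_workers` threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: check(u, **kwargs), urls))
```

Whether a redirected URL should count as online is left to the caller; the sketch only makes the redirect visible. For very large inputs the URL list would probably also need to be fed to the pool in batches, which is not shown here.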

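Update (2014.10.30)

1. Using pycurl to check whether a URL is online is more efficient.

2. The checker is now connected to a database and stores its results there (a small personal project, already finished).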
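The post does not say which database or schema the project uses, so the following is only a sketch of the batching idea, assuming Python 3 and the standard-library `sqlite3` module. The table name `url_status`, its columns and the function `save_results` are hypothetical. The rows are written with one `executemany` call inside a single transaction, which is usually cheaper than committing row by row.

```python
import sqlite3


def save_results(conn, rows):
    """Store (url, online, detail) rows in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS url_status ("
        "url TEXT PRIMARY KEY, online INTEGER NOT NULL, detail TEXT)"
    )
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO url_status (url, online, detail) "
            "VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
save_results(conn, [
    ("http://www.baidu.com", 1, "200"),
    ("http://www.13.com", 0, "getaddrinfo failed"),
])
online_count = conn.execute(
    "SELECT COUNT(*) FROM url_status WHERE online = 1"
).fetchone()[0]
total_count = conn.execute("SELECT COUNT(*) FROM url_status").fetchone()[0]
print(online_count, total_count)
```

Because the URL is the primary key, checking the same URL again replaces its earlier row instead of adding a duplicate.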
