[Python Web Crawler] Making Network Requests with the urllib Module

Source: Internet | Editor: 程序博客网 | Date: 2024/06/05 14:58

I really like Python for its simplicity, so I use it to make network requests when practicing web crawling and API testing.

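In Python 2.x, the standard-library modules used are urllib and urllib2;

in Python 3.x there is only urllib, which merges the old urllib and urllib2 and is split into submodules such as urllib.request, urllib.parse, and urllib.error.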

There is also the very handy third-party requests library. I won't go into crawler frameworks here; the fundamentals matter most.

Let's start with the basics:

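1. In Python 3.x, network requests go through urllib.request, and there are two ways to add headers:

a. Create an opener object with urllib.request.build_opener(), add or update its headers through its addheaders attribute (a list of (name, value) tuples), and then open the URL with that opener: opener.open(url).

b. Create a Request object with urllib.request.Request, set headers on it with add_header(), and finally send it with urllib.request.urlopen(req).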
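The two approaches above can be sketched as follows. This is a minimal offline sketch: the URL and header values are placeholders assumed for illustration, and the lines that would actually send the request are commented out so the objects can be inspected without network access.

```python
import urllib.request

URL = "http://example.com/"  # placeholder URL, assumed for illustration
HEADERS = {  # hypothetical header values
    "User-Agent": "Mozilla/5.0 (demo crawler)",
    "Accept-Language": "en-US",
}

# Method a: an opener whose addheaders are sent with every request it makes.
# Assigning a new list replaces the default ("User-agent", "Python-urllib/3.x") entry.
opener = urllib.request.build_opener()
opener.addheaders = list(HEADERS.items())
# response = opener.open(URL)

# Method b: a Request object carrying its own headers.
req = urllib.request.Request(URL)
for name, value in HEADERS.items():
    req.add_header(name, value)  # stored under name.capitalize(), e.g. "User-agent"
# response = urllib.request.urlopen(req)
```

Note that add_header() normalizes header names with str.capitalize(), so a lookup has to use the same form (req.get_header("User-agent")). Also, replacing opener.addheaders rather than appending to it matters: when the list contains two User-Agent entries, the opener sends the first one, which would be the default.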

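2. In Python 2.x, use urllib and urllib2 to make network requests with headers:

a. Put the headers in a dictionary and pass it directly to urllib2.Request(url, headers=headers) to create a Request object, then send it with urllib2.urlopen(request). The headers should not be run through urllib.urlencode(): that function encodes form data or query strings, which go in the data argument or in the URL.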
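A corrected version of this pattern can be sketched as below. To keep it runnable on either interpreter, it uses the common try/except ImportError idiom to pick up urllib2 on Python 2.x and urllib.request on Python 3.x. The URL and form fields are hypothetical, and the sketch also shows where urlencode() actually belongs: in the request body.

```python
try:  # Python 2.x
    from urllib2 import Request, urlopen
    from urllib import urlencode
except ImportError:  # Python 3.x
    from urllib.request import Request, urlopen
    from urllib.parse import urlencode

URL = "http://example.com/login"  # hypothetical endpoint
headers = {"User-Agent": "Mozilla/5.0 (demo crawler)"}  # a plain dict, not urlencoded
form = {"user": "alice", "page": 1}  # hypothetical form fields

# urlencode() turns the form into "key=value&..." text for the body;
# Python 3 requires the body to be bytes, hence .encode().
data = urlencode(form).encode("ascii")
request = Request(URL, data=data, headers=headers)
# response = urlopen(request)  # a request with a body is sent as POST
```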

3. A quick look at how the Python 2 modules map to Python 3:

a. import urllib in Python 2.x corresponds to import urllib.request, urllib.error, urllib.parse in Python 3.x.

b. import urlparse in Python 2.x corresponds to import urllib.parse in Python 3.x.

c. import urllib2 in Python 2.x corresponds to import urllib.request, urllib.error in Python 3.x.

d. urllib2.urlopen in Python 2.x corresponds to urllib.request.urlopen in Python 3.x.

e. urllib.urlencode in Python 2.x corresponds to urllib.parse.urlencode in Python 3.x.

f. urllib.quote in Python 2.x corresponds to urllib.parse.quote in Python 3.x.

g. cookielib.CookieJar in Python 2.x corresponds to http.cookiejar.CookieJar in Python 3.x.

h. urllib2.Request in Python 2.x corresponds to urllib.request.Request in Python 3.x.

i. urllib.urlretrieve() in Python 2.x corresponds to urllib.request.urlretrieve() in Python 3.x.
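Several of the Python 3.x names from this list can be exercised offline in a short sketch. The sample strings and URL are arbitrary examples, and the comment on each import gives the Python 2.x names it replaces.

```python
from http.cookiejar import CookieJar                           # cookielib.CookieJar
from urllib.parse import quote, urlencode, urlparse            # urllib.quote, urllib.urlencode, urlparse.urlparse
from urllib.request import HTTPCookieProcessor, build_opener   # urllib2.HTTPCookieProcessor, urllib2.build_opener

# Percent-encode a string for use in a URL path (spaces become %20).
quoted = quote("web crawler")

# Encode a dict as a query string (or, after .encode(), as a POST body).
query = urlencode({"q": "urllib", "page": 2})

# Split a URL into scheme, netloc, path, params, query and fragment.
parts = urlparse("https://docs.python.org/3/library/urllib.html?highlight=quote")

# An opener that stores cookies from responses in `jar`
# and sends them back on later requests made through it.
jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(jar))
```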

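When called without a filename, urlretrieve() downloads into a temporary file, so it can leave temporary files behind. To delete them, call urllib.request.urlcleanup(), which removes the temporary files left by earlier urlretrieve() calls.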
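The urlretrieve()/urlcleanup() pair can be demonstrated without network access. This sketch assumes Python 3.4 or later, where the default opener handles data: URLs; the data: URL is only a stand-in for a real download address. The Python documentation lists urlretrieve() among the legacy interfaces that may be deprecated in the future.

```python
import os
import urllib.request

# With no filename argument, urlretrieve() saves the body to a
# temporary file and returns (path_to_that_file, response_headers).
path, headers = urllib.request.urlretrieve("data:text/plain,hello%20urllib")
with open(path, "rb") as f:
    content = f.read()
existed_before_cleanup = os.path.exists(path)

# urlcleanup() deletes the temporary files left by earlier urlretrieve() calls.
urllib.request.urlcleanup()
exists_after_cleanup = os.path.exists(path)
```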

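The points above give a quick overview of the small changes in the urllib-related modules from Python 2.x to Python 3.x, as a reference for later development (I'll post the related code when I have time).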
