[Repost] Python's cookielib, urllib2, and httplib modules (HTTP status codes)

Source: Internet  Editor: 程序博客网  Date: 2024/06/13 00:43

About the cookielib, urllib2, and httplib modules (note: some of the information below is taken from 《python参考手册》, the Python reference manual)

Cookielib
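The cookielib module defines classes that automatically handle cookies in HTTP requests.

CookieJar() objects: a CookieJar stores the cookies produced by HTTP requests and adds cookies to outgoing HTTP requests. All of its cookies are kept in memory.

FileCookieJar() objects (for example LWPCookieJar(xx)): when needed, FileCookieJar.load(xx) loads cookies from a file.

For more about the cookielib module, consult the manual.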
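
The file-backed jar can be sketched as a save/load round trip. This is only a minimal sketch, assuming Python 3, where cookielib is named http.cookiejar; the cookie values, the domain www.example.com, and the file name cookies.txt are made up for illustration.

```python
import http.cookiejar  # Python 3 name of cookielib (assumption: Python 3)
import os
import tempfile
import time

# Hypothetical cookie; every value here is made up for illustration.
cookie = http.cookiejar.Cookie(
    version=0, name="sid", value="abc123",
    port=None, port_specified=False,
    domain="www.example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=int(time.time()) + 3600, discard=False,
    comment=None, comment_url=None, rest={},
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cookies.txt")

    # Store the cookie in a file-backed jar and write it to disk.
    jar = http.cookiejar.LWPCookieJar(path)
    jar.set_cookie(cookie)
    jar.save()

    # A fresh jar starts empty and picks the cookie up again with load().
    restored = http.cookiejar.LWPCookieJar(path)
    restored.load()

loaded = {c.name: c.value for c in restored}
print(loaded)
```

A plain CookieJar would lose the cookie when the process exits; the file-backed variant is what lets a later run reuse a login session.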

So how do we add cookie handling to an HTTP request? Let's go step by step.

urllib2
(One note: in Python 3, urllib, urllib2, urlparse, and robotparser all live in the urllib package.)

A simple request:

import urllib2

print(urllib2.urlopen('http://www.baidu.com').read())

For this, urllib2.urlopen() is all we need.

To POST data, set header information, or connect through a proxy server, use a Request() object.

header = {'User-Agent':'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
# postdata must already hold the URL-encoded POST body
req = urllib2.Request(
     url = 'http://www.xx.com',
     data = postdata,
     headers = header
    )
r = urllib2.urlopen(req)

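Note that postdata here has already been processed with urllib.urlencode(). The POST data can start out either as a dict or as a string of the form "k1=v1&k2=v2......"; a dict just has to be converted to the "k1=v1&k2=v2......" form with urlencode() first.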
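
To make the two forms of POST data concrete, here is a small sketch. It assumes Python 3, where urlencode lives in urllib.parse and Request in urllib.request, and where the POST body must be bytes; the field names and the URL http://www.example.com/login are placeholders. No request is actually sent.

```python
import urllib.parse
import urllib.request

# Dict form: urlencode() turns it into "k1=v1&k2=v2".
fields = {"k1": "v1", "k2": "v2"}
encoded = urllib.parse.urlencode(fields)

# String form: already in the shape urlencode() would produce.
by_hand = "k1=v1&k2=v2"

# Python 3 wants the body as bytes, so encode before passing it as data.
req = urllib.request.Request(
    url="http://www.example.com/login",  # placeholder URL
    data=encoded.encode("ascii"),
    headers={"User-Agent": "Mozilla/5.0"},
)

print(encoded == by_hand)  # both forms give the same body
print(req.get_method())    # supplying data switches the method to POST
```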

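Besides taking these arguments directly, a Request instance also has the following methods:

add_data(data): data must also be URL-encoded. Note that this method does not append data to whatever was set earlier; it replaces it.

add_header(key,val): adds a header, for example: add_header('User-Agent','Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2')

set_proxy(host,type): sets the proxy server to connect through; host is the proxy host and type is the request type.

And so on.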
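
A brief sketch of these methods, under the assumption of Python 3 (urllib.request). There add_data() no longer exists and the data attribute is assigned directly, which shows the same replace-not-append behavior. The URL and the proxy address proxy.example.com:8080 are placeholders, and nothing is sent.

```python
import urllib.request

req = urllib.request.Request("http://www.example.com/form")  # placeholder URL

# Assigning data replaces whatever was there before; nothing is appended.
req.data = b"k1=v1"
req.data = b"k2=v2"

# add_header() stores the key in capitalized form ("User-agent").
req.add_header("User-Agent", "Mozilla/5.0")

# set_proxy() points the request at the proxy host for the given scheme.
req.set_proxy("proxy.example.com:8080", "http")

print(req.data)
print(req.get_header("User-agent"))
print(req.host)
```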

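Of course, requests set up this way still cannot support advanced HTTP features such as cookies. To support them, use the build_opener() function to create your own opener object:

build_opener([handler1[,handler2,...]])

The arguments handler1, handler2, and so on are instances of handler objects that implement special HTTP features. The following handlers are available by default: ProxyHandler, UnknownHandler, HTTPHandler, HTTPSHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor.

Also note that the object returned by build_opener() has an open(url[,data[,timeout]]) method, which opens the URL according to the rules supplied by its handlers. We can also call install_opener(opener) to install a different opener object as the global URL opener used by urlopen().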
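
The opener mechanism described above might look like this in practice. This is a sketch that assumes Python 3, where build_opener() and install_opener() live in urllib.request and HTTPCookieProcessor takes a jar from http.cookiejar. It only inspects the opener (through its handlers list, an implementation attribute) and does not touch the network.

```python
import http.cookiejar
import urllib.request

# Pass extra handler instances; the default handlers are added automatically.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# List the handlers the opener ended up with.
handler_names = sorted(type(h).__name__ for h in opener.handlers)
print(handler_names)

# Optional: make urlopen() use this opener from now on.
urllib.request.install_opener(opener)
```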

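For example, to access the network through a proxy:

from urllib2 import ProxyHandler, HTTPBasicAuthHandler, build_opener

# ProxyHandler() takes a dict that maps protocol names (http, ftp, etc.) to the corresponding proxy server.
proxy = ProxyHandler({'http':'http://www.xxx.com:8080/'})

auth = HTTPBasicAuthHandler() # authentication handler

auth.add_password('realm','host','user','pwd')

opener = build_opener(proxy,auth)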
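
The same proxy setup, sketched for Python 3 (an assumption; the code above targets Python 2). The proxy address, realm, URL, and credentials are placeholders. The block builds the opener and checks what was registered without sending any request.

```python
import urllib.request

# Map each protocol name to the proxy that should carry it.
proxy = urllib.request.ProxyHandler({"http": "http://proxy.example.com:8080/"})

# Register credentials for a realm and URL prefix (all values are placeholders).
auth = urllib.request.HTTPBasicAuthHandler()
auth.add_password("realm", "http://www.example.com/", "user", "pwd")

opener = urllib.request.build_opener(proxy, auth)

# The password manager returns the credentials for URLs under that prefix.
creds = auth.passwd.find_user_password("realm", "http://www.example.com/private")
print(proxy.proxies)
print(creds)
```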

For example, to use cookies:

#coding:utf-8
import urllib,urllib2,cookielib

#Create the opener----------------------------
#Create the cookie object
cookie = cookielib.CookieJar()
#Create the cookie handler
cookieProc = urllib2.HTTPCookieProcessor(cookie)
#Create the opener
opener = urllib2.build_opener(cookieProc)
#Install it for urlopen() (install_opener is optional here)
urllib2.install_opener(opener)

#Send the request------------------------------
#Set the request parameters
postdata = {
    'username':'xxx@163.com',
    'password':'xxxxxx',
    'type':1
    }
postdata = urllib.urlencode(postdata)

#Note: this could also be written directly as:

# postdata = 'username=xxx@163.com&password=xxxx&type=1'

#Set the request header
header = {'User-Agent':'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
#Request
req = urllib2.Request(
     url = 'http://reg.163.com/logins.jsp?type=1&product=mail163&url=http://entry.mail.163.com/coremail/fcg/ntesdoor2?lightweight%3D1%26verifycookie%3D1%26language%3D-1%26style%3D1',
     data = postdata,
     headers = header
    )
#Send the request
res = urllib2.urlopen(req).read()

#If install_opener() was not called above, use opener.open(req).read() here to fetch the content. For an example, see 《通用的登陆Discuz!论坛 python代码》

print(res)
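
What the cookie handler does on each round trip can be sketched offline. The sketch assumes Python 3 (http.cookiejar and urllib.request); FakeResponse is a hypothetical stand-in for a server reply, and the URLs and cookie value are made up. It calls the handler's http_response() and http_request() hooks by hand, which the opener normally does for you.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for a server reply; the cookie code only needs info()."""

    def __init__(self, set_cookie_value):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie_value

    def info(self):
        return self._headers


jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)

# Step 1: a "login" reply sets a cookie, and the handler stores it in the jar.
login = urllib.request.Request("http://www.example.com/login")
processor.http_response(login, FakeResponse("sid=abc123; Path=/"))

# Step 2: the handler adds the stored cookie to the next outgoing request.
follow_up = urllib.request.Request("http://www.example.com/inbox")
processor.http_request(follow_up)

cookie_header = follow_up.get_header("Cookie")
print(cookie_header)
```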

Also, about the urllib module:

#coding:utf-8
import urllib

s = 'http://jk.chengdu.cn/中国 fsdf fsdf%^&'
#Encode
print(urllib.quote(s)) #quote is similar to PHP's rawurlencode (a space becomes %20)
#Result: http%3A//jk.chengdu.cn/%E4%B8%AD%E5%9B%BD%20fsdf%20fsdf%25%5E%26
print(urllib.quote_plus(s)) #similar to PHP's urlencode (a space becomes +); decode with unquote_plus
#Result: http%3A%2F%2Fjk.chengdu.cn%2F%E4%B8%AD%E5%9B%BD+fsdf+fsdf%25%5E%26

#Decode
print(urllib.unquote(urllib.quote(s)))
#Result: http://jk.chengdu.cn/中国 fsdf fsdf%^&

#print(urllib.urlencode(s)) #this is completely different from PHP's urlencode()
#urlencode usage
query = {'keyword':'中国','type':1}
print('http://jk.chengdu.cn?'+urllib.urlencode(query))
#Result: http://jk.chengdu.cn?type=1&keyword=%E4%B8%AD%E5%9B%BD
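
For comparison, the same calls sketched for Python 3, assuming the functions are taken from urllib.parse (where they moved). The input string is the one used above.

```python
import urllib.parse

s = "http://jk.chengdu.cn/中国 fsdf fsdf%^&"

# quote() keeps "/" and encodes a space as %20.
quoted = urllib.parse.quote(s)
# quote_plus() also encodes "/" and turns a space into "+".
plus = urllib.parse.quote_plus(s)

# Each encoder has a matching decoder that restores the original string.
print(urllib.parse.unquote(quoted) == s)
print(urllib.parse.unquote_plus(plus) == s)

# urlencode() builds a whole query string from key/value pairs.
query = urllib.parse.urlencode({"keyword": "中国", "type": 1})
print(quoted)
print(plus)
print(query)
```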

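Getting HTTP response headers with the httplib module (getting the HTTP status code in Python)

#coding:utf-8
import httplib
#Connect
h = httplib.HTTPConnection('www.baidu.com')
#Send the request
h.request('GET','/')
#Get the returned HTTPResponse instance
'''
An HTTPResponse object has many attributes and methods: read(), getheader(key),
getheaders(), msg, version, status, reason, and so on
'''
r = h.getresponse()
#Print the HTTP status code and reason phrase
print('%d %s' % (r.status, r.reason))
#Get the HTTP header information
print(dict(r.msg))
#Or use:
print(r.getheaders())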
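
The HTTPResponse attributes can be explored without a network connection. This is a sketch only, assuming Python 3, where httplib is named http.client; FakeSocket is a hypothetical helper that feeds a canned reply to HTTPResponse, and the reply bytes are made up.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket: hands HTTPResponse canned bytes."""

    def __init__(self, raw):
        self._file = io.BytesIO(raw)

    def makefile(self, *args, **kwargs):
        return self._file


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")

r = http.client.HTTPResponse(FakeSocket(raw))
r.begin()  # parse the status line and the headers

print(r.status, r.reason)           # status code and reason phrase
print(r.getheader("Content-Type"))  # a single header by name
headers = dict(r.getheaders())      # all headers as (name, value) pairs
body = r.read()
print(headers)
print(body)
```

With a real connection, getresponse() calls begin() for you, so only the attribute access at the end carries over.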

Appendix: common HTTP status codes (from 《python参考手册》)
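
These codes can also be looked up from the standard library. A minimal sketch, assuming Python 3, where http.client.responses (httplib.responses in Python 2) maps each status code to its reason phrase; the five codes shown are just a sample.

```python
import http.client

# http.client.responses maps each status code to its standard reason phrase.
for code in (200, 301, 302, 404, 500):
    print(code, http.client.responses[code])
```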

P.S. All of the HTTP features mentioned above can be implemented in PHP with the cURL tool.