Using Python's requests library


Please credit the original source when reposting: http://blog.csdn.net/a464057216/article/details/52713945

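Introduction

Python's standard HTTP packages include urllib, urllib2, and httplib (the Python 2 names; in Python 3 they live under urllib.request and http.client), but all of them require a fair understanding of how HTTP works before you can write code with them. The requests package lets you program HTTP interactions at a higher level of abstraction. Install it with pip install requests. requests is built on urllib3 (older releases bundled a copy; current releases install it as a dependency), so it supports HTTP keep-alive and connection pooling automatically.

Usage

requests supports the HTTP HEAD, GET, POST, PUT, OPTIONS, DELETE, and PATCH methods:

    import requests

    r = requests.options('http://localhost:5000/')
    print("Options:", r.headers)
    r = requests.post('http://localhost:5000/', data={'name': 'mars'})
    print("Post:", r.content)
    r = requests.put('http://localhost:5000/', data={'name': 'loo'})
    print("Put:", r.content)
    r = requests.get('http://localhost:5000/')
    print("Get:", r.content)
    r = requests.delete('http://localhost:5000/')
    print("Delete:", r.content)
    r = requests.get('http://localhost:5000/')
    print("Get:", r.content)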

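The variable r is a response object of type requests.models.Response, which gives access to everything in the HTTP response.

Passing query parameters

To pass parameters in the query part of the URI, you can write them directly into the URL string in the standard form (the same key may be given several values):

    r = requests.get('http://localhost:5000/?xx=bb&xx=cc')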

You can also pass them through the params argument of the request:

    params = {
        'xx': ['bb', 'cc'],
        'yy': None,
    }
    r = requests.get('http://localhost:5000/', params=params)
    print("Request URL:", r.url)

When a dict is used, multiple values for the same key go in a list, and a key whose value is None is left out of the query string.
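These two rules can be checked offline. The sketch below builds the request without sending it and inspects the resulting URL; preparing a Request by hand is only a convenience for the illustration, and the URL and parameter names are the ones from the example.

```python
import requests

params = {
    'xx': ['bb', 'cc'],  # several values for one key go in a list
    'yy': None,          # a None value is left out of the query string
}

# prepare() builds the final URL without opening a network connection
prepared = requests.Request('GET', 'http://localhost:5000/', params=params).prepare()
print(prepared.url)
```

The printed URL repeats the key xx once per list element and contains no yy at all.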

Custom request headers

To add fields to the header of the HTTP request message, pass them in the headers argument:

    r = requests.get('http://localhost:5000/', headers={"mars": "loo"})

Setting cookies

Cookies can be passed as a dict through the cookies argument:

    cookies = {
        "name": "mars"
    }
    r = requests.post('http://localhost:5000/', cookies=cookies)

A RequestsCookieJar object lets you set the cookie's domain, path, and other attributes:

    jar = requests.cookies.RequestsCookieJar()
    jar.set('username', 'mars', domain='httpbin.org', path='/cookies')
    jar.set('password', 'loo', domain='httpbin.org', path='/else')
    r = requests.get('http://httpbin.org/cookies', cookies=jar)
    print(r.text)

If you are experimenting against localhost, the domain argument must be set to the empty string ''.

http://httpbin.org/cookies echoes back in the response body any cookies that the request carried, so the code above returns:

    {
      "cookies": {
        "username": "mars"
      }
    }

Because password was set for the path /else, a request to /cookies does not carry the cookie whose key is password.
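The path filtering happens on the client side, so it can be observed without contacting httpbin.org at all. A minimal sketch, assuming it is acceptable to prepare the request by hand and read the Cookie header that would be sent:

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('username', 'mars', domain='httpbin.org', path='/cookies')
jar.set('password', 'loo', domain='httpbin.org', path='/else')

# Preparing the request applies the jar's domain and path rules locally
prepared = requests.Request('GET', 'http://httpbin.org/cookies', cookies=jar).prepare()
cookie_header = prepared.headers.get('Cookie')
print(cookie_header)
```

Only the username cookie ends up in the header, because the request path /cookies does not fall under /else.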

Filling the request body

To send the request as application/x-www-form-urlencoded, put the content in the data argument. To send it as application/json, either pass the content (a dict) in the json argument, or serialize the dict to a JSON string yourself, pass it to data, and set content-type to application/json:

    import json

    d = {
        "mars": "loo"
    }
    r = requests.post('http://localhost:5000/', data=json.dumps(d),
                      headers={"content-type": "application/json"})
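To compare the three ways of filling the body side by side, the following sketch prepares each request without sending it and prints the content type and body that would go on the wire. The variable names are illustrative only.

```python
import json

import requests

payload = {"mars": "loo"}
url = 'http://localhost:5000/'

# Form-encoded body: data= with a dict
form = requests.Request('POST', url, data=payload).prepare()

# JSON body, variant 1: requests serializes the dict and sets the header
auto = requests.Request('POST', url, json=payload).prepare()

# JSON body, variant 2: serialize by hand and set content-type explicitly
manual = requests.Request('POST', url, data=json.dumps(payload),
                          headers={'content-type': 'application/json'}).prepare()

for name, req in [('data=dict', form), ('json=dict', auto), ('data=json.dumps', manual)]:
    print(name, '|', req.headers['Content-Type'], '|', req.body)
```

The json argument is usually the shorter choice; the manual variant is useful when you need control over how the JSON string is produced.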

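To upload a file, open it in 'rb' mode and put it in a dict (binary mode is needed for requests to work out the correct content-length), then pass the dict as the files argument. The content type of the request automatically becomes multipart/form-data:

    # The with block closes the file once the upload is done
    with open('sample.jpg', 'rb') as image:
        files = {
            'image': image,
        }
        r = requests.post('http://localhost:5000/', files=files)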
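The switch to multipart/form-data can be seen without a server. This sketch uses an in-memory BytesIO object as a stand-in for the opened file (the bytes are made up) and the documented (filename, file object, content type) tuple form of a files entry:

```python
import io

import requests

# In-memory stand-in for open('sample.jpg', 'rb'); the content is not a real image
fake_image = io.BytesIO(b'\xff\xd8\xff\xe0 not a real JPEG')

files = {'image': ('sample.jpg', fake_image, 'image/jpeg')}
prepared = requests.Request('POST', 'http://localhost:5000/', files=files).prepare()

print(prepared.headers['Content-Type'])    # multipart/form-data plus a generated boundary
print(prepared.headers['Content-Length'])  # computed from the encoded body
```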

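When the file to upload is larger than memory, pass the open file object itself as data so that requests streams it instead of reading it all at once. Opening it in a context manager makes sure it is closed afterwards:

    with open('verybig.zip', 'rb') as f:
        requests.post('http://localhost:5000/', data=f)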

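Handling the response object

As the query-parameter example showed, the url attribute of the response object gives the URL that was requested. The status_code attribute holds the response status code. The raise_for_status method raises an HTTPError describing the client or server error when the status code is 4XX or 5XX, and returns None for 2XX or 3XX status codes. For example:

    r = requests.get('http://localhost:5000/')
    print(r.raise_for_status())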

The output is either None or a traceback such as this one, recorded under Python 2.7:

    Traceback (most recent call last):
      File "a.py", line 38, in <module>
        print r.raise_for_status()
      File "/Users/linglingguo/Envs/learnselenium/lib/python2.7/site-packages/requests/models.py", line 862, in raise_for_status
        raise HTTPError(http_error_msg, response=self)
    requests.exceptions.HTTPError: 403 Client Error: FORBIDDEN for url: http://localhost:5000/
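In practice raise_for_status is usually wrapped in try/except rather than printed. The sketch below shows one possible pattern; the describe helper is hypothetical, and the response is built by hand purely so the example runs without a server (normally it would come from requests.get).

```python
import requests


def describe(response):
    """Return a short status line instead of letting HTTPError propagate."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return 'failed: %s' % exc
    return 'ok: %d' % response.status_code


# Hand-built response, for illustration only
forbidden = requests.Response()
forbidden.status_code = 403
forbidden.reason = 'FORBIDDEN'
forbidden.url = 'http://localhost:5000/'

print(describe(forbidden))
```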

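The encoding attribute holds the response encoding. The text attribute decodes the response body according to encoding and returns it; if encoding is None, requests guesses the encoding with chardet (charset_normalizer in newer releases). When the response body is binary (an image, say), the content attribute gives the raw body as bytes. For example:

    from io import BytesIO

    from PIL import Image

    r = requests.post('http://localhost:5000/picture')
    i = Image.open(BytesIO(r.content))
    i.save('sample.jpg', 'jpeg')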

If the response body is larger than the machine's memory, it has to be read in pieces: pass stream=True when making the request, then call the iter_content method of the response object:

    r = requests.post('http://localhost:5000/', stream=True)
    with open('download.zip', 'wb') as f:
        for chunk in r.iter_content(chunk_size=10*1024*1024):
            if chunk:
                f.write(chunk)

For response bodies in application/json format, requests has a built-in json method that parses the body and returns the result (a dict for a JSON object):

    r = requests.post('http://localhost:5000/', data={'name': 'mars'})
    print("JSON:", r.json())

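If the response body cannot be parsed as JSON, an exception is raised: ValueError: No JSON object could be decoded (the message is from Python 2; recent requests releases raise requests.exceptions.JSONDecodeError, which is still a ValueError).

The headers attribute gives access to the response headers.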
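The attributes described in this section can be tried together against a throwaway local server. Everything about the server here (the JSONHandler class, the JSON payload, the 127.0.0.1 address) is an assumption made so the sketch is self-contained; it stands in for whatever service you actually call.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class JSONHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: always answers with a small JSON document."""

    def do_GET(self):
        body = json.dumps({'name': 'mars'}).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(('127.0.0.1', 0), JSONHandler)  # port 0: let the OS choose
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local test

r = session.get(url)
print(r.status_code)                  # numeric status code
print(r.headers['content-type'])      # header lookup is case-insensitive
print(r.encoding)                     # taken from the charset in Content-Type
print(type(r.content), type(r.text))  # raw bytes vs. decoded text
print(r.json())                       # parsed JSON body

# Streamed read: the body is consumed chunk by chunk
streamed = session.get(url, stream=True)
chunks = [chunk for chunk in streamed.iter_content(chunk_size=4) if chunk]

server.shutdown()
server.server_close()
```

The tiny chunk_size of 4 bytes is chosen only to force several chunks on such a small body; for real downloads a much larger value, like the one in the download example, is more sensible.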

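Redirects and history

If the request was redirected along the way, requests returns the final response by default. To get the intermediate redirect responses, read the history attribute (a list of response objects in the order they were received). For example:

    r = requests.get("http://localhost:5000")
    print(r.history)
    for resp in r.history:
        print(resp.status_code, resp.headers)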

The code above prints:

    [<Response [302]>, <Response [301]>]
    302 {'Date': 'Wed, 28 Sep 2016 07:58:28 GMT', 'Content-Length': '211', 'Content-Type': 'text/html; charset=utf-8', 'Location': 'http://localhost:5000/a', 'Server': 'Werkzeug/0.11.10 Python/2.7.10'}
    301 {'Date': 'Wed, 28 Sep 2016 07:58:28 GMT', 'Content-Length': '215', 'Content-Type': 'text/html; charset=utf-8', 'Location': 'http://localhost:5000/404', 'Server': 'Werkzeug/0.11.10 Python/2.7.10'}

The request was first redirected to http://localhost:5000/a and then to http://localhost:5000/404, which produced the final result. To turn off the default redirect handling, pass allow_redirects=False. For example:

    r = requests.get("http://localhost:5000", allow_redirects=False)
    print(r.status_code, r.headers)
    for resp in r.history:
        print(resp.status_code, resp.headers)

The code above prints:

    302 {'Date': 'Wed, 28 Sep 2016 08:04:04 GMT', 'Content-Length': '211', 'Content-Type': 'text/html; charset=utf-8', 'Location': 'http://localhost:5000/a', 'Server': 'Werkzeug/0.11.10 Python/2.7.10'}
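The original experiment ran against a Flask/Werkzeug server that is not shown. To reproduce the same behaviour in a self-contained way, the sketch below starts a small local server with a similar redirect chain. The RedirectHandler class and the paths /a and /final are assumptions for the demo, not the author's server.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical server: / -> 302 -> /a -> 301 -> /final -> 200."""

    def do_GET(self):
        hops = {'/': (302, '/a'), '/a': (301, '/final')}
        if self.path in hops:
            status, location = hops[self.path]
            self.send_response(status)
            self.send_header('Location', location)
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'done'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(('127.0.0.1', 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d/' % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local test

followed = session.get(base)                               # redirects followed
not_followed = session.get(base, allow_redirects=False)    # first hop only

history_codes = [resp.status_code for resp in followed.history]
print(history_codes, followed.status_code, followed.url)
print(not_followed.status_code, not_followed.headers['Location'], not_followed.history)

server.shutdown()
server.server_close()
```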

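Timeouts

By default requests sets no timeout and waits indefinitely for the server to respond. requests splits the timeout into two parts, the connect timeout and the read timeout: the first limits how long the socket may take to establish the TCP connection, and the second limits how long the client waits for the server's response once the connection is established. If timeout is a single number, both timeouts are set to that value; if it is a tuple of two numbers, they are the connect timeout and the read timeout respectively. If the TCP connection is not established, or the response is not read, within the timeout, an exception is raised. A read timeout looks like this (with a 3-second timeout):

    requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=5000): Read timed out. (read timeout=3)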
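A read timeout can be provoked locally with a listener that accepts TCP connections but never answers. The following is a sketch under that assumption; the listener is a stand-in for a slow server, and the 3-second and 0.5-second values are arbitrary.

```python
import socket

import requests

# Stand-in server (an assumption for this demo): it completes the TCP
# handshake but never sends an HTTP response.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))   # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local test

try:
    # timeout=(connect timeout, read timeout), both in seconds
    session.get('http://127.0.0.1:%d/' % port, timeout=(3, 0.5))
    outcome = 'no timeout'
except requests.exceptions.ConnectTimeout:
    outcome = 'connect timeout'
except requests.exceptions.ReadTimeout:
    outcome = 'read timeout'
finally:
    listener.close()

print(outcome)
```

Both exception classes derive from requests.exceptions.Timeout, so code that does not care which phase timed out can catch that single class instead.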

Session objects

A session here means a series of related HTTP requests and responses. A requests.Session object can keep settings across requests, share cookies, and reuse connections (keep-alive, implemented by urllib3) for better performance. An example of cookies being kept automatically between requests:

    s = requests.Session()
    r = s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
    print("1 - Headers:", r.headers)
    print("1 - Content:", r.content)
    r = s.get('http://httpbin.org/cookies')
    print("2 - Text:", r.text)
    print("2 - Headers:", r.request.headers)

The code above prints:

    1 - Headers: {'Content-Length': '56', 'Server': 'nginx', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true', 'Date': 'Fri, 30 Sep 2016 07:35:18 GMT', 'Access-Control-Allow-Origin': '*', 'Content-Type': 'application/json'}
    1 - Content: {
      "cookies": {
        "sessioncookie": "123456789"
      }
    }

    2 - Text: {
      "cookies": {
        "sessioncookie": "123456789"
      }
    }

    2 - Headers: {'Connection': 'keep-alive', 'Cookie': 'sessioncookie=123456789', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.11.1'}

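Values assigned on the Session object act as defaults for every request. Values passed in an individual HTTP verb method call are merged with them: new keys are added, and keys that already exist are overridden:

    s = requests.Session()
    s.headers.update({'1': '2'})
    r = s.get('http://localhost:5000/headers', headers={'3': '4', '1': 'mars'})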

The server receives the headers '3' -> '4' and '1' -> 'mars'. Values supplied in the call itself, however, are not kept between requests. For example:

    s = requests.Session()
    r = s.get('http://httpbin.org/cookies', cookies={'mars': 'loo'})
    print(r.text)

    r = s.get('http://httpbin.org/cookies')
    print(r.text)

The code prints:

    {
      "cookies": {
        "mars": "loo"
      }
    }

    {
      "cookies": {}
    }
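Both behaviours, merging with session defaults and per-call values not persisting, can be checked without a server by asking the session to prepare the requests instead of sending them. This is a sketch; prepare_request is the same step the verb methods go through before sending, and the URL is only a placeholder.

```python
import requests

s = requests.Session()
s.headers.update({'1': '2'})  # session-level default header
url = 'http://localhost:5000/headers'

# First request: per-call headers and cookies are merged with the defaults
first = s.prepare_request(
    requests.Request('GET', url, headers={'3': '4', '1': 'mars'}, cookies={'mars': 'loo'}))

# Second request: nothing per-call, so only the session defaults remain
second = s.prepare_request(requests.Request('GET', url))

print(first.headers['1'], first.headers['3'], first.headers.get('Cookie'))
print(second.headers['1'], second.headers.get('3'), second.headers.get('Cookie'))
```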

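A Session object can be used as a context manager, so that the session is closed automatically even if an exception occurs. This releases the connections held in its pool:

    with requests.Session() as s:
        r = s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')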

Getting the request from the response

The request attribute of a response object gives the request that produced the response. For example:

    r = requests.get("http://localhost:5000", headers={"Perm": "Authroized"})
    print("Request headers:", r.request.headers)

The code above might print:

    Request headers: {'Perm': 'Authroized', 'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.11.1'}

What the request attribute returns is actually a PreparedRequest object. You can modify some of its attributes and then resend the modified request with the send method of a Session object:

    import requests
    from requests import Session

    r = requests.get('http://www.baidu.com')
    print(r.status_code, r.content)

    s = Session()
    r2 = s.send(r.request)
    print(r2.status_code, r2.content)

You can also build a Request object first, call its prepare method to get a PreparedRequest object, and pass that to the send method of a Session object:

    s = requests.Session()
    s.cookies.update({'mars': 'loo'})

    req = requests.Request('GET', 'http://httpbin.org/cookies')
    prepared = req.prepare()

    r = s.send(prepared)
    print(r.text)

The code above prints:

    {
      "cookies": {}
    }

This shows that a PreparedRequest produced by the prepare method of a Request object does not pick up the defaults set on the Session object (such as its cookies). The prepare_request method of a Session object also takes a Request object and returns a PreparedRequest, but it does apply the session-level defaults:

    s = requests.Session()
    s.cookies.update({'mars': 'loo'})

    req = requests.Request('GET', 'http://httpbin.org/cookies')
    prepared = s.prepare_request(req)

    r = s.send(prepared)
    print(r.text)

The code above prints:

    {
      "cookies": {
        "mars": "loo"
      }
    }
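The difference between the two ways of preparing a request is already visible before anything is sent, by looking at the Cookie header of each PreparedRequest. A minimal offline sketch; the X-Debug header at the end is made up and only illustrates that a prepared request can still be adjusted before it is passed to send:

```python
import requests

s = requests.Session()
s.cookies.update({'mars': 'loo'})
req = requests.Request('GET', 'http://httpbin.org/cookies')

plain = req.prepare()                   # ignores session-level state
with_session = s.prepare_request(req)   # merges session cookies and headers

print(plain.headers.get('Cookie'))
print(with_session.headers.get('Cookie'))

# A PreparedRequest can still be modified before s.send(with_session)
with_session.headers['X-Debug'] = '1'
```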

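SSL verification

requests ships with the trusted CA list published by Mozilla and verifies the server's SSL certificate by default. If the certificate is invalid, it reports an error like this:

    requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)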

To disable SSL verification, set the verify argument to False when making the request:

    r = requests.get('https://xxx.com', verify=False)

If you rely on the CA bundle built into a requests release, the trusted CAs are updated only when you upgrade the requests package. Newer versions of requests try to use certifi instead (if certifi is in use, upgrade it regularly).

If the server's SSL/TLS version differs from the one requests uses by default, an HTTPAdapter subclass can change the protocol version used by a Session object:

    import ssl

    import requests
    from requests.adapters import HTTPAdapter
    from requests.packages.urllib3.poolmanager import PoolManager


    class MyAdapter(HTTPAdapter):
        def init_poolmanager(self, connections, maxsize, block=False):
            self.poolmanager = PoolManager(num_pools=connections,
                                           maxsize=maxsize,
                                           block=block,
                                           ssl_version=ssl.PROTOCOL_TLSv1)


    s = requests.Session()
    # Mount the adapter (https requests then use TLSv1 underneath)
    s.mount('https://', MyAdapter())
    s.get('https://xxx.com')
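TLSv1 is considered obsolete today and ssl.PROTOCOL_TLSv1 is deprecated in current Python versions, so on a modern stack the same adapter idea is more likely to be used to raise the minimum protocol version than to pin an old one. The following is a sketch of that variant, not the article's method: it assumes Python 3.7+ (for ssl.TLSVersion) and a urllib3 version that accepts an ssl_context pool argument, and MinTLS12Adapter is a made-up name. How the custom context interacts with certificate verification can differ between requests releases, so treat it as a starting point.

```python
import ssl

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.ssl_ import create_urllib3_context


class MinTLS12Adapter(HTTPAdapter):
    """Hypothetical adapter: refuses anything older than TLS 1.2."""

    def init_poolmanager(self, *args, **kwargs):
        context = create_urllib3_context()
        context.minimum_version = ssl.TLSVersion.TLSv1_2
        # Hand the customized context to the underlying urllib3 pool manager
        kwargs['ssl_context'] = context
        return super().init_poolmanager(*args, **kwargs)


s = requests.Session()
s.mount('https://', MinTLS12Adapter())  # applies to every https:// URL of this session
```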

HTTP authentication

For background on HTTP authentication, see my previous blog post.

Basic authentication

    import requests
    from requests.auth import HTTPBasicAuth

    r = requests.get('http://localhost:5000/', auth=HTTPBasicAuth('mars', 'loo'))
    print(r.status_code, r.content)
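What basic authentication adds to the request can be inspected offline. The sketch below prepares the request without sending it and also shows the tuple shorthand for auth, which requests treats as basic authentication:

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = 'http://localhost:5000/'

explicit = requests.Request('GET', url, auth=HTTPBasicAuth('mars', 'loo')).prepare()
shorthand = requests.Request('GET', url, auth=('mars', 'loo')).prepare()

# Basic auth is "user:password" in base64: an encoding, not encryption
expected = 'Basic ' + base64.b64encode(b'mars:loo').decode('ascii')

print(explicit.headers['Authorization'])
print(explicit.headers['Authorization'] == shorthand.headers['Authorization'] == expected)
```

Because the credentials are only encoded, basic authentication should be combined with HTTPS whenever the password matters.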

Digest authentication

    import requests
    from requests.auth import HTTPDigestAuth

    r = requests.get('http://localhost:5000/', auth=HTTPDigestAuth('mars', 'loo'))
    print("1 - Response Header:", r.headers)
    print("1 - Request Header:", r.request.headers)

Sample output of the code above:

    1 - Response Header: {'Date': 'Thu, 29 Sep 2016 15:05:08 GMT', 'Content-Length': '12', 'Content-Type': 'text/html; charset=utf-8', 'Server': 'Werkzeug/0.11.10 Python/2.7.10'}
    1 - Request Header: {'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.11.1', 'Connection': 'keep-alive', 'Cookie': 'session=.eJyrVkosLcmIz8vPS05VsqpWUkhSslKKrHI18HdJr_B396z0c_fK8atyNAXSWb7hrgZRWckVfu6BQDm37MiQUFulWh2IEfkFiYWlSGaEeBr7hkeaRlbl5PplBWX7GXka-Rn5Gvrm-gLNCMr0dckGyrsaRob4As2oBQDP-yuE.Cs6_JA.kHRL3DmOS5wJavbd34C8cY7Fvl8', 'Authorization': 'Digest username="mars", realm="Authentication Required", nonce="c148818b24be7094bc1a4f714d18ada5", uri="/", response="ed0fd2e1bf8bbe2c3bccc52c60b63a11", opaque="a271f9c9f64d7b67c52c4f4b0971a5a3"'}

Proxies

HTTP/HTTPS proxies

A proxy can be set for a request with the proxies argument, and the proxy URL may carry credentials. For example:

    proxies = {
        "https": "https://username:password@proxy.com",
        "http://stackoverflow.com": "http://username:password@proxy.com"
    }
    r = requests.get('https://www.google.com', proxies=proxies)
    print(r.status_code)

    r = requests.get('http://stackoverflow.com', proxies=proxies)
    print(r.status_code)
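The keys of the proxies dict are either a bare scheme or a scheme plus host, and the more specific key wins. To see which entry would apply to a given URL without making a request, the sketch below calls select_proxy from requests.utils. That helper is the lookup requests performs internally; using it directly here is just a convenience for illustration, and http://example.com is an extra URL added to show the no-match case.

```python
from requests.utils import select_proxy

proxies = {
    "https": "https://username:password@proxy.com",
    "http://stackoverflow.com": "http://username:password@proxy.com",
}

for url in ('https://www.google.com', 'http://stackoverflow.com', 'http://example.com'):
    # None means no entry in this dict matches the URL
    print(url, '->', select_proxy(url, proxies))
```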

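SOCKS proxies

A SOCKS proxy forwards traffic at the transport level and does not care which application-layer protocol is in use; it is generally faster than an HTTP/HTTPS proxy. If the proxy speaks SOCKS, install the extra dependency with pip install requests[socks], then write the proxy URL with a SOCKS scheme (for example socks5://user:pass@host:port).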
