003 Using the urllib Library
Source: Internet | Editor: 程序博客网 | Time: 2024/06/14 09:16
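II. Basic Usage of the urllib Library
1. urllib
urllib is Python's built-in HTTP request library. It is made up of four modules:
urllib.request: the request module
urllib.error: the exception-handling module
urllib.parse: the URL-parsing module
urllib.robotparser: the robots.txt-parsing module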
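urllib.robotparser is listed above but not demonstrated elsewhere in these notes. The following is a minimal sketch, assuming a made-up robots.txt supplied inline (the rules and the example.com URLs are illustrative only), so that it runs without network access:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, given inline instead of being downloaded.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)  # parse() accepts an iterable of lines

# can_fetch(useragent, url) reports whether the rules allow the fetch.
allowed_public = parser.can_fetch("*", "http://example.com/index.html")
allowed_private = parser.can_fetch("*", "http://example.com/private/data.html")
print(allowed_public, allowed_private)
```

For a real site, set_url() followed by read() would download the file instead of parse().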
2. Changes from Python 2
Python 2
import urllib2
response = urllib2.urlopen('http://www.baidu.com')

Python 3
import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
3. urllib usage in detail
1. urlopen
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

# Send a GET request and print the decoded response body.
import urllib.request

response = urllib.request.urlopen('http://www.baidu.com')
print(response.read().decode('utf-8'))

# Passing data turns the request into a POST; data must be bytes.
import urllib.parse
import urllib.request

data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read())

# timeout is given in seconds.
import urllib.request

response = urllib.request.urlopen('http://httpbin.org/get', timeout=1)
print(response.read())

# A very short timeout, used here to trigger and handle the timeout error.
import socket
import urllib.request
import urllib.error

try:
    response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
except urllib.error.URLError as e:
    if isinstance(e.reason, socket.timeout):
        print('TIME OUT')
2. Response
1. Response type
import urllib.request

response = urllib.request.urlopen('https://www.python.org')
print(type(response))

2. Status code and response headers
import urllib.request

response = urllib.request.urlopen('https://www.python.org')
print(response.status)               # HTTP status code
print(response.getheaders())         # all headers as a list of (name, value) tuples
print(response.getheader('Server'))  # a single header looked up by name

3. Response body
import urllib.request

response = urllib.request.urlopen('https://www.python.org')
print(response.read().decode('utf-8'))
3. Request
# urlopen also accepts a Request object in place of a URL string.
import urllib.request

request = urllib.request.Request('https://python.org')
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))

# Set the headers, form data and method when constructing the Request.
from urllib import request, parse

url = 'http://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Host': 'httpbin.org',
}
form_data = {'name': 'Germey'}
data = bytes(parse.urlencode(form_data), encoding='utf8')
req = request.Request(url=url, data=data, headers=headers, method='POST')
response = request.urlopen(req)
print(response.read().decode('utf-8'))

# Alternatively, add headers afterwards with add_header.
from urllib import request, parse

url = 'http://httpbin.org/post'
form_data = {'name': 'Germey'}
data = bytes(parse.urlencode(form_data), encoding='utf8')
req = request.Request(url=url, data=data, method='POST')
req.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')
response = request.urlopen(req)
print(response.read().decode('utf-8'))
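A Request object can be inspected before it is sent, which is handy for checking what urlopen would transmit. The sketch below builds the same kind of POST request and reads its fields back without contacting the server; the variable names are illustrative only:

```python
from urllib import parse, request

form_data = {'name': 'Germey'}
data = parse.urlencode(form_data).encode('utf-8')

req = request.Request('http://httpbin.org/post', data=data, method='POST')
req.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')

print(req.get_method())    # the HTTP method that will be used
print(req.full_url)        # the target URL
print(req.data)            # the encoded request body (bytes)
print(req.header_items())  # headers as (name, value) tuples

# Header names are stored capitalized, so 'User-Agent' is looked up as 'User-agent'.
user_agent = req.get_header('User-agent')
```

Nothing is sent until the object is passed to urlopen.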
4. Handler
1. Proxy
# Assumes a proxy is listening locally on port 9743.
import urllib.request

proxy_handler = urllib.request.ProxyHandler({
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9743',
})
opener = urllib.request.build_opener(proxy_handler)
response = opener.open('http://httpbin.org/get')
print(response.read())
2. Cookie
Getting cookies
import http.cookiejar
import urllib.request

cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
for item in cookie:
    print(item.name + "=" + item.value)

Two ways to save cookies
# Mozilla (Netscape) cookies.txt format
import http.cookiejar
import urllib.request

filename = 'cookie.txt'
cookie = http.cookiejar.MozillaCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True, ignore_expires=True)

# LWP (libwww-perl) format
import http.cookiejar
import urllib.request

filename = 'cookie.txt'
cookie = http.cookiejar.LWPCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True, ignore_expires=True)

Reading cookies
# Load with the same CookieJar class that wrote the file (LWPCookieJar here).
import http.cookiejar
import urllib.request

cookie = http.cookiejar.LWPCookieJar()
cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
print(response.read().decode('utf-8'))
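The save and load steps can be tried without any network request by putting a hand-built cookie into the jar. This is only a sketch: make_cookie is a hypothetical helper, and the cookie name, value and domain are made up for illustration.

```python
import http.cookiejar
import os
import tempfile


def make_cookie(name, value, domain):
    """Build a session cookie by hand (hypothetical helper for this sketch)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Write to a temporary directory so the sketch leaves no file behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), 'cookie.txt')

jar = http.cookiejar.LWPCookieJar(path)
jar.set_cookie(make_cookie('session', 'abc123', 'example.com'))
# ignore_discard=True is needed because session cookies are otherwise skipped.
jar.save(ignore_discard=True, ignore_expires=True)

loaded = http.cookiejar.LWPCookieJar()
loaded.load(path, ignore_discard=True, ignore_expires=True)
pairs = [(c.name, c.value) for c in loaded]
print(pairs)
```

The file must be loaded with the same jar class that saved it, since the Mozilla and LWP formats are different on disk.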
5. Exception handling
# URLError is the base class for errors raised by urllib.request.
from urllib import request, error

try:
    response = request.urlopen('http://cuiqingcai.com/index.html')
except error.URLError as e:
    print(e.reason)

# HTTPError is a subclass of URLError, so catch it first.
from urllib import request, error

try:
    response = request.urlopen('http://cuiqingcai.com/index.html')
except error.HTTPError as e:
    print(e.reason, e.code, e.headers, sep='\n')
except error.URLError as e:
    print(e.reason)
else:
    print('Request Successfully')
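The order of the except clauses matters because HTTPError inherits from URLError. The sketch below shows this with hand-built exception objects, so no request is made; describe is a hypothetical helper and the URL and messages are made up:

```python
import io
from urllib import error


def describe(exc):
    """Classify an exception in the same order as the except clauses above."""
    if isinstance(exc, error.HTTPError):
        return 'HTTPError %s %s' % (exc.code, exc.reason)
    if isinstance(exc, error.URLError):
        return 'URLError %s' % exc.reason
    return 'other'


# Hand-built exceptions so the sketch runs without network access.
http_err = error.HTTPError('http://example.com/missing', 404, 'Not Found', {}, io.BytesIO())
url_err = error.URLError('connection refused')

print(issubclass(error.HTTPError, error.URLError))
print(describe(http_err))
print(describe(url_err))
```

If the URLError branch were checked first, it would also match HTTPError and the status code would never be reported.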
# e.reason can itself be an exception object, such as socket.timeout.
import socket
import urllib.request
import urllib.error

try:
    response = urllib.request.urlopen('https://www.baidu.com', timeout=0.01)
except urllib.error.URLError as e:
    print(type(e.reason))
    if isinstance(e.reason, socket.timeout):
        print('TIME OUT')
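6. URL parsing (a utility module that can be used on its own)
1. urlparse
urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)

# Result: scheme='http', netloc='www.baidu.com', path='/index.html',
# params='user', query='id=5', fragment='comment'
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment')
print(type(result), result)

# The URL has no scheme, so the default 'https' is used.
# Without '//' the host is not recognized as netloc and ends up in path.
from urllib.parse import urlparse

result = urlparse('www.baidu.com/index.html;user?id=5#comment', scheme='https')
print(result)

# The URL's own scheme ('http') takes precedence over the scheme argument.
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment', scheme='https')
print(result)

# With allow_fragments=False the fragment is not split off: query='id=5#comment'.
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment', allow_fragments=False)
print(result)

# With no query or params, '#comment' stays in path: path='/index.html#comment'.
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html#comment', allow_fragments=False)
print(result)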
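The result of urlparse is a named tuple, so each component can be read by name or by index. A short sketch that collects the cases above in one place (the variable names are illustrative):

```python
from urllib.parse import urlparse

url = 'http://www.baidu.com/index.html;user?id=5#comment'

result = urlparse(url)
# The six components are available as attributes or by position.
print(result.scheme, result.netloc, result.path,
      result.params, result.query, result.fragment)
print(result[0], result[1])

# allow_fragments=False leaves '#comment' attached to the query.
no_frag = urlparse(url, allow_fragments=False)

# A URL without scheme and '//' gets the default scheme and an empty netloc.
no_scheme = urlparse('www.baidu.com/index.html;user?id=5#comment', scheme='https')
```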
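2. urlunparse
The inverse of urlparse: it assembles six components into a complete URL.
from urllib.parse import urlunparse

# scheme, netloc, path, params, query, fragment
data = ['http', 'www.baidu.com', 'index.html', 'user', 'a=6', 'comment']
print(urlunparse(data))  # http://www.baidu.com/index.html;user?a=6#comment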
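Because urlunparse reverses urlparse, the two can be combined to change a single component of a URL. A minimal sketch, relying on the standard namedtuple method _replace:

```python
from urllib.parse import urlparse, urlunparse

url = 'http://www.baidu.com/index.html;user?a=6#comment'

parts = urlparse(url)
rebuilt = urlunparse(parts)  # round trip: parse, then reassemble

# Swap only the scheme and keep every other component.
https_url = urlunparse(parts._replace(scheme='https'))
print(rebuilt)
print(https_url)
```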
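3. urljoin
urljoin(base, url) combines a base URL with a second URL, which may be relative.
A URL can be split into six components; components present in the second URL override those of the base URL.
If a component is missing from the second URL, it is filled in from the base URL; if it is present, the second URL's value is used.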
from urllib.parse import urljoin
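The notes give only the import for urljoin, so here is a sketch of a few typical calls. The URLs are illustrative, built from the hosts already used in this post:

```python
from urllib.parse import urljoin

base = 'http://www.baidu.com/a/b.html'

# A relative path replaces the last path segment of the base.
relative = urljoin(base, 'c.html')

# A path starting with '/' replaces the whole path but keeps scheme and host.
rooted = urljoin(base, '/index.html')

# A complete URL overrides the base entirely.
absolute = urljoin(base, 'https://cuiqingcai.com/FAQ.html')

# Only query and fragment given: scheme, host and path come from the base,
# and the base's own query is replaced.
query_only = urljoin('http://www.baidu.com/about.html?wd=abc', '?category=2#comment')

for joined in (relative, rooted, absolute, query_only):
    print(joined)
```

Note that the path is resolved as a relative reference rather than simply copied over, which is why 'c.html' lands next to 'b.html' under '/a/'.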
4. urlencode
Converts a dictionary into GET request parameters (a query string).
from urllib.parse import urlencode

params = {
    'name': 'germey',
    'age': 22,
}
base_url = 'http://www.baidu.com?'
url = base_url + urlencode(params)
print(url)  # http://www.baidu.com?name=germey&age=22
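urlencode also escapes characters that are not safe in a query string, and parse_qs from the same module decodes the result again. A small sketch; the extra 'q' parameter is added here only to show the escaping:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {'name': 'germey', 'age': 22, 'q': 'hello world'}

query = urlencode(params)  # spaces become '+', values are converted to str
url = 'http://www.baidu.com/?' + query
print(url)

# parse_qs reverses the encoding; every value comes back as a list of strings.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Numbers do not survive the round trip as numbers: 22 comes back as the string '22'.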