Usage of the urllib library

Source: Internet  Editor: 程序博客网  Date: 2024/06/09 17:39

What is urllib

It is Python's built-in HTTP request library, so no extra installation is needed.
It contains the following modules:
1: urllib.request, the request module
2: urllib.error, the exception-handling module
3: urllib.parse, the URL-parsing module
4: urllib.robotparser, the robots.txt-parsing module
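The robots.txt parser in the list above is not demonstrated later in the article. As a minimal offline sketch, `urllib.robotparser.RobotFileParser` can be fed robots.txt rules directly with `parse()` instead of downloading them with `read()`; the rules and the `MyBot` user-agent below are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied as a list of lines instead of fetched
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# can_fetch(useragent, url) answers whether the rules allow that URL
private_ok = rp.can_fetch("MyBot", "http://example.com/private/data.html")
public_ok = rp.can_fetch("MyBot", "http://example.com/public/index.html")
print(private_ok, public_ok)  # False True
```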

Differences in urllib between Python 2 and Python 3

Python 2
import urllib2
response = urllib2.urlopen("http://www.baidu.com")
Python 3
import urllib.request
response = urllib.request.urlopen("http://www.baidu.com")
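Code that has to run on both versions commonly chooses the import at runtime. A minimal sketch of that pattern, assuming only `urlopen` is needed:

```python
try:
    # Python 3: urlopen lives in urllib.request
    from urllib.request import urlopen
except ImportError:
    # Python 2: fall back to urllib2
    from urllib2 import urlopen

print(urlopen.__module__)  # urllib.request (on Python 3)
```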

Usage in detail

import urllib.request
import socket
import urllib.error
import urllib.parse
from fake_useragent import UserAgent  # third-party package: pip install fake-useragent

# GET request without parameters
response = urllib.request.urlopen("http://www.baidu.com")
print(response.read().decode("utf-8"))

# POST request with parameters (passing data makes urlopen send a POST)
data = bytes(urllib.parse.urlencode({"word": "hello"}), encoding="utf8")
# httpbin.org is a website for testing HTTP requests
response = urllib.request.urlopen("http://httpbin.org/post", data=data)
print(response.read())

# Timeout setting, in seconds
response = urllib.request.urlopen("http://httpbin.org/get", timeout=1)
print(response.read())
try:
    response = urllib.request.urlopen("http://httpbin.org/get", timeout=0.1)
except urllib.error.URLError as e:
    # A timeout while connecting is wrapped in URLError
    if isinstance(e.reason, socket.timeout):
        print("TIME OUT!")
except socket.timeout:
    # A timeout while waiting for the response is raised directly
    print("TIME OUT!")

# Response type
response = urllib.request.urlopen("http://www.baidu.com")
print(type(response))

# Status code and response headers
print(response.status)
print(response.getheaders())
print(response.getheader("Server"))

# Building a Request object with custom headers and an explicit method
url = "http://httpbin.org/post"
ua = UserAgent()
headers = {
    "User-Agent": ua.random,
    "Host": "httpbin.org"
}
dic = {
    "name": "Germey"
}
data = bytes(urllib.parse.urlencode(dic), encoding="utf8")
req = urllib.request.Request(url=url, data=data, headers=headers, method="POST")
response = urllib.request.urlopen(req)
print(response.read().decode("utf-8"))
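A `Request` object can also be inspected before anything is sent, which helps when checking what `urlopen` will do with it. This offline sketch reuses the article's httpbin URL but never opens it; the `my-crawler/0.1` User-Agent string is a made-up placeholder. `Request` stores header names via `str.capitalize()`, so `User-Agent` is looked up as `User-agent`.

```python
import urllib.parse
import urllib.request

url = "http://httpbin.org/post"
data = bytes(urllib.parse.urlencode({"name": "Germey"}), encoding="utf8")

# Placeholder User-Agent (hypothetical); no request is actually sent below
req = urllib.request.Request(url=url, data=data, headers={"User-Agent": "my-crawler/0.1"})

# With a body and no explicit method, urllib uses POST; without a body, GET
print(req.get_method())                          # POST
print(urllib.request.Request(url).get_method())  # GET

# Header names are normalised with str.capitalize()
print(req.get_header("User-agent"))              # my-crawler/0.1
print(req.data)                                  # b'name=Germey'
print(req.host, req.selector)                    # httpbin.org /post
```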

Setting a proxy

import urllib.request

# Set up a proxy (a sample public proxy address; replace it with one you can use)
proxy_handler = urllib.request.ProxyHandler(
    {
        'http': "http://121.232.146.98:9000",
    }
)
opener = urllib.request.build_opener(proxy_handler)
response = opener.open("http://www.baidu.com")
print(response.read())

Handling cookies

Getting cookies

import urllib.request
import http.cookiejar

cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open("http://www.baidu.com")
# The jar now holds every cookie the server set
for item in cookie:
    print(item.name + "=" + item.value)
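To watch `HTTPCookieProcessor` collect a cookie without depending on an outside site, the sketch below starts a throwaway local server (the `SetCookieHandler` class and its `session=abc123` cookie are hypothetical) and opens it with the same kind of opener. `ProxyHandler({})` is added only so that a system proxy setting cannot intercept the local request.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class SetCookieHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local test server that always sets one cookie."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


# Port 0 lets the OS pick a free port
server = http.server.HTTPServer(("127.0.0.1", 0), SetCookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

cookie = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore system proxies for this local request
    urllib.request.HTTPCookieProcessor(cookie),
)
with opener.open(url) as response:
    response.read()

server.shutdown()
server.server_close()

cookies = {item.name: item.value for item in cookie}
print(cookies)  # {'session': 'abc123'}
```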

Saving cookies in Mozilla format

import urllib.request
import http.cookiejar

filename = "cookies.txt"
# MozillaCookieJar writes the Netscape/Mozilla cookies.txt format
cookie = http.cookiejar.MozillaCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open("http://www.baidu.com")
# Also keep session cookies and expired cookies
cookie.save(ignore_discard=True, ignore_expires=True)
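To see what the Mozilla format looks like without contacting a real site, a cookie can be added to the jar by hand. This sketch builds an `http.cookiejar.Cookie` directly (the `session=abc123` cookie for `example.com` is made up) and saves it to a temporary file instead of `cookies.txt`.

```python
import http.cookiejar
import os
import tempfile

# A made-up session cookie; Cookie() takes every attribute explicitly
session_cookie = http.cookiejar.Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "cookies.txt")
    jar = http.cookiejar.MozillaCookieJar(filename)
    jar.set_cookie(session_cookie)
    # discard=True marks a session cookie, so ignore_discard=True is needed to keep it
    jar.save(ignore_discard=True, ignore_expires=True)
    with open(filename) as f:
        saved_text = f.read()

# Tab-separated fields: domain, subdomain flag, path, secure, expires, name, value
print(saved_text)
```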

Saving cookies in LWP format

import urllib.request
import http.cookiejar

filename = "cookies.txt"
# LWPCookieJar writes the libwww-perl "Set-Cookie3" format
cookie = http.cookiejar.LWPCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open("http://www.baidu.com")
cookie.save(ignore_discard=True, ignore_expires=True)

Loading cookies

import urllib.request
import http.cookiejar

filename = "cookies.txt"
# Load with the same jar class that saved the file (LWP here)
cookie = http.cookiejar.LWPCookieJar()
cookie.load(filename, ignore_discard=True, ignore_expires=True)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open("http://www.baidu.com")
print(response.read().decode("utf-8"))
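Saving and loading must use the same jar class: a file written by `LWPCookieJar` starts with `#LWP-Cookies-2.0`, and `MozillaCookieJar` refuses it with `LoadError` (and vice versa). A minimal offline round-trip sketch, again with a made-up cookie and a temporary file:

```python
import http.cookiejar
import os
import tempfile

# A made-up session cookie built by hand (no network access needed)
session_cookie = http.cookiejar.Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "cookies.txt")

    # Save in LWP format
    writer = http.cookiejar.LWPCookieJar(filename)
    writer.set_cookie(session_cookie)
    writer.save(ignore_discard=True, ignore_expires=True)

    # Load it back with the same jar class and the same flags
    reader = http.cookiejar.LWPCookieJar()
    reader.load(filename, ignore_discard=True, ignore_expires=True)
    loaded = {c.name: c.value for c in reader}

    # A jar for the other format rejects the file
    try:
        http.cookiejar.MozillaCookieJar().load(filename, ignore_discard=True)
        wrong_format_ok = True
    except http.cookiejar.LoadError:
        wrong_format_ok = False

print(loaded, wrong_format_ok)  # {'session': 'abc123'} False
```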

Exception handling

A general example

import urllib.request
import urllib.error

try:
    # A made-up domain, so the request cannot succeed
    response = urllib.request.urlopen("http://www.baidulcvb.com")
except urllib.error.URLError as e:
    print(e.reason)
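`URLError` is raised whenever the request cannot be completed, not only for unknown hosts. The sketch below triggers one without any network access by using a URL scheme urllib has no handler for (`foo://` is an arbitrary made-up scheme), and also checks the exception hierarchy: `HTTPError` subclasses `URLError`, which in turn subclasses `OSError`.

```python
import urllib.error
import urllib.request

try:
    # No handler exists for this made-up scheme, so urllib raises URLError
    urllib.request.urlopen("foo://example.com/")
    reason = None
except urllib.error.URLError as e:
    reason = e.reason

print(reason)  # unknown url type: foo
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True
print(issubclass(urllib.error.URLError, OSError))                 # True
```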

Catching HTTPError

import urllib.request
import urllib.error

try:
    response = urllib.request.urlopen("http://www.cuiqingcai.com/index.html")
except urllib.error.HTTPError as e:
    # HTTPError is a subclass of URLError, so it must be caught first
    print(e.code, e.reason, e.headers, sep="\n")
except urllib.error.URLError as e:
    print(e.reason)
else:
    print("Request successful!")
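Because `HTTPError` is a subclass of `URLError`, the `except HTTPError` clause has to come first; otherwise the `URLError` clause would also swallow HTTP status errors. The hypothetical helper `describe()` below shows the ordering offline by raising hand-built exceptions instead of making requests; the URL and messages are made up.

```python
import urllib.error


def describe(exc):
    """Hypothetical helper: raise exc and report which except clause caught it."""
    try:
        raise exc
    except urllib.error.HTTPError as e:
        # The server answered with an error status: code and reason are available
        return "HTTPError %d %s" % (e.code, e.reason)
    except urllib.error.URLError as e:
        # No usable HTTP response at all (DNS failure, refused connection, ...)
        return "URLError %s" % e.reason


not_found = urllib.error.HTTPError(
    "http://example.com/missing", 404, "Not Found", hdrs=None, fp=None
)
print(describe(not_found))  # HTTPError 404 Not Found
print(describe(urllib.error.URLError("Name or service not known")))
```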

Checking the exception type

import urllib.request
import urllib.error
import socket

try:
    response = urllib.request.urlopen("http://www.baidu.com", timeout=0.001)
except urllib.error.URLError as e:
    # e.reason holds the underlying exception, here a socket.timeout
    print(type(e.reason))
    if isinstance(e.reason, socket.timeout):
        print("TIME OUT")
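In current CPython, `urllib.request` wraps a timeout that happens while connecting or sending the request in `URLError`, but a timeout while waiting for the response is raised as a bare `socket.timeout` (an alias of `TimeoutError` since Python 3.10). The sketch below reproduces the second case against a throwaway local server (the `SlowHandler` class and the hypothetical `fetch_status()` helper are illustrations, not library APIs) and handles both cases the same way.

```python
import http.server
import socket
import threading
import time
import urllib.error
import urllib.request


class SlowHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local server that answers more slowly than the client waits."""

    def do_GET(self):
        time.sleep(1)
        try:
            self.send_response(200)
            self.end_headers()
        except OSError:
            pass  # the client has probably given up and closed the socket

    def log_message(self, format, *args):
        pass


def fetch_status(opener, url, timeout):
    """Return 'OK' or 'TIME OUT', treating both kinds of timeout the same way."""
    try:
        with opener.open(url, timeout=timeout) as response:
            response.read()
        return "OK"
    except urllib.error.URLError as e:
        # Timeout while connecting/sending: wrapped, original exception in e.reason
        if isinstance(e.reason, socket.timeout):
            return "TIME OUT"
        raise
    except socket.timeout:
        # Timeout while waiting for the response: raised unwrapped
        return "TIME OUT"


server = http.server.HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]
# ProxyHandler({}) keeps any system proxy from intercepting the local request
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

result = fetch_status(opener, url, timeout=0.2)
print(result)  # TIME OUT

server.shutdown()
server.server_close()
```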

URL parsing functions

urlparse
urlparse splits a URL into its six components (scheme, netloc, path, params, query, fragment). Example:

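from urllib.parse import urlparse

result = urlparse("http://www.baidu.com/index.html;user?id=5#comment")
print(type(result), result)

# The URL has no scheme, so a default scheme can be supplied.
# Without a leading "//", "www.baidu.com" ends up in path, not in netloc.
result = urlparse("www.baidu.com/index.html;user?id=5#comment", scheme="https")
print(result)

# If the URL already contains a scheme, the scheme argument is ignored
result = urlparse("http://www.baidu.com/index.html;user?id=5#comment", scheme="https")
print(result)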
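The `ParseResult` returned by `urlparse` is a named tuple, so each component can be read as an attribute, and `parse_qs()` can split the query string further. A short sketch on the same URL as above:

```python
from urllib.parse import parse_qs, urlparse

result = urlparse("http://www.baidu.com/index.html;user?id=5#comment")

# Each component has its own attribute
print(result.scheme)     # http
print(result.netloc)     # www.baidu.com
print(result.path)       # /index.html
print(result.params)     # user
print(result.query)      # id=5
print(result.fragment)   # comment

# parse_qs turns the query string into a dict of lists
query = parse_qs(result.query)
print(query)             # {'id': ['5']}

# ...and the result unpacks like an ordinary 6-tuple
scheme, netloc, path, params, query_string, fragment = result
```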

urlunparse
urlunparse assembles a URL from its six components, the reverse of urlparse. Example:

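from urllib.parse import urlunparse

# The six components, in order: scheme, netloc, path, params, query, fragment
data = ['http', 'www.baidu.com', 'index.html', 'user', 'a=6', 'comment']
print(urlunparse(data))  # http://www.baidu.com/index.html;user?a=6#comment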
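Because `urlunparse` reverses `urlparse`, the two combine into a simple way to change one part of an existing URL: parse it, swap the component with the named tuple's `_replace()`, and reassemble. A minimal sketch, with the new `https` scheme and `a=7` query chosen arbitrarily:

```python
from urllib.parse import urlparse, urlunparse

url = "http://www.baidu.com/index.html;user?a=6#comment"
parts = urlparse(url)

# urlunparse reverses urlparse
rebuilt = urlunparse(parts)
print(rebuilt == url)  # True

# _replace() returns a copy with selected components changed
changed = urlunparse(parts._replace(scheme="https", query="a=7"))
print(changed)  # https://www.baidu.com/index.html;user?a=7#comment
```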

urlencode

urlencode converts a dict into GET request parameters (a query string). Example:

from urllib.parse import urlencode

params = {
    "name": "germey",
    "age": 22
}
base_url = "http://www.baidu.com?"
url = base_url + urlencode(params)
print(url)  # http://www.baidu.com?name=germey&age=22
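`urlencode()` also percent-encodes the values, so callers do not need to escape spaces or `&` themselves; by default it uses `quote_plus`, which turns a space into `+`. List values need `doseq=True` to become repeated keys, and `parse_qs()` decodes the result again. The extra `q` and `tag` parameters below are made-up examples.

```python
from urllib.parse import parse_qs, urlencode

params = {"name": "germey", "age": 22, "q": "hello world&more"}
query = urlencode(params)
print(query)  # name=germey&age=22&q=hello+world%26more

# doseq=True expands a list value into repeated keys
tags = urlencode({"tag": ["a", "b"]}, doseq=True)
print(tags)   # tag=a&tag=b

# parse_qs reverses the encoding; every value comes back as a list of strings
decoded = parse_qs(query)
print(decoded)  # {'name': ['germey'], 'age': ['22'], 'q': ['hello world&more']}
```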