Learning Python, Part 23: Built-in Modules and Third-Party Libraries


This article introduces commonly used Python modules. Unless otherwise noted, all examples use Python 3.4:

$ python -V
Python 3.4.3

Network requests

urllib

urllib provides a set of functions for working with URLs. With urllib, fetching the content of a web page is straightforward.

Fetching a web page

# coding: utf-8
import urllib.request

url = 'https://api.douban.com/v2/book/2129650'
with urllib.request.urlopen(url) as f:
    headers = f.getheaders()  # response headers
    body = f.read()  # response body
    print(f.status, f.reason)  # print the status code and reason phrase
    for k, v in headers:
        print(k + ': ' + v)
    print(body.decode('utf-8'))

Downloading images from a Baidu image search

import os
import re
import time
import urllib.request

url = r'http://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1488722322213_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=%E5%A3%81%E7%BA%B8%E5%B0%8F%E6%B8%85%E6%96%B0&f=3&oq=bizhi%E5%B0%8F%E6%B8%85%E6%96%B0&rsp=0'
imgPath = r'E:\img'
if not os.path.isdir(imgPath):
    os.mkdir(imgPath)

imgHtml = urllib.request.urlopen(url).read().decode('utf-8')
# test html
# print(imgHtml)
urls = re.findall(r'"objURL":"(.*?)"', imgHtml)

index = 1
for img_url in urls:
    print("Downloading:", img_url)
    # Skip this URL if the request fails or does not return 200
    try:
        res = urllib.request.urlopen(img_url)
        if res.status != 200:
            print('Download failed:', img_url)
            continue
        data = res.read()
    except Exception as e:
        print('Download failed:', img_url, e)
        continue  # without this, a stale or undefined response would be written
    filename = os.path.join(imgPath, str(time.time()) + '_' + str(index) + '.jpg')
    with open(filename, 'wb') as f:
        f.write(data)
    print('Download complete\n')
    index += 1

print("Finished; downloaded %s images in total" % (index - 1))
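The URL-extraction step of the script can be tried offline. The following is a minimal sketch, assuming the search page embeds image addresses as "objURL":"..." pairs; the sample string is made up for illustration and is not real Baidu output.

```python
import re

# Made-up fragment standing in for the data embedded in the search page.
sample_html = (
    '{"objURL":"http://example.com/a.jpg","x":1},'
    '{"objURL":"http://example.com/b.jpg"}'
)

# The non-greedy group (.*?) stops at the first closing quote,
# so each match is exactly one URL.
image_urls = re.findall(r'"objURL":"(.*?)"', sample_html)
print(image_urls)
```

A greedy group (.*) would instead run to the last quote in the string and merge everything into a single match, which is why the non-greedy form is used.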

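Python 2.7 users need to replace urllib.request with urllib (urllib2 also provides urlopen there). Note that Python 2 response objects expose the status code through getcode() rather than a status attribute.

Batch-downloading images

# coding: utf-8
import os
import urllib.request

url_path = 'http://www.ruanyifeng.com/images_pub/'
imgPath = r'E:\img'
if not os.path.isdir(imgPath):
    os.mkdir(imgPath)

index = 1
for i in range(1, 355):
    url = url_path + 'pub_' + str(i) + '.jpg'
    print("Downloading:", url)
    try:
        res = urllib.request.urlopen(url)
        if res.status != 200:
            print("Download failed:", url)
            continue
        data = res.read()
    except Exception:
        print('Download failed:', url)
        continue  # skip to the next image instead of writing a stale response
    filename = os.path.join(imgPath, str(i) + '.jpg')
    with open(filename, 'wb') as f:
        f.write(data)
    print('Download complete\n')
    index += 1

print("Finished; downloaded %s images in total" % (index - 1))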

Simulating a GET request with custom headers

An instance of urllib.request.Request has an add_header() method for adding header fields.

# coding: utf-8
import urllib.request

url = 'http://www.douban.com/'
req = urllib.request.Request(url)
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
with urllib.request.urlopen(req) as f:
    headers = f.getheaders()
    body = f.read()
    print(f.status, f.reason)
    for k, v in headers:
        print(k + ': ' + v)
    print(body.decode('utf-8'))

This returns the mobile version of the page intended for the iPhone.
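As an alternative to calling add_header(), headers can be passed to the Request constructor. The sketch below only builds the request object and sends nothing; the User-Agent value is a placeholder chosen for illustration.

```python
import urllib.request

# Headers can be supplied up front as a dict.
req = urllib.request.Request(
    'http://www.douban.com/',
    headers={'User-Agent': 'Mozilla/5.0 (example)'},  # placeholder value
)

# Request stores header names capitalized, i.e. as 'User-agent'.
user_agent = req.get_header('User-agent')
print(user_agent)
print(req.header_items())
```

Inspecting header_items() before sending is a cheap way to confirm what will actually go over the wire.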

Sending a POST request

The second argument of urllib.request.urlopen(), data, takes the payload to POST; it must be bytes.

#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
from urllib import request
from urllib.parse import urlencode

#----------------------------------
# Sample code for the mobile number location lookup API (Juhe Data)
# Online API docs: http://www.juhe.cn/docs/11
#----------------------------------

def main():
    # Set the APPKey you applied for
    appkey = "*********************"

    # 1. Mobile number location lookup
    request1(appkey, "GET")

# Mobile number location lookup
def request1(appkey, m="GET"):
    url = "http://apis.juhe.cn/mobile/get"
    params = {
        "phone": "",  # the mobile number to look up, or its first 7 digits
        "key": appkey,  # application APPKEY (see the application detail page)
        "dtype": "",  # response format, xml or json; defaults to json
    }
    query = urlencode(params)  # str, suitable for a query string
    if m == "GET":
        f = request.urlopen("%s?%s" % (url, query))
    else:
        # POST data must be bytes, so encode only here
        f = request.urlopen(url, query.encode('utf-8'))

    content = f.read()
    res = json.loads(content.decode('utf-8'))
    if res:
        error_code = res["error_code"]
        if error_code == 0:
            # Request succeeded
            print(res["result"])
        else:
            print("%s:%s" % (res["error_code"], res["reason"]))
    else:
        print("request api error")

if __name__ == '__main__':
    main()
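The difference between the GET and POST forms comes down to types: urlencode() returns a str, while the data argument must be bytes. Here is a minimal sketch that builds both kinds of request without sending them; the phone and key values are hypothetical.

```python
from urllib import request
from urllib.parse import urlencode

params = {'phone': '1380013', 'key': 'demo-key'}  # hypothetical values

query = urlencode(params)     # str, usable in a query string
body = query.encode('utf-8')  # bytes, as required for POST data

# GET: parameters travel in the URL.
get_req = request.Request('http://apis.juhe.cn/mobile/get?' + query)
# POST: parameters travel in the request body.
post_req = request.Request('http://apis.juhe.cn/mobile/get', data=body)

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Supplying data is what switches the request method to POST; formatting a bytes object into the URL instead would embed a literal b'...' in the address.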

Requests

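Although the standard library's urllib2 module (urllib.request in Python 3) already covers most of what we need day to day, its API is awkward to use. It was designed for a different era and is a poor fit for the modern web. Requests gives us a better option.

As its tagline says, it is "HTTP for Humans". In the Python world, everything should be simple. Requests is built on urllib3 and inherits its features: it supports HTTP keep-alive and connection pooling, cookie-based sessions, file uploads, automatic detection of the response encoding, and automatic encoding of internationalized URLs and POST data. Modern, international, and friendly to use.

Official site: http://python-requests.org/
Documentation: http://cn.python-requests.org/zh_CN/latest/
GitHub: https://github.com/kennethreitz/requests

Install it first:

$ pip3 install requests
Collecting requests
  Downloading requests-2.13.0-py2.py3-none-any.whl (584kB)
    100% |████████████████████████████████| 593kB 455kB/s
Installing collected packages: requests
Successfully installed requests-2.13.0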

Request example

#!/usr/bin/python
# -*- coding: utf-8 -*-
import requests

url = 'https://api.github.com/user'
# r = requests.request('get', url, auth=('52fhy', ''))
r = requests.get(url, auth=('', ''))

print('Status: %s' % r.status_code)  # status code

# Response headers
for k, v in r.headers.items():
    print(k + ': ' + v)

print('encoding: ', r.encoding)
print('body: ', r.text)
print('json body: ', r.json())

By default, iterating over a dict yields its keys. To iterate over values, use for value in d.values(); to iterate over keys and values together, use for k, v in d.items().
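The three iteration forms can be compared side by side on a plain dict. The header values below are made up for illustration.

```python
headers = {'Content-Type': 'application/json', 'Server': 'example'}  # made-up data

keys = [k for k in headers]                   # iterating a dict yields keys
values = [v for v in headers.values()]        # values only
pairs = [(k, v) for k, v in headers.items()]  # key/value tuples

print(keys)
print(values)
print(pairs)
```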

POST requests

Form-encoded:

# coding: utf-8
import requests

payload = {'name': 'python', 'age': '11'}
r = requests.post("http://httpbin.org/post", data=payload)
print(r.text)

Text-based (the body is sent as a raw string, here JSON):

# coding: utf-8
import json
import requests

payload = {'name': 'python', 'age': '11'}
r = requests.post("https://api.github.com/some/endpoint", data=json.dumps(payload))
print(r.text)

You can also pass the dict directly through the json parameter, and it will be encoded automatically. This was added in version 2.4.2:

r = requests.post("https://api.github.com/some/endpoint", json=payload)

hashlib

md5

import hashlib

md5 = hashlib.md5()
md5.update('how to use md5 in python hashlib?'.encode('utf-8'))
print(md5.hexdigest())

The result:

d26a53750bc40b38b65a520292f69306

update() can be called repeatedly to feed the content in chunks, which suits large files. Example:

import hashlib

def get_file_md5(f):
    m = hashlib.md5()
    while True:
        data = f.read(10240)  # read 10 KB at a time
        if not data:
            break
        m.update(data)
    return m.hexdigest()

# YOUR_FILE is a placeholder for the path of the file to hash
with open(YOUR_FILE, 'rb') as f:
    file_md5 = get_file_md5(f)

For the MD5 of an ordinary string, you can wrap the call in a function:

def md5(string):
    import hashlib
    return hashlib.md5(string.encode('utf-8')).hexdigest()
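To check that feeding data in chunks gives the same digest as hashing it in one go, the sketch below runs the chunked approach against an in-memory stream. io.BytesIO stands in for a real file here, and the chunk_size parameter is an addition for illustration.

```python
import hashlib
import io

def get_file_md5(f, chunk_size=10240):
    """Hash a binary file-like object chunk by chunk."""
    m = hashlib.md5()
    # iter() with a sentinel keeps calling f.read() until it returns b''.
    for chunk in iter(lambda: f.read(chunk_size), b''):
        m.update(chunk)
    return m.hexdigest()

data = b'x' * 25000  # spans several chunks
chunked = get_file_md5(io.BytesIO(data))
one_shot = hashlib.md5(data).hexdigest()
print(chunked == one_shot)
```

Only one chunk is held in memory at a time, so the same function works for files far larger than available RAM.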

SHA1

import hashlib

sha1 = hashlib.sha1()
sha1.update('py'.encode('utf-8'))
sha1.update('thon'.encode('utf-8'))
print(sha1.hexdigest())

Equivalent to:

hashlib.sha1('python'.encode('utf-8')).hexdigest()

A SHA1 digest is 160 bits (20 bytes) long and is usually written as a 40-character hexadecimal string.

In addition, hashlib supports sha224, sha256, sha384, and sha512.
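The digest sizes of these algorithms can be inspected directly. This is a small sketch using hashlib.new(), which looks an algorithm up by name; the input string is arbitrary.

```python
import hashlib

data = 'python'.encode('utf-8')

sizes = {}
for name in ('sha1', 'sha224', 'sha256', 'sha384', 'sha512'):
    h = hashlib.new(name, data)
    sizes[name] = h.digest_size * 8  # digest_size is in bytes
    # Each byte becomes two hex characters.
    print(name, sizes[name], 'bits,', len(h.hexdigest()), 'hex chars')

# Incremental updates give the same digest as a single call.
incremental = hashlib.sha256()
incremental.update(b'py')
incremental.update(b'thon')
same = incremental.hexdigest() == hashlib.sha256(data).hexdigest()
print(same)
```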

base64

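Base64 is a way to represent arbitrary binary data using 64 characters. Python's built-in base64 module can encode and decode Base64 directly:

>>> import base64
>>> base64.b64encode(b'123')
b'MTIz'
>>> base64.b64decode(b'MTIz')
b'123'

Standard Base64 output may contain the characters + and /, which cannot be used directly as URL parameters. For that reason there is also a "url safe" variant of Base64, which simply replaces + and / with - and _ respectively:

>>> base64.b64encode(b'i\xb7\x1d\xfb\xef\xff')
b'abcd++//'
>>> base64.urlsafe_b64encode(b'i\xb7\x1d\xfb\xef\xff')
b'abcd--__'
>>> base64.urlsafe_b64decode('abcd--__')
b'i\xb7\x1d\xfb\xef\xff'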
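Base64 maps every 3 input bytes to 4 output characters, padding with = when the input length is not a multiple of 3. A short sketch with arbitrary inputs makes the padding visible:

```python
import base64

encoded = {}
for raw in (b'a', b'ab', b'abc'):
    encoded[raw] = base64.b64encode(raw)
    print(raw, '->', encoded[raw])

# Output length is always a multiple of 4.
lengths = [len(v) for v in encoded.values()]
print(lengths)
```

One leftover byte produces two = characters, two leftover bytes produce one, and a full group of three needs none.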
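To confirm that the URL-safe variant is just a character substitution, the sketch below compares it with a manual replacement and with the altchars parameter of b64encode, reusing the same bytes as the session above.

```python
import base64

data = b'i\xb7\x1d\xfb\xef\xff'

standard = base64.b64encode(data)
url_safe = base64.urlsafe_b64encode(data)

# Manual substitution of the two problematic characters.
translated = standard.replace(b'+', b'-').replace(b'/', b'_')
# b64encode can also be told which two characters replace + and /.
via_altchars = base64.b64encode(data, altchars=b'-_')

print(standard, url_safe)
print(url_safe == translated == via_altchars)
```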

Dates and times

This topic was covered in detail in an earlier note: http://www.cnblogs.com/52fhy/p/6372194.html. This section is only a brief review.

time

# coding:utf-8
import time

# Get the current timestamp
timestamp = time.time()
print(timestamp)

# Format the time
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()))

# Return the time tuple for the local time
print(time.localtime())

# Convert a time tuple to a timestamp
print(time.mktime(time.localtime()))
t = (2017, 2, 11, 15, 3, 38, 1, 48, 0)
print(time.mktime(t))

# String to time tuple: the fields of the time string must line up
# one-to-one with the format string
print(time.strptime('2017 02 11', '%Y %m %d'))

# Sleep
print('sleeping...')
time.sleep(2)  # sleep for 2 s
print('sleeping end.')

Output:

1486797515.78742
2017-02-11 15:18:35
time.struct_time(tm_year=2017, tm_mon=2, tm_mday=11, tm_hour=15, tm_min=18, tm_sec=35, tm_wday=5, tm_yday=42, tm_isdst=0)
1486797515.0
1486796618.0
time.struct_time(tm_year=2017, tm_mon=2, tm_mday=11, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=5, tm_yday=42, tm_isdst=-1)
sleeping...
sleeping end.
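Parsing and formatting are inverse operations, which a short round trip shows. This is a sketch using the same date string as the example; the mktime/localtime step assumes the local time zone has no DST transition at that midnight.

```python
import time

# Parse: string -> struct_time
parsed = time.strptime('2017 02 11', '%Y %m %d')

# Format: struct_time -> string (a different layout this time)
formatted = time.strftime('%Y-%m-%d', parsed)
print(formatted)

# strptime also fills in the derived fields.
print(parsed.tm_wday)  # day of the week, Monday is 0
print(parsed.tm_yday)  # day of the year

# struct_time -> timestamp -> struct_time lands on the same local date.
ts = time.mktime(parsed)
round_trip = time.localtime(ts)
print(round_trip[:3])
```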

datetime

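Method overview:

datetime.now()  # current time; returns a datetime
datetime.timestamp()  # timestamp; returns a float
datetime.strftime('%Y-%m-%d %H:%M:%S')  # format a datetime object; returns a str
datetime.strptime('2017-2-6 23:22:13', '%Y-%m-%d %H:%M:%S')  # parse a string into a datetime
datetime.fromtimestamp(ts)  # local time for a timestamp; returns a datetime
datetime.utcfromtimestamp(ts)  # UTC time for a timestamp; returns a datetime

Example:

# coding: utf-8
from datetime import datetime

now = datetime.now()
print(now)

# provided by the datetime module
print(now.timestamp())

Output:

2017-02-06 23:26:54.631582
1486394814.631582

The fractional part is the sub-second portion of the timestamp, with microsecond precision.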
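The relationship between a timestamp's fractional part and the datetime fields can be checked with the value printed above. This sketch passes an explicit UTC time zone so that the result does not depend on the machine's local settings.

```python
from datetime import datetime, timezone

ts = 1486394814.631582

# Convert the timestamp to an aware datetime in UTC.
dt = datetime.fromtimestamp(ts, tz=timezone.utc)
print(dt.isoformat())
print(dt.microsecond)  # the fractional part, expressed in microseconds

# strptime accepts non-padded month and day; strftime pads them.
parsed = datetime.strptime('2017-2-6 23:22:13', '%Y-%m-%d %H:%M:%S')
text = parsed.strftime('%Y-%m-%d %H:%M:%S')
print(text)
```

The UTC hour is 15 rather than 23 because the original output was produced in a UTC+8 local time zone.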

Image processing

PIL

PIL (Python Imaging Library) is the de facto standard image processing library on the Python platform. It is very powerful, yet its API is simple and easy to use. It is installed through Pillow, the maintained fork of PIL.

Installation:

$ pip install Pillow
Collecting Pillow
  Downloading Pillow-4.0.0-cp34-cp34m-win32.whl (1.2MB)
Successfully installed Pillow-4.0.0

Resizing an image:

# coding: utf-8
from PIL import Image

im = Image.open('test.jpg')
print(im.format, im.size, im.mode)
im.thumbnail((200, 100))
im.save('thumb.jpg', 'JPEG')

Blur effect:

# coding: utf-8
from PIL import Image, ImageFilter

im = Image.open('test.jpg')
im2 = im.filter(ImageFilter.BLUR)
im2.save('blur.jpg', 'jpeg')

CAPTCHA:

from PIL import Image, ImageDraw, ImageFont, ImageFilter
import random

# Random letter:
def rndChar():
    return chr(random.randint(65, 90))

# Random color 1:
def rndColor():
    return (random.randint(64, 255), random.randint(64, 255), random.randint(64, 255))

# Random color 2:
def rndColor2():
    return (random.randint(32, 127), random.randint(32, 127), random.randint(32, 127))

# 240 x 60:
width = 60 * 4
height = 60
image = Image.new('RGB', (width, height), (255, 255, 255))
# Create the Font object:
font = ImageFont.truetype('Arial.ttf', 36)
# Create the Draw object:
draw = ImageDraw.Draw(image)
# Fill every pixel:
for x in range(width):
    for y in range(height):
        draw.point((x, y), fill=rndColor())
# Draw the text:
for t in range(4):
    draw.text((60 * t + 10, 10), rndChar(), font=font, fill=rndColor2())
# Blur:
image = image.filter(ImageFilter.BLUR)
image.save('code.jpg', 'jpeg')

Note: if the font file in the example cannot be located by name, give its absolute path instead.


0 0