Applying requests, step 1
Source: Internet  Published: java bank outsourcing company  Editor: 程序博客网  Time: 2024/06/06 20:04
- What the crawler does
- Modules used and what they do
- Code walkthrough
- Full code
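What the crawler does
It downloads the Minion (小黄人) pictures from a gallery page on ivsky.com and saves them to local disk.
Modules used and what they do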
    import requests
    from urllib.request import urlretrieve
    import re
    import os
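requests: sends the HTTP requests (for the gallery page, and for the images in the file-stream version)
urlretrieve: downloads an image and saves it straight to a file
os: checks whether the target directory exists and creates it
re: the regular-expression module, used to find the image URLs in the page
Code walkthrough
Set the request headers: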
    url = "http://www.ivsky.com/tupian/xiaohuangren_t21343/"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
        'Referer': url,
        'Connection': 'Keep-alive'
    }
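These headers only go out with calls that pass `headers=headers`; the image downloads later (`urlretrieve(item, path)` and `requests.get(item)`) are sent without them. If the image server ever checked `User-Agent` or `Referer`, one standard-library way to make `urlretrieve` send them is to install a global opener. A minimal sketch, with `install_default_headers` as a hypothetical helper name; `Connection` is left out because urllib sends `Connection: close` on its own regardless.

```python
import urllib.request


def install_default_headers(headers):
    """Make urllib.request.urlopen (and therefore urlretrieve) send these headers.

    install_opener() replaces the module-wide default opener, so this affects
    every later urlopen/urlretrieve call in the process.
    """
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to every request
    opener.addheaders = list(headers.items())
    urllib.request.install_opener(opener)
    return opener


url = "http://www.ivsky.com/tupian/xiaohuangren_t21343/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
    'Referer': url,
}
opener = install_default_headers(headers)
# From here on, urlretrieve(item, path) sends the same User-Agent and Referer.
```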
Send the request:
    s = requests.get(url, headers=headers)
    # print(s.url)
    s = s.text
    # print(s)
Match the image URLs with re:
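    # Capture the src of each thumbnail <img> inside <div class="il_img">
    # ("\." so only a literal dot before "jpg" matches)
    pattern = r'<div class="il_img".*?<img src="(.*?\.jpg)" width'
    pa = re.compile(pattern)
    uls = pa.findall(s)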
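A quick offline check of what the pattern captures. The HTML fragment below is hand-written for illustration and only assumes the thumbnails are `<img src="...jpg" width=...>` tags inside `<div class="il_img">`; it also shows that without `re.S` (DOTALL) the `.*?` cannot cross a line break, so a `<div>` and `<img>` on different lines would be missed.

```python
import re

# Hypothetical markup modeled on the structure the pattern expects.
html = '''
<div class="il_img"><a href="/tupian/a.html"><img src="http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-004.jpg" width="135" height="135"></a></div>
<div class="il_img">
  <a href="/tupian/b.html"><img src="http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-005.jpg" width="135" height="135"></a>
</div>
'''

pattern = r'<div class="il_img".*?<img src="(.*?\.jpg)" width'

single_line = re.findall(pattern, html)   # "." stops at newlines
dotall = re.findall(pattern, html, re.S)  # "." also matches newlines

print(single_line)
print(dotall)
```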
Save the images with urlretrieve:
    for item in uls:
        # print(item)  # e.g. http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-004.jpg
        # Keep the "/<name>.jpg" part that follows the "/01"-style directory
        path = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]
        path = '/root/python/python/taobao%s' % path
        # Create the target directory (and any missing parents) if needed
        os.makedirs(os.path.dirname(path), exist_ok=True)
        print(path)
        urlretrieve(item, path)
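The `re.split` call drops everything up to the `/01` directory in the sample URL, leaving `/xiaohuangren-004.jpg`. Since only the last path segment is wanted, the standard library can get it without a regex. A small comparison; `local_path` is a hypothetical helper, not part of the original script.

```python
import os
import posixpath
import re
from urllib.parse import urlsplit

item = "http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-004.jpg"
root = "/root/python/python/taobao"

# Original approach: the captured group is "/<file>.jpg" after a "/NN" directory.
via_split = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]


def local_path(img_url, root_dir):
    """Hypothetical helper: map an image URL to root_dir/<last path segment>."""
    name = posixpath.basename(urlsplit(img_url).path)
    return os.path.join(root_dir, name)


print(via_split)
print(local_path(item, root))
```

The stdlib version does not depend on the URL containing a two-digit directory, which the regex silently requires.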
Save the images with a file stream (write the downloaded bytes to a file):
    for item in uls:
        path = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]
        path = '/root/python/python/taobao%s' % path
        os.makedirs(os.path.dirname(path), exist_ok=True)
        imgedata = requests.get(item).content  # raw image bytes
        print(path)
        with open(path, "wb") as f:
            f.write(imgedata)
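`requests.get(item).content` holds the whole image in memory before writing, which is fine for thumbnails. For larger files, requests can stream the body (`requests.get(item, stream=True)` together with `Response.iter_content(chunk_size=...)`) and the bytes can be written chunk by chunk. The hypothetical `save_chunks` below accepts any iterable of byte chunks, so it can be exercised offline with a plain list; it also creates the target directory.

```python
import os
import tempfile


def save_chunks(path, chunks):
    """Hypothetical helper: write an iterable of bytes chunks to path.

    With requests this could be fed by
    requests.get(url, stream=True).iter_content(chunk_size=8192).
    Returns the number of bytes written.
    """
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


# Offline demo: a fake "response body" split into chunks, saved under a temp dir.
tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "taobao", "xiaohuangren-004.jpg")
n = save_chunks(target, [b"\xff\xd8", b"", b"fake-jpeg-bytes", b"\xff\xd9"])
with open(target, "rb") as f:
    data = f.read()
print(n, len(data))
```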
Summary: both approaches save the images; in this test, the file-stream approach was faster than urlretrieve.
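Whether the file stream comes out faster depends on the network, the server and the file sizes, so it is worth timing both loops on your own machine. A minimal best-of-N timer sketch (`time_saver` is a hypothetical helper); the two stand-in functions exist only so it runs offline and would be replaced with the two save loops above.

```python
import time


def time_saver(label, func, repeat=3):
    """Run func() `repeat` times and return (label, best wall-clock seconds)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return label, best


# Offline stand-ins; replace with the file-stream loop and the urlretrieve loop.
def fast():
    sum(range(1000))


def slow():
    time.sleep(0.01)


results = [time_saver("file stream", fast), time_saver("urlretrieve", slow)]
for label, seconds in results:
    print(f"{label}: {seconds:.4f} s")
```

Taking the best of several runs reduces noise from one-off delays, though repeated downloads may also benefit from server-side caching.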
Full code
    # coding:utf-8
    import requests
    from urllib.request import urlretrieve
    import re
    import os

    url = "http://www.ivsky.com/tupian/xiaohuangren_t21343/"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
        'Referer': url,
        'Connection': 'Keep-alive'
    }
    s = requests.get(url, headers=headers)
    # print(s.url)
    s = s.text
    # print(s)
    pattern = r'<div class="il_img".*?<img src="(.*?\.jpg)" width'
    pa = re.compile(pattern)
    uls = pa.findall(s)

    # urlretrieve version (disabled; the file-stream loop below is used instead):
    # for item in uls:
    #     # print(item)  # http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-004.jpg
    #     path = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]
    #     path = '/root/python/python/taobao%s' % path
    #     os.makedirs(os.path.dirname(path), exist_ok=True)
    #     print(path)
    #     urlretrieve(item, path)
    # print(len(uls))

    for item in uls:
        path = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]
        path = '/root/python/python/taobao%s' % path
        # The urlretrieve loop is disabled, so the directory must be created here
        os.makedirs(os.path.dirname(path), exist_ok=True)
        imgedata = requests.get(item).content
        print(path)
        with open(path, "wb") as f:
            f.write(imgedata)
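The same steps can also be arranged as small functions, which makes parsing and saving testable without network access. This is a restructured sketch, not the original script: all function names are hypothetical; `fetch` defaults to a requests-based fetcher (with an added `timeout` and `raise_for_status()`, and requests imported only when that default is used); `re.S` is an added assumption in case the markup spans lines; and, unlike the original, the same headers are also sent for the image downloads.

```python
import os
import posixpath
import re
import tempfile
from urllib.parse import urlsplit

# re.S is an added assumption, in case <div> and <img> sit on different lines.
PATTERN = re.compile(r'<div class="il_img".*?<img src="(.*?\.jpg)" width', re.S)


def parse_image_urls(html):
    """Return the image URLs captured by PATTERN, in page order."""
    return PATTERN.findall(html)


def local_path(img_url, root_dir):
    """Map an image URL to root_dir/<last path segment of the URL>."""
    return os.path.join(root_dir, posixpath.basename(urlsplit(img_url).path))


def requests_fetch(url, headers=None):
    """Default fetcher: GET url with requests and return the body as bytes."""
    import requests  # imported lazily so the offline demo does not need it
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()
    return resp.content


def download_all(page_url, root_dir, headers=None, fetch=requests_fetch):
    """Fetch the gallery page, save every matched image under root_dir, return paths."""
    # Assumption: UTF-8 decoding with replacement leaves the ASCII markup intact
    html = fetch(page_url, headers).decode("utf-8", errors="replace")
    os.makedirs(root_dir, exist_ok=True)
    saved = []
    for item in parse_image_urls(html):
        path = local_path(item, root_dir)
        with open(path, "wb") as f:
            f.write(fetch(item, headers))  # same headers for the images
        saved.append(path)
    return saved


# Offline demo with a fake fetcher serving hypothetical URLs from a dict.
fake_site = {
    "http://example.test/gallery/": (
        b'<div class="il_img">\n<img src="http://example.test/t/201411/01/a-001.jpg" width="135">'
    ),
    "http://example.test/t/201411/01/a-001.jpg": b"JPEG-BYTES",
}
out_dir = os.path.join(tempfile.mkdtemp(), "taobao")
saved = download_all("http://example.test/gallery/", out_dir,
                     fetch=lambda url, headers=None: fake_site[url])
print(saved)
```

Against the real site, the call would be `download_all(url, '/root/python/python/taobao', headers=headers)`.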