Using requests: Step 1

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 20:04

  • What is crawled
  • What each module is used for
  • Code walkthrough
  • Complete code

What is crawled

The script crawls the images on the ivsky.com xiaohuangren (Minions) gallery page and saves them to the local disk.

What each module is used for

    import requests
    from urllib.request import urlretrieve
    import re
    import os

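requests: sends the HTTP requests for the page and for the image data
urlretrieve: downloads an image and saves it to a file
os: checks whether the target directory exists and creates it
re: the regular-expression module, used to find the needed content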

Code walkthrough

Set the request headers:

    url = "http://www.ivsky.com/tupian/xiaohuangren_t21343/"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
        'Referer': url,
        'Connection': 'Keep-alive'
    }
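To see what these headers look like on the outgoing request without touching the network, one option is to build the request and prepare it locally. This is only a minimal sketch, assuming the requests library is installed; the name prepared is introduced here for illustration and does not appear in the original script.

```python
import requests

url = "http://www.ivsky.com/tupian/xiaohuangren_t21343/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) "
                  "AppleWebKit/534.50 (KHTML, like Gecko) "
                  "Version/5.1 Safari/534.50",
    "Referer": url,
    "Connection": "Keep-alive",
}

# Build and prepare the request locally; nothing is sent over the network.
# "prepared" is an illustrative name, not part of the original script.
prepared = requests.Request("GET", url, headers=headers).prepare()

# Header lookup on a prepared request is case-insensitive.
print(prepared.method, prepared.url)
print(prepared.headers["user-agent"])
print(prepared.headers["referer"])
```

A request prepared this way carries only the headers passed in; when the script calls requests.get, the library also merges in its own session defaults for any header not set here.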

Send the request:

    s = requests.get(url, headers=headers)
    # print(s.url)
    s = s.text
    # print(s)

Use re to match the needed data:

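    # Capture the .jpg URL of each thumbnail inside a <div class="il_img"> block.
    # The dot before "jpg" is escaped so that it matches a literal ".".
    pattern = r'<div class="il_img".*?<img src="(.*?\.jpg)" width'
    pa = re.compile(pattern)
    uls = pa.findall(s)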
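The matching step can be tried offline on a small piece of markup. The sketch below assumes the page wraps each thumbnail roughly as shown; sample_html is made up for illustration and the real page may be laid out differently.

```python
import re

# Hypothetical markup shaped the way the pattern expects; the real page may differ.
sample_html = (
    '<div class="il_img"><a href="/tupian/a.html">'
    '<img src="http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-004.jpg" width="135">'
    '</a></div>'
    '<div class="il_img"><a href="/tupian/b.html">'
    '<img src="http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-005.jpg" width="135">'
    '</a></div>'
)

# Same pattern as in the walkthrough, with the dot before "jpg" escaped.
pattern = re.compile(r'<div class="il_img".*?<img src="(.*?\.jpg)" width')

# findall returns only the captured group, i.e. the image URLs.
image_urls = pattern.findall(sample_html)
for image_url in image_urls:
    print(image_url)
```

Because "." does not match a newline by default, a page that puts the div and the img tag on separate lines would need the re.S flag for this pattern to match.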

Save the images with urlretrieve:

    for item in uls:
        # print(item)
        # e.g. http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-004.jpg
        # Keep only the trailing "/<name>.jpg" part of the URL
        path = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]
        path = '/root/python/python/taobao%s' % path
        # Create the target directory (and any missing parents) if needed
        os.makedirs(os.path.split(path)[0], exist_ok=True)
        print(path)
        urlretrieve(item, path)
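The path handling in this loop can be checked separately from the download. The sketch below applies the same re.split pattern to the sample URL from the comment and creates the directory inside a throwaway temp folder; the helper name local_path_for is hypothetical and not part of the original script.

```python
import os
import re
import tempfile

def local_path_for(image_url, base_dir):
    """Return base_dir plus the trailing '/<name>.jpg' part of image_url."""
    # re.split keeps the captured group, so index 1 holds "/<name>.jpg".
    parts = re.split(r"/[0-9]{2}(/.*?\.jpg)", image_url, maxsplit=2)
    return base_dir + parts[1]

sample_url = "http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-004.jpg"
mapped = local_path_for(sample_url, "/root/python/python/taobao")
print(mapped)

# Directory creation, demonstrated in a temporary directory instead of /root.
with tempfile.TemporaryDirectory() as tmp:
    target = local_path_for(sample_url, os.path.join(tmp, "taobao"))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    created = os.path.isdir(os.path.dirname(target))
    print(created)
```

If a URL does not contain a "/<two digits>/<name>.jpg" segment, re.split returns a single-element list and the [1] lookup raises IndexError, so this mapping only suits URLs shaped like the sample.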

Save the images by writing the downloaded bytes to a file:

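    for item in uls:
        path = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]
        path = '/root/python/python/taobao%s' % path
        # Download the raw image bytes
        img_data = requests.get(item).content
        print(path)
        # Make sure the target directory exists before opening the file
        os.makedirs(os.path.split(path)[0], exist_ok=True)
        with open(path, "wb") as f:
            f.write(img_data)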
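The write step on its own can be sketched as a small helper. This is an illustration rather than the author's code: save_bytes is a hypothetical name, and a short literal byte string stands in for requests.get(item).content so the example runs offline.

```python
import os
import tempfile

def save_bytes(data, path):
    """Write data to path in binary mode, creating parent directories first."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)

# Stand-in payload so the sketch runs offline; in the post this would be
# requests.get(item).content.
fake_image = b"\xff\xd8\xff\xe0 not a real JPEG"

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "taobao", "xiaohuangren-004.jpg")
    written = save_bytes(fake_image, out_path)
    # Read the file back to confirm the bytes were stored unchanged.
    with open(out_path, "rb") as f:
        round_trip = f.read()

print(written, round_trip == fake_image)
```

Opening the file with "wb" matters here: image data is binary, and text mode would try to encode it.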

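Summary: of the two ways to save the images, the author found that writing the bytes fetched with requests to a file is faster than urlretrieve.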
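A comparison like this can be measured with a small timing harness. The sketch below is not the author's benchmark: to stay offline it reads a local file:// URL and uses urlopen from the standard library as a stand-in for requests.get, so its numbers say nothing about network behavior. The function names are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path
from urllib.request import urlopen, urlretrieve

def save_with_urlretrieve(url, path):
    urlretrieve(url, path)

def save_with_file_write(url, path):
    # Stand-in for requests.get(url).content, using only the standard library.
    with urlopen(url) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)

def average_seconds(save, url, path, repeats=5):
    """Average wall-clock time of save(url, path) over several runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        save(url, path)
    return (time.perf_counter() - start) / repeats

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.jpg"
    source.write_bytes(os.urandom(64 * 1024))  # random bytes as a fake image
    url = source.as_uri()  # file:// URL keeps the demo offline

    path_a = os.path.join(tmp, "a.jpg")
    path_b = os.path.join(tmp, "b.jpg")
    t_retrieve = average_seconds(save_with_urlretrieve, url, path_a)
    t_write = average_seconds(save_with_file_write, url, path_b)
    # Both methods should produce identical files.
    same_bytes = Path(path_a).read_bytes() == Path(path_b).read_bytes()

print(f"urlretrieve: {t_retrieve:.6f}s  file write: {t_write:.6f}s  same output: {same_bytes}")
```

To compare the two methods from the post, the same harness could be pointed at the real image URLs, with save_with_file_write switched to requests.get; results will vary with the network and the server.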

Complete code

    # coding: utf-8
    import requests
    from urllib.request import urlretrieve
    import re
    import os

    url = "http://www.ivsky.com/tupian/xiaohuangren_t21343/"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
        'Referer': url,
        'Connection': 'Keep-alive'
    }
    s = requests.get(url, headers=headers)
    # print(s.url)
    s = s.text
    # print(s)

    # Capture the .jpg URL of each thumbnail inside a <div class="il_img"> block
    pattern = r'<div class="il_img".*?<img src="(.*?\.jpg)" width'
    pa = re.compile(pattern)
    uls = pa.findall(s)
    # print(len(uls))

    # Alternative (disabled): save with urlretrieve
    # for item in uls:
    #     # e.g. http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-004.jpg
    #     path = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]
    #     path = '/root/python/python/taobao%s' % path
    #     os.makedirs(os.path.split(path)[0], exist_ok=True)
    #     print(path)
    #     urlretrieve(item, path)

    # Active version: download the bytes with requests and write them to a file
    for item in uls:
        path = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]
        path = '/root/python/python/taobao%s' % path
        img_data = requests.get(item).content
        print(path)
        # Make sure the target directory exists before opening the file
        os.makedirs(os.path.split(path)[0], exist_ok=True)
        with open(path, "wb") as f:
            f.write(img_data)