Simple Baidu image scraping with Node.js and Python, saving the images locally

Source: Internet | Published by: 神话特效软件 | Editor: 程序博客网 | Time: 2024/05/20 00:51

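I've been wanting to get into web scraping lately, so I picked up a bit of Python along the way.
Baidu image search API (captured from the browser console):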

http://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E5%A4%B4%E5%83%8F&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=&z=&ic=&word=%E5%A4%B4%E5%83%8F&s=&se=&tab=&width=&height=&face=&istype=&qc=&nc=&fr=&cg=head&pn=60&rn=30&gsm=3c&1505874585547=
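The captured URL hard-codes the keyword (`%E5%A4%B4%E5%83%8F` is the percent-encoded form of 头像) along with what look like paging parameters. Below is a minimal sketch of building such a URL for any keyword. It assumes that `pn` is the result offset and `rn` the page size, which is only inferred from the captured values (`pn=60`, `rn=30`), and that the empty parameters can be left out; neither is confirmed here. `build_search_url` and `BASE_URL` are hypothetical names used for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://image.baidu.com/search/acjson"


def build_search_url(keyword, offset=0, per_page=30):
    """Build an acjson search URL for `keyword`.

    Assumption: `pn` is the result offset and `rn` the page size,
    inferred from the captured URL rather than from any documentation.
    """
    params = {
        "tn": "resultjson_com",
        "ipn": "rj",
        "ct": "201326592",
        "fp": "result",
        "queryWord": keyword,
        "cl": "2",
        "lm": "-1",
        "ie": "utf-8",
        "oe": "utf-8",
        "word": keyword,
        "pn": offset,
        "rn": per_page,
    }
    # urlencode percent-encodes the UTF-8 bytes of the keyword
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("头像", offset=60))
```

Stepping `offset` by `per_page` (0, 30, 60, ...) would then walk through successive pages, if the assumption about `pn` holds.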

Main libraries

python

1. requests
2. re

node

http
fs
url

Key code

python

    import requests

    # Search endpoint captured from the browser console
    url = 'http://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E5%A4%B4%E5%83%8F&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=&z=&ic=&word=%E5%A4%B4%E5%83%8F&s=&se=&tab=&width=&height=&face=&istype=&qc=&nc=&fr=&cg=head&pn=60&rn=30&gsm=3c&1505874585547='

    # Browser-like request headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
        'Accept-Encoding': 'gzip',
        'Connection': 'close',
        'referer': 'https://image.baidu.com',
    }

    r = requests.get(url=url, headers=headers, timeout=3)

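Reportedly a referer header is required. It is probably only needed when requesting the actual image files, where leaving it out results in a 403. I didn't bother changing that here.
The image URLs are then pulled out of the API response with a regular expression (parsing it as JSON would also work).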
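The post does not show the extraction step, so here is a rough sketch of both approaches on a made-up sample response. The field name `thumbURL`, the top-level `data` list, and the sample body are assumptions for illustration; check them against a real response before relying on them. `extract_with_regex` and `extract_with_json` are hypothetical helper names.

```python
import json
import re


def extract_with_regex(body):
    """Return every value of a "thumbURL" field found in the raw text."""
    return re.findall(r'"thumbURL"\s*:\s*"(.*?)"', body)


def extract_with_json(body):
    """Parse the body as JSON and collect "thumbURL" from each item."""
    data = json.loads(body)
    return [item["thumbURL"] for item in data.get("data", [])
            if item.get("thumbURL")]


# Made-up sample body, shaped the way the response is assumed to look
sample = ('{"data":[{"thumbURL":"http://example.com/a.jpg"},'
          '{"thumbURL":"http://example.com/b.jpg"},{}]}')

print(extract_with_regex(sample))
print(extract_with_json(sample))
```

The regex version works on the raw text and does not care about the surrounding structure; the JSON version is stricter but makes it easy to read other fields from the same item.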

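    # img_urls is a placeholder name for the list of extracted image URLs;
    # the original loop header is not shown in the post
    for k in img_urls:
        try:
            p = requests.get(k, headers=headers, stream=True, timeout=3)
        except requests.RequestException:
            # print('error', k)
            continue  # skip URLs that fail or time out
        if p.status_code == 200:
            print(_img_path, k)
            # 'wb' opens the file in binary mode; the with block closes it
            with open('./image/' + str(_img_path) + '.png', 'wb') as fs:
                for chunk in p.iter_content(chunk_size=8192):
                    fs.write(chunk)
            _img_path = _img_path + 1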

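One thing to note: the image is requested as a stream (stream=True) and written to the file chunk by chunk, with the file opened in binary mode. Python really does make scraping easy, although a small scraper like this is simple in any language.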

node
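Since I'm more familiar with Node, I wrote a Node version right after finishing the Python one, and added support for user input while I was at it.
The key code is much the same. The one thing to watch out for is that the file has to be written as binary data, otherwise the write does not produce a valid image. (That said, I did manage to write files successfully earlier without setting binary encoding.)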
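The Node code is not shown, so the following only illustrates the general pitfall, in Python rather than Node: once image bytes are treated as text, they no longer round-trip. A plausible explanation for the earlier success without a binary setting is that the data was written as raw bytes (a Buffer, in Node terms) and never converted to a string, though that is a guess about code not shown here.

```python
# First 8 bytes of any PNG file; 0x89 is not valid UTF-8 on its own
png_header = b"\x89PNG\r\n\x1a\n"

# Treating the bytes as text replaces the undecodable byte with U+FFFD
as_text = png_header.decode("utf-8", errors="replace")
round_tripped = as_text.encode("utf-8")

# The bytes that come back differ from the ones that went in
print(round_tripped == png_header)
print(len(png_header), len(round_tripped))
```

Writing the original bytes straight to a file opened in binary mode avoids the conversion entirely.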
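Screenshot of a run
[screenshot]
Enter the keyword and the number of images to download when prompted (I'm not sure the English in the prompts is correct).
Press Enter when done.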

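Two hundred images took 38 seconds. I don't know whether that is fast or slow, and the speed is not stable: sometimes it's quick, sometimes not. Fortunately, every single image was saved.
Some of them are slightly NSFW, though.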


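This is my first scraper, so consider this a record of getting started. Next I'll write something more interesting.
I doubt anyone will read beginner code like this anyway...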
