Scraping sexy beauty photos with Python

Source: Internet | Posted by: 侏罗纪世界4 知乎 | Editor: 程序博客网 | Time: 2024/05/01 21:33

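The goal: I've recently gotten interested in Python web scraping, so, following existing examples, I tried writing a crawler to download the beauty photos from a site I used to like: http://www.mm131.com/xinggan. Each photo set shows one picture per page, so saving a set by hand means clicking through dozens of pages. With a crawler it's easy: enter the set's ID and the whole set lands on your hard drive.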

As the experts say: talk is cheap, show me the code!

Next, here is the general process for scraping a web page:

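1. View the target page's source and find the content you want to scrape
2. Extract that content with regular expressions or another tool such as XPath or bs4 (BeautifulSoup)
3. Write the complete Python code that performs the scrape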
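Step 2 lists regular expressions, XPath, and bs4. For comparison, Python's standard library also ships a small HTML parser, `html.parser.HTMLParser`, that can collect every `<img>` `src` without installing anything. This is a minimal sketch, not what the rest of this post uses; the class and function names and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; self-closing tags such as
        # <img ... /> also reach this method via the default handle_startendtag
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.srcs.append(src)


def extract_img_srcs(html):
    """Return the src of every <img> in html, in document order."""
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    return parser.srcs


sample = '<p>intro</p><img alt="demo" src="http://example.com/pic/1.jpg" />'
print(extract_img_srcs(sample))  # ['http://example.com/pic/1.jpg']
```

Unlike a regex, the parser does not care about attribute order or whether the tag ends in `/>`, but it returns every image on the page, so you still have to pick the right one.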

1. Target URL

url：http://www.mm131.com/xinggan/2373.html
[Image: beauty photo]
Pretty, right?!

2. Analyze the page source

Pressing F12 (the browser's developer tools) reveals the following two lines:

```html
src="http://img1.mm131.com/pic/2373/1.jpg"
<span class="page-ch">共56页
```

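This gives us the following information:

The URL of the first page is http://www.mm131.com/xinggan/2373.html

The first line is the URL of the first page's image, where 2373 is the set's ID.

The second line shows that the set has 56 images (共56页 means "56 pages in total", one image per page).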

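Clicking through to pages 2 and 3 and checking their source shows:

The URLs of pages 2 and 3 are http://www.mm131.com/xinggan/2373_2.html and http://www.mm131.com/xinggan/2373_3.html

The image URLs follow the same pattern as on the first page, with 1.jpg becoming 2.jpg and so on.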
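The two URL patterns can be captured in a pair of helper functions. This is a sketch that assumes the pattern observed on pages 1 to 3 holds for the whole set; `page_url` and `image_url` are hypothetical names, and the image host `img1.mm131.com` is taken from the single URL seen above, so reading the real `src` from each page (as the scraper later in this post does) remains the safer choice.

```python
BASE = 'http://www.mm131.com/xinggan'
IMG_BASE = 'http://img1.mm131.com/pic'  # assumed to be the same host for every image


def page_url(pic_id, n):
    """Page 1 is <id>.html; later pages are <id>_<n>.html."""
    if n == 1:
        return '%s/%s.html' % (BASE, pic_id)
    return '%s/%s_%d.html' % (BASE, pic_id, n)


def image_url(pic_id, n):
    """Image n of the set, following the observed /pic/<id>/<n>.jpg pattern."""
    return '%s/%s/%d.jpg' % (IMG_BASE, pic_id, n)


for n in range(1, 4):
    print(page_url(2373, n), image_url(2373, n))
```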

3. Scraping the images

Let's try scraping the image on the first page. Straight to the code:

```python
import re

import requests

url = 'http://www.mm131.com/xinggan/2373.html'
html = requests.get(url).text  # read the whole page as text
a = re.search(r'img alt=.* src="(.*?)" /', html, re.S)  # match the image URL
print(a.group(1))
```

Output:

http://img1.mm131.com/pic/2373/1.jpg
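Why does the pattern `img alt=.* src="(.*?)" /` land on the photo? `.*` is greedy and `re.S` lets `.` match newlines, so after the first `img alt=` it runs to the end of the page and backtracks to the *last* ` src="..." /` it can find; the non-greedy `.*?` would stop at the first one instead. The fragment below is invented to show the difference; which variant is right depends on where the photo sits in the real page.

```python
import re

# Made-up page fragment with two <img> tags; only the second is the photo
html = ('<img alt="logo" src="/static/logo.png" />\n'
        '<p>caption</p>\n'
        '<img alt="photo" src="http://img1.mm131.com/pic/2373/1.jpg" />')

# Greedy .* with re.S runs to the end of the text, then backtracks to the
# LAST ' src="..." /' it can find
greedy = re.search(r'img alt=.* src="(.*?)" /', html, re.S)
# Non-greedy .*? stops at the FIRST ' src="..." /' after 'img alt='
lazy = re.search(r'img alt=.*? src="(.*?)" /', html, re.S)

print(greedy.group(1))  # http://img1.mm131.com/pic/2373/1.jpg
print(lazy.group(1))    # /static/logo.png
```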

Next we need to save the image locally:

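```python
pic = requests.get(a.group(1), timeout=2)  # timeout so the program doesn't wait forever
with open('1.jpg', 'wb') as fp:  # new file in binary write mode ('1.jpg' is just an example name)
    fp.write(pic.content)  # write the image bytes to the file
```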

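With that, the first beauty photo is saved on your machine.

Now that the first one is saved, don't let the rest get away either. More code:

4. Completing the code

Load the required modules and set the directory where the images will be stored: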

```python
# coding: utf-8
import os
import re

import requests
from bs4 import BeautifulSoup

pic_id = input('Input pic id: ')  # read the set ID from the keyboard
os.chdir(r"G:\pic")  # base directory for all downloads
homedir = os.getcwd()
print("Current directory: %s" % homedir)
fulldir = os.path.join(homedir, pic_id)  # save each set in a subdirectory named after its ID
if not os.path.isdir(fulldir):
    os.makedirs(fulldir)
```
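The `isdir` check plus `os.makedirs` can be folded into a single call. A sketch using the standard library's `pathlib`; the function name `make_set_dir` is illustrative and not part of the original code.

```python
from pathlib import Path


def make_set_dir(base_dir, pic_id):
    """Create <base_dir>/<pic_id> (and any missing parents); return its path."""
    target = Path(base_dir) / str(pic_id)
    target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return target
```

Usage would look like `fulldir = make_set_dir(r"G:\pic", pic_id)`, which also avoids changing the process's working directory with `os.chdir`.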

Because we have to keep turning pages to get every image, we first get the total page count:

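```python
url = 'http://www.mm131.com/xinggan/%s.html' % pic_id
html = requests.get(url).text
# soup = BeautifulSoup(html)  # without a parser argument this emits "UserWarning: No parser was explicitly specified"
soup = BeautifulSoup(html, 'html.parser')  # use soup to locate the page-count element
ye = soup.find('span', class_='page-ch').string  # e.g. "共56页" ("56 pages in total")
ye_count = re.search(r'\d+', ye)
print('pages: %d in total' % int(ye_count.group()))
```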
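The page count can also be read with one regular expression anchored on the `page-ch` span found with F12, without BeautifulSoup. This sketch assumes the marker appears exactly in the form `<span class="page-ch">共N页` (共N页 means "N pages in total"); `parse_page_count` is a hypothetical helper, and it returns `None` instead of raising when the marker is missing so the caller can decide what to do.

```python
import re


def parse_page_count(html):
    """Return N from '<span class="page-ch">共N页', or None if the marker is absent."""
    # Assumes the span has exactly this class attribute and no other attributes
    m = re.search(r'<span class="page-ch">\s*共\s*(\d+)\s*页', html)
    return int(m.group(1)) if m else None


print(parse_page_count('<div><span class="page-ch">共56页</span></div>'))  # 56
```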

The main function:

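```python
def downpic(pic_id, total_pages):
    for n in range(1, total_pages + 1):  # stop after the last page
        # Page 1 is <id>.html; page n (n >= 2) is <id>_<n>.html
        if n == 1:
            url = 'http://www.mm131.com/xinggan/%s.html' % pic_id
        else:
            url = 'http://www.mm131.com/xinggan/%s_%s.html' % (pic_id, n)
        try:
            html = requests.get(url, timeout=5).text  # time out instead of hanging
            pic_url = re.search(r'img alt=.* src="(.*?)" /', html, re.S)  # extract the image URL with a regex
            if pic_url is None:
                print('[Error] no image found on page %d' % n)
                continue
            pic_s = pic_url.group(1)
            print(pic_s)
            pic = requests.get(pic_s, timeout=2)
            pic_cun = os.path.join(fulldir, '%d.jpg' % n)  # save as <n>.jpg in the set's directory
            with open(pic_cun, 'wb') as fp:
                fp.write(pic.content)
        except requests.exceptions.RequestException:
            # Skip this page rather than retrying it forever
            print('[Error] the current image could not be downloaded')


if __name__ == '__main__':
    downpic(pic_id, int(ye_count.group()))
```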
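Network errors are the most likely failure in this loop. Rather than retrying a page with no upper bound, or abandoning it at the first error, you can retry a fixed number of times with a pause between attempts. A generic sketch; `with_retries` and the fake `flaky_download` are hypothetical names used only for this demo.

```python
import time


def with_retries(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func(); on one of `exceptions`, wait `delay` seconds and try again.

    Re-raises the last exception once `attempts` tries have all failed.
    """
    for i in range(attempts):
        try:
            return func()
        except exceptions:
            if i == attempts - 1:
                raise
            time.sleep(delay)


# Demo with a fake download that fails twice, then succeeds
calls = {'n': 0}


def flaky_download():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('simulated network failure')
    return b'image bytes'


print(with_retries(flaky_download, attempts=3, delay=0))  # b'image bytes'
```

With requests you would wrap the call, e.g. `with_retries(lambda: requests.get(pic_s, timeout=2), exceptions=(requests.exceptions.RequestException,))`. Note that `requests.get` does not raise on HTTP error codes such as 404 unless you call `raise_for_status()` on the response.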

Run the program.


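5. Done

Looking at the photos on your hard drive feels great, doesn't it? Of course, crawlers can do more than download images: they can also scrape train information from 12306, job listings from recruiting sites, and much more. So go pick up this skill!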

Reference: http://www.jianshu.com/p/19c846daccb3

0 0