Crawler 04: Scraping Chinese Jokes from Qiushibaike (糗事百科)

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 11:01
# -*- coding: utf-8 -*-
import urllib2
import re

page = 1
url = 'http://www.qiushibaike.com/8hr/page/%d/?s=4908781' % page
user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0"
headers = {'User-Agent': user_agent}

# Send the request with a browser User-Agent and read the page source
request = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(request)
back = response.read()
# print back

# Capture the text that follows each <div class="content"> opening tag
imglist = re.findall(r'<div[^>]class="content">\n\n([^<]+)<[^>]+.+\n\n[^<]', back)
print imglist

# Write the jokes to a text file named after the page number
f = open('糗事百科' + str(page) + '.txt', 'w')
for joke in imglist:
    f.write(joke)
f.close()
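The script above is Python 2 (urllib2, print statements). As a rough sketch only, the extraction step can be tried offline in Python 3 with the same regular expression. The names extract_jokes and SAMPLE_HTML are hypothetical, and the sample markup is an assumption written to fit what the pattern expects, not a copy of the real page.

```python
import re

# Same pattern as in the script: capture the text between the two newlines
# after <div class="content"> and the next tag.
JOKE_PATTERN = re.compile(r'<div[^>]class="content">\n\n([^<]+)<[^>]+.+\n\n[^<]')


def extract_jokes(html):
    """Return the captured joke texts with surrounding whitespace removed."""
    return [joke.strip() for joke in JOKE_PATTERN.findall(html)]


# Hypothetical sample markup (an assumption, shaped to match the pattern).
SAMPLE_HTML = (
    '<div class="content">\n\n'
    'First joke text\n'
    '</div>\n\n\n'
    '<div class="content">\n\n'
    'Second joke text\n'
    '</div>\n\n\n'
)

jokes = extract_jokes(SAMPLE_HTML)
```

The pattern depends on exact newlines and tag order, so it will likely stop matching if the site changes its markup. An HTML parser would probably be more robust, but that is a different technique from the one used in the script.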