Scraping Images from Sogou Wallpaper Park (Part 1)
Source: Internet | Editor: 程序博客网 | Time: 2024/04/28 16:04
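This post uses urllib2 and BeautifulSoup to scrape the images on Sogou Wallpaper Park and download them. Only the core code is shown below; later posts will continue to optimize it.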
http://bizhi.sogou.com/park
#-*-coding:utf-8-*-
'''
Created on 2016-3-15

@author: 201507220131
'''
from bs4 import BeautifulSoup
from bs4 import element
import urllib2
import re


class SougouPic():
    '''
    Scrape Sogou wallpapers
    '''
    def __init__(self):
        self.Baseurl = 'http://bizhi.sogou.com'
        self.itemsList = []
        self.nameIndex = 0

    def getBaseUrl(self):
        return self.Baseurl

    # Fetch the page source; defaults to the home page (/park)
    def getCode(self,url=None):
        if url:
            request = urllib2.Request(self.Baseurl+url)
        else:
            request = urllib2.Request(self.Baseurl+'/park')
        response = urllib2.urlopen(request)
        # The site serves GBK-encoded pages
        return response.read().decode('gbk')

    # Scrape the category names with a regular expression
    def getClassfiy(self,code):
        pattern1 = re.compile('<div.*?class="class_alta.*?<span class="white_font">(.*?)</span></a>.*?',re.S)
        # pattern2 is compiled but not used yet
        pattern2 = re.compile('<div class="tag_mid font_b3d465">.*?f=nav">(.*?)</a>',re.S)
        items1 = re.findall(pattern1,code)
        return items1

    # Scrape the image categories and their subcategories
    def getClassfiy2(self,code):
        soup = BeautifulSoup(code)
        items1_Code = soup.find_all(class_ = 'white_font')
        for index,item in enumerate(items1_Code):
            childList = []
            childDic = {}
            if index == 0:
                # The first category keeps its links in a separate container
                divcode = soup.find_all(class_='class_more_side class_more_side_all')
                for i in divcode:
                    for c in i.children:
                        if type(c) is element.Tag:
                            childDic = {'name':c.string,'url':c.get('href'),'child':''}
                            childList.append(childDic)
            else:
                # Walk from the category label to the element holding its subcategory links
                divCode = item.previous_element.next_sibling.next_element.next_element.next_element
                for i in divCode:
                    if type(i) is element.Tag:
                        childDic = {'name':i.string,'url':i.get('href'),'child':''}
                        childList.append(childDic)
            dic = {'name':item.string,'url':item.previous_element.get('href'),'child':childList}
            self.itemsList.append(dic)
        return self.itemsList

    # Collect the image links
    def getPicUrl(self,code):
        soup = BeautifulSoup(code)
        items = soup.find_all(class_='wallpaper_dis')
        picUrlList = []
        for i in items:
            img = i.next_element.next_element.next_element.next_element
            picUrlList.append(img.get('src'))
        return picUrlList

    # Download the images; the pic/ directory must already exist
    def downLoad(self,urlList):
        for url in urlList:
            data = urllib2.urlopen(url)
            file = open('pic/pic'+str(self.nameIndex)+'.jpg','wb')
            file.write(data.read())
            file.close()
            self.nameIndex += 1


sougou = SougouPic()
homeCode = sougou.getCode()
itemsList = sougou.getClassfiy2(homeCode)
#print itemsList
baseUrl = sougou.getBaseUrl()
for item in itemsList:
    # .string already yields unicode text, so no extra decode is needed
    print item.get('name'),'#########################'
    childs = item.get('child')
    print 'Subcategories:'
    for child in childs:
        print child.get('name'),':',baseUrl+child.get('url')

picCode = sougou.getCode('/label/index/588?f=popup')
items = sougou.getPicUrl(picCode)
for i in items:
    print '############'
    print i
sougou.downLoad(items)
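The download step above writes each image to pic/pic<N>.jpg and stops at the first network error. As a rough sketch of one possible hardening, and not the author's code, the version below creates the output directory if it is missing and skips URLs that fail. It assumes Python 3 (urllib.request rather than urllib2). The names download_all, fetch_bytes and fake_fetch are made up for illustration, and the example.com URLs are placeholders rather than real Sogou links.

```python
import os
import tempfile
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Return the raw response body for url (plain GET via the standard library)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, out_dir="pic", start_index=0, fetch=fetch_bytes):
    """Save each URL as out_dir/pic<N>.jpg and return the list of paths written.

    A URL whose fetch raises OSError (urllib's URLError is a subclass)
    is skipped and does not consume an index.
    """
    os.makedirs(out_dir, exist_ok=True)  # create the directory if it is missing
    saved = []
    index = start_index
    for url in urls:
        try:
            data = fetch(url)
        except OSError as exc:
            print("skipped %s: %s" % (url, exc))
            continue
        path = os.path.join(out_dir, "pic%d.jpg" % index)
        with open(path, "wb") as handle:  # closed even if the write fails
            handle.write(data)
        saved.append(path)
        index += 1
    return saved


# Offline demonstration: a stand-in fetcher (hypothetical) replaces the network.
def fake_fetch(url):
    if url.endswith("broken.jpg"):
        raise OSError("simulated network failure")
    return b"image bytes for " + url.encode("ascii")


demo_dir = tempfile.mkdtemp()
demo_paths = download_all(
    [
        "http://example.com/a.jpg",
        "http://example.com/broken.jpg",
        "http://example.com/b.jpg",
    ],
    out_dir=demo_dir,
    fetch=fake_fetch,
)
print([os.path.basename(p) for p in demo_paths])  # ['pic0.jpg', 'pic1.jpg']
```

Passing the fetcher in as a parameter keeps the file-naming logic testable without touching the network. In real use the default fetch_bytes would be applied to the list returned by getPicUrl.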