spider for douban
Source: Internet  Editor: 程序博客网  Date: 2024/06/07 04:45
<span style="font-size:18px;">Scrape the Douban movie charts: the new releases chart, the word-of-mouth chart, and the North America box office chart.</span>
<span style="font-size:18px;">#!/usr/bin/env python3
# coding: utf-8
import re
import urllib.request

import requests

url = "https://movie.douban.com/chart"
s = requests.Session()
r2 = s.get(url)  # Session.get
html = r2.text   # page text; the author found requests much faster than urlopen(url).read()

newnum = re.compile(r'<a class="nbg" href="https://movie.douban.com/subject/([0-9]+)/"')  # new releases chart: subject IDs
newname = re.compile(r'\.jpg" alt="(.*?)" class=""/>')  # new releases chart: titles
kbnum = re.compile(r'mv_week.*?([0-9]+)/" class="">')  # word-of-mouth chart: subject IDs
bmnum = re.compile(r'mv_us_week.*?([0-9]+)/" class="">')  # North America box office chart: subject IDs
x = r'href="https://movie.douban.com/subject/.*?/" class="">(.*?)</a>'  # titles for the latter two charts

newvisit = []
kbvisit = []
# extend() adds each match as its own element; append() would add the whole list as one element
newvisit.extend(newnum.findall(html))
newvisit.extend(newname.findall(html))
kbvisit.extend(kbnum.findall(html))
kbvisit.extend(bmnum.findall(html))
kbvisit.extend(re.findall(x, html, re.S))

# Open the file in the current directory: created if missing, truncated if it already exists
with open("豆瓣排行.txt", "w", encoding="utf-8") as f:
    f.write("豆瓣新片榜:\n")  # heading: "Douban new releases chart"
    half = len(newvisit) // 2  # first half holds IDs, second half holds titles
    for a in range(half):
        f.write(newvisit[a])
        f.write(newvisit[a + half])
        f.write("\n")
        print(newvisit[a], newvisit[a + half])
    f.write("\n本周豆瓣口碑榜,北美榜:\n")  # heading: "This week's word-of-mouth and North America charts"
    half = len(kbvisit) // 2  # first half holds IDs, second half holds titles
    for b in range(half):
        f.write(kbvisit[b])
        f.write(kbvisit[b + half].strip())
        f.write("\n")
        print(kbvisit[b], kbvisit[b + half].strip())

# Poster images: extract each image URL from the page, then download it
newpic = re.compile(r'<img src="(https://img.\.doubanio\.com/view/movie_poster_cover/ipst/public/.*?\.jpg)" alt=')
piclist1 = newpic.findall(html)
for picnum, c in enumerate(piclist1):
    urllib.request.urlretrieve(c, "%s.jpg" % picnum)
</span>
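The script above collects subject IDs and titles in separate regex passes and then pairs them by position within one list, which goes out of step as soon as one pattern matches more often than the other. A minimal sketch of one alternative is to capture the ID and the title with a single pattern, so each match is already a pair. The names `SAMPLE_HTML`, `ENTRY`, `parse_new_releases` and `format_rows` are hypothetical, and the sample markup is an assumption modelled on the regexes in the script, not checked against the live page.

```python
import re

# Hypothetical markup imitating two chart entries; the real page may differ.
SAMPLE_HTML = '''
<a class="nbg" href="https://movie.douban.com/subject/1234567/" title="Example A">
    <img src="https://img1.doubanio.com/view/photo/a.jpg" alt="Example A" class=""/>
</a>
<a class="nbg" href="https://movie.douban.com/subject/7654321/" title="Example B">
    <img src="https://img1.doubanio.com/view/photo/b.jpg" alt="Example B" class=""/>
</a>
'''

# One pattern with two groups: the subject ID from the link and the title
# from the alt attribute that follows it. re.S lets ".*?" cross line breaks.
ENTRY = re.compile(
    r'<a class="nbg" href="https://movie\.douban\.com/subject/([0-9]+)/"'
    r'.*?alt="(.*?)"',
    re.S,
)


def parse_new_releases(html):
    """Return a list of (subject_id, title) tuples, one per chart entry."""
    return ENTRY.findall(html)


def format_rows(rows):
    """Render the pairs as one 'id title' line per entry."""
    return "\n".join("%s %s" % (sid, title) for sid, title in rows)


if __name__ == "__main__":
    print(format_rows(parse_new_releases(SAMPLE_HTML)))
```

Because `findall` returns one tuple per match when the pattern has two groups, the ID and title of an entry can no longer drift apart, and no list offsets are needed when writing the rows to the file.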