python-scrapy: Crawling Douban Movies
Source: Internet  Editor: 程序博客网  Date: 2024/06/03 07:25
#coding=utf-8
'''
PyTools:PyCharm 2017.1
Python :Python3.5
Author :colby_chen
CreDate:2017-04-13
'''
import scrapy
from scrapy.http import Request
from douban.items import doubanItem

'''Crawl plan
* Target site: Douban Movie Top 250
* Target URL:  http://movie.douban.com/top250
* Target content, for each of the Top 250 movies:
  * movie title
  * movie info
  * movie rating
* Output: a CSV file
'''


class Douban(scrapy.Spider):
    # A plain Spider instead of CrawlSpider: no link-extraction rules are used,
    # and the Scrapy docs advise against overriding parse() on a CrawlSpider.
    name = "doubanMovie"
    # Kept from the original; a scrapy-redis convention that a plain spider does not read.
    redis_key = 'douban:start_urls'
    start_urls = ['http://movie.douban.com/top250']

    def parse(self, response):
        # Each movie sits in its own <div class="info"> block.
        for movie in response.xpath('//div[@class="info"]'):
            # Create a fresh item per movie so yielded items do not share state.
            item = doubanItem()

            # The title is split across several <span> elements; join the pieces.
            title = movie.xpath('div[@class="hd"]/a/span/text()').extract()
            movie_info = movie.xpath('div[@class="bd"]/p/text()').extract()
            # extract_first() returns the default instead of raising IndexError
            # when a node is missing (not every movie has a quote).
            star = movie.xpath(
                'div[@class="bd"]/div[@class="star"]'
                '/span[@class="rating_num"]/text()'
            ).extract_first(default='')
            quote = movie.xpath(
                'div[@class="bd"]/p[@class="quote"]/span/text()'
            ).extract_first(default='')

            item['title'] = ''.join(title)
            # Drop the whitespace-only text nodes and trim the rest before joining.
            item['movieInfo'] = ';'.join(
                part.strip() for part in movie_info if part.strip()
            )
            item['star'] = star
            item['quote'] = quote
            yield item

        # Follow the "next page" link, if present, resolved against the current URL.
        next_link = response.xpath('//span[@class="next"]/link/@href').extract_first()
        if next_link:
            yield Request(response.urljoin(next_link), callback=self.parse)
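The last step of parse() turns the href found inside span.next into the URL of the next page. Below is a minimal standard-library sketch of that step. It assumes the href is a query-only relative link such as ?start=25&filter= (the exact value is an assumption about the page, not something shown in the listing), and the helper name next_page_url is hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = 'http://movie.douban.com/top250'  # the spider's start URL


def next_page_url(current_url, href):
    """Resolve an href against the URL of the page it was found on.

    Hypothetical helper for illustration only; inside a Scrapy callback,
    response.urljoin(href) performs the same resolution.
    """
    return urljoin(current_url, href)


# Assumed shape of the "next" link: a query-only relative reference.
page2 = next_page_url(BASE_URL, '?start=25&filter=')
print(page2)
```

Plain concatenation of the base URL and the href gives the same result for a query-only link, but urljoin also copes with absolute or path-relative links should the markup change.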
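The crawl plan names a CSV file as the output, but the listing does not show how it is produced. In a Scrapy project the built-in feed export would normally handle this (for example scrapy crawl doubanMovie -o movies.csv, where the file name is only an example). The sketch below uses the standard csv module to show the shape of the resulting file. The four column names are the item keys set by the spider; the helper items_to_csv and the sample row are illustrative placeholders, not real scraped data.

```python
import csv
import io

# Column order follows the keys the spider assigns on each item.
FIELDS = ['title', 'movieInfo', 'star', 'quote']


def items_to_csv(items):
    """Serialize an iterable of item-like dicts to CSV text (illustrative helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buffer.getvalue()


# Placeholder row; the values are made up to show quoting of commas.
sample = [{
    'title': 'Example Title',
    'movieInfo': 'Director: Someone;1994 / Country / Genre',
    'star': '9.0',
    'quote': 'A one-line quote, with a comma',
}]

csv_text = items_to_csv(sample)
print(csv_text)
```

Fields that contain commas are quoted automatically by the csv module, so reading the text back with csv.DictReader recovers the original values.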