pyspider example: Douban Top 250



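I recently started learning pyspider, and it is an impressive piece of work: easy to use, with a visual web UI, good interactivity, and robust, dependable selectors. I used it for a simple crawl of Douban's Top 250 movie list. The code is below: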

Code

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2017-07-20 14:06:26
# Project: douban
# by: daiyang

from pyspider.libs.base_handler import *
import re


class Handler(BaseHandler):
    crawl_config = {
    }

    # Re-run the entry point once a day.
    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://movie.douban.com/top250', callback=self.index_page)

    # Treat a fetched page as fresh for 10 days before crawling it again.
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('a[href^="http"]').items():
            # Follow only links that start with the Top 250 list URL.
            if re.match(r"https://movie\.douban\.com/top250", each.attr.href, re.U):
                self.crawl(each.attr.href, callback=self.detail_page)

    @config(priority=2)
    def detail_page(self, response):
        # Each crawled page is a list page, so the selector matches the
        # span.title elements of the movies listed on it.
        return {
            "url": response.url,
            "title": response.doc('html > body > div#wrapper > div#content > div.grid-16-8.clearfix > div.article > ol.grid_view > * > div.item > div.info > div.hd > a > span.title').text(),
        }
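The link filter in index_page can be tried on its own, without running pyspider. The following is a minimal sketch using only the standard library; the helper name is_top250_link and the sample hrefs are made up for illustration and were not captured from a real crawl. It suggests that the prefix match would keep list-page links (including ones with a query string) and skip everything else:

```python
import re

# Same prefix check as index_page, with the dots escaped so they match literally.
TOP250_PATTERN = re.compile(r"https://movie\.douban\.com/top250")


def is_top250_link(href):
    """Return True if href starts with the Top 250 list URL (hypothetical helper)."""
    # re.match anchors only at the start of the string, so anything after
    # "top250" (for example a query string) still matches.
    return bool(href) and TOP250_PATTERN.match(href) is not None


# Hypothetical hrefs of the kind index_page might encounter.
sample_links = [
    "https://movie.douban.com/top250?start=25&filter=",
    "https://movie.douban.com/subject/1292052/",
    "http://movie.douban.com/top250",
    None,
]

followed = [href for href in sample_links if is_top250_link(href)]
print(followed)
```

Note that the pattern requires https, so the plain http form of the start URL is not matched by this filter; only the first sample link is kept.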
