Notes 1 - Scrapy

Source: Internet · Editor: 程序博客网 · Date: 2024/06/10 05:40
  1. Creating a new Scrapy project
  2. Writing a spider to crawl a site and extract data
  3. Exporting the scraped data using the command line
  4. Changing spider to recursively follow links
  5. Using spider arguments
1. Create a Scrapy project
scrapy startproject tutorial
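Running this command generates a project skeleton. In recent Scrapy versions the generated layout looks roughly like the following (shown here from memory, assuming the project name tutorial):

```shell
scrapy startproject tutorial
# tutorial/
#     scrapy.cfg            # deploy/configuration file
#     tutorial/             # the project's Python module
#         __init__.py
#         items.py          # item definitions
#         middlewares.py    # spider/downloader middlewares
#         pipelines.py      # item pipelines
#         settings.py       # project settings
#         spiders/          # put spider code (e.g. quotes_spider.py) here
#             __init__.py
```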
2. Write a spider to crawl a site and extract data
quotes_spider.py
3. Export the scraped data from the command line
scrapy crawl quotes
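On its own, `scrapy crawl quotes` only runs the spider; to actually export the yielded items, add the `-o` feed option and Scrapy picks the format from the file extension. The output filenames below are just examples:

```shell
scrapy crawl quotes -o quotes.json   # one JSON array with all items
scrapy crawl quotes -o quotes.jl     # JSON Lines: one item per line, safe to append to
```

For repeated runs, JSON Lines is the more robust choice, since appending to a `.json` file would produce invalid JSON.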
4. Change the spider to recursively follow links: after yielding the items on a page, the spider extracts the next-page link and yields a new request for it, so crawling continues until no next page exists.
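The `response.follow(next_page, ...)` call in the spider code below accepts a relative link and resolves it against the current page's URL. The resolution behaves like the standard library's `urljoin`, which this small stdlib-only illustration uses (the URLs are the tutorial site's):

```python
from urllib.parse import urljoin

# 'li.next a::attr(href)' on quotes.toscrape.com yields a relative
# link such as '/page/2/'; joining it with the current page URL
# gives the absolute URL of the next page to request.
current = 'http://quotes.toscrape.com/page/1/'
next_href = '/page/2/'
print(urljoin(current, next_href))  # → http://quotes.toscrape.com/page/2/
```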
5. Use spider arguments: values passed with -a on the scrapy crawl command line become attributes of the spider instance.
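Spider arguments are passed with `-a key=value`; the spider code below reads the `tag` attribute via `getattr` and, if present, crawls only that tag's page. `humor` here is just an example value:

```shell
# Sets self.tag = 'humor' on the spider, so it starts from
# http://quotes.toscrape.com/tag/humor
scrapy crawl quotes -a tag=humor
```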
To experiment with selectors interactively before putting them in the spider, open the Scrapy shell:
scrapy shell 'http://quotes.toscrape.com/page/1/'
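Inside the shell, the downloaded page is available as `response`, so the CSS expressions used by the spider can be tested directly. A sketch of a typical session:

```shell
# Quote the URL so the shell does not interpret special characters in it.
scrapy shell 'http://quotes.toscrape.com/page/1/'
# At the interactive prompt:
#   >>> response.css('span.text::text').extract_first()        # text of the first quote
#   >>> response.css('small.author::text').extract_first()     # its author
#   >>> response.css('li.next a::attr(href)').extract_first()  # e.g. '/page/2/'
```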
Spider code:
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        url = 'http://quotes.toscrape.com/'
        # An optional 'tag' spider argument (passed with -a tag=...)
        # narrows the crawl to a single tag's pages.
        tag = getattr(self, 'tag', None)
        if tag is not None:
            url = url + 'tag/' + tag
        yield scrapy.Request(url, self.parse)

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.css('small.author::text').extract_first(),
            }
        # Follow the next-page link (relative URL) until there is none.
        next_page = response.css('li.next a::attr(href)').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)