Notes 1: Scrapy
Source: Internet  Editor: 程序博客网  Time: 2024/06/10 05:40
- Creating a new Scrapy project
- Writing a spider to crawl a site and extract data
- Exporting the scraped data using the command line
- Changing spider to recursively follow links
- Using spider arguments
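1. Create a Scrapy project
scrapy startproject tutorial
2. Write a spider to crawl a site and extract data
quotes_spider.py
3. Export the scraped data using the command line
scrapy crawl quotes -o quotes.json
(Without -o, scrapy crawl quotes only runs the spider and logs the scraped items.)
4. Change the spider to recursively follow links: in parse, read the next-page link and yield response.follow(next_page, self.parse), as in the spider code below.
5. Use spider arguments: pass them with -a, for example scrapy crawl quotes -a tag=humor; the spider reads the value with getattr(self, 'tag', None).
To try selectors interactively, open the Scrapy shell on a page:
scrapy shell 'http://quotes.toscrape.com/page/1/'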
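Spider arguments (step 5) are given on the command line with -a NAME=VALUE, and Scrapy's default spider constructor stores them as attributes on the spider instance. The sketch below mirrors that behaviour without importing Scrapy, so it runs anywhere. ArgsHolder and build_start_url are hypothetical names used only for illustration, and "humor" is just an example tag value.

```python
class ArgsHolder:
    """Hypothetical stand-in for a spider: keeps keyword arguments as attributes,
    the way Scrapy's default Spider constructor does with -a arguments."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def build_start_url(spider):
    """Same URL logic as start_requests in the spider code of this note."""
    url = "http://quotes.toscrape.com/"
    tag = getattr(spider, "tag", None)  # None when no -a tag=... was given
    if tag is not None:
        url = url + "tag/" + tag
    return url


print(build_start_url(ArgsHolder()))             # http://quotes.toscrape.com/
print(build_start_url(ArgsHolder(tag="humor")))  # http://quotes.toscrape.com/tag/humor
```

Using getattr with a default keeps the argument optional: the spider still starts from the front page when no tag is passed.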
Spider code:
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        url = 'http://quotes.toscrape.com/'
        # Optional spider argument, e.g. passed as -a tag=humor
        tag = getattr(self, 'tag', None)
        if tag is not None:
            url = url + 'tag/' + tag
        yield scrapy.Request(url, self.parse)

    def parse(self, response):
        # One item per quote block on the page
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.css('small.author::text').extract_first(),
            }

        # Follow the "next" link, if there is one, and parse that page the same way
        next_page = response.css('li.next a::attr(href)').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
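The link-following part of parse (step 4) can be sketched with the standard library alone. response.follow accepts a relative href and resolves it against the current page URL before scheduling the request; urljoin does the same resolution here. FAKE_SITE and crawl are hypothetical stand-ins for the real site and for Scrapy's scheduler, and the page contents are made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical stand-in for the site:
# URL -> (items on that page, href of the "next" link or None on the last page)
FAKE_SITE = {
    "http://quotes.toscrape.com/": (["quote A"], "/page/2/"),
    "http://quotes.toscrape.com/page/2/": (["quote B"], "/page/3/"),
    "http://quotes.toscrape.com/page/3/": (["quote C"], None),
}


def crawl(start_url):
    """Yield items page by page, following the next link until there is none."""
    url = start_url
    while url is not None:
        items, next_href = FAKE_SITE[url]
        yield from items
        # Resolve the relative href against the current page URL.
        url = urljoin(url, next_href) if next_href is not None else None


print(list(crawl("http://quotes.toscrape.com/")))  # ['quote A', 'quote B', 'quote C']
```

In the real spider the loop is implicit: each call to parse yields at most one new request, and Scrapy calls parse again on the response, which stops once the page has no next link.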