Getting Started with Scrapy


1. Install the Scrapy environment

At a cmd prompt, simply run:

conda install scrapy
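
To confirm the install worked, print the installed version (adding -v also lists the versions of key dependencies such as Twisted and lxml):

scrapy version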

2. Create a project

scrapy startproject spider_name
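
startproject lays down a standard skeleton. The exact files vary a little across Scrapy versions, but the layout is roughly:

spider_name/
    scrapy.cfg            # deploy configuration
    spider_name/          # the project's Python package
        __init__.py
        items.py          # item definitions
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # spiders live here
            __init__.py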

3. Generate a spider (a project can contain multiple spiders, but each spider's name must be unique; cd into E:\spider_name\spider_name\spiders before generating)

scrapy genspider garlic http://www.51garlic.com/hq/list-139.html
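
genspider writes a minimal skeleton into spiders/garlic.py, roughly like the following. Note that genspider really expects a domain such as 51garlic.com; depending on your Scrapy version, a full URL may be copied into allowed_domains verbatim, so check the generated file:

# -*- coding: utf-8 -*-
import scrapy


class GarlicSpider(scrapy.Spider):
    name = 'garlic'
    allowed_domains = ['51garlic.com']
    start_urls = ['http://www.51garlic.com/hq/list-139.html']

    def parse(self, response):
        pass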

4. List the spiders in the current project

scrapy list
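
For this project the output is just the single registered name:

garlic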

5. Run the spider

scrapy crawl garlic -o abc.csv
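
The -o flag picks the feed format from the file extension; csv, json, jl (JSON lines), and xml all work out of the box. In most Scrapy versions -o appends to an existing file (newer releases add -O to overwrite), so remove abc.csv before re-running if you want a clean export. For example:

scrapy crawl garlic -o abc.json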

6. The spider code, garlic.py

# -*- coding: utf-8 -*-
import scrapy


class GarlicSpider(scrapy.Spider):
    name = "garlic"
    start_urls = [
        "http://www.51garlic.com/hq/list-139.html",
        "http://www.51garlic.com/hq/list-139-2.html",
    ]

    def parse(self, response):
        # Follow every article link on the two listing pages.
        for href in response.css('.td-lm-list a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        # One item per article page. Scrapy's feed exporter handles
        # the output encoding, so no manual .encode('utf-8') is needed.
        yield {
            'title': response.css('.td-timu').extract()[0],
            'txt': response.css('.td-nei-content').extract()[0],
            'link': response.url,
        }
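
The CSS selectors above (.td-lm-list, .td-timu, .td-nei-content) are tied to the target site's markup, so it pays to try them interactively before a full crawl. Scrapy's shell fetches a page and drops you into a Python prompt with response already populated:

scrapy shell "http://www.51garlic.com/hq/list-139.html"
>>> response.css('.td-lm-list a::attr(href)').extract()
>>> response.urljoin(response.css('.td-lm-list a::attr(href)').extract_first())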

