Crawler Notes (10/6): CSVFeedSpider
Source: Internet. Editor: 程序博客网. Time: 2024/06/16 00:11
1. Download a CSV file: http://yum.iqianyue.com/weisuenbook/pyspd/part12/mydata.csv
2. Create the project mycsv
..............myfirstspjt>scrapy startproject mycsv
3. Edit the items file: create a name field to store the name and a sex field to store the sex
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class MycsvItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    name = scrapy.Field()  # stores the name
    sex = scrapy.Field()   # stores the sex

4. List the available spider templates in cmd
scrapy genspider -l
5. Create a spider from the csvfeed template to process the CSV file
scrapy genspider -t csvfeed mycsvspider iqianyue.com
6. Edit the file mycsvspider.py
# -*- coding: utf-8 -*-
from scrapy.spiders import CSVFeedSpider

from mycsv.items import MycsvItem


class MycsvspiderSpider(CSVFeedSpider):
    name = 'mycsvspider'
    allowed_domains = ['iqianyue.com']
    # start_urls = ['http://www.iqianyue.com/feed.csv']
    start_urls = ['http://yum.iqianyue.com/weisuenbook/pyspd/part12/mydata.csv']
    # Define the headers (the column names, in file order)
    headers = ['name', 'sex', 'addr', 'email']
    # Define the delimiter
    delimiter = ','
    # headers = ['id', 'name', 'description', 'image_link']
    # delimiter = '\t'

    # Do any adaptations you need here
    #def adapt_response(self, response):
    #    return response

    def parse_row(self, response, row):
        # row is a dict keyed by the headers defined above
        i = MycsvItem()
        # encode() converts each str value to UTF-8 bytes
        i['name'] = row['name'].encode()
        i['sex'] = row['sex'].encode()
        print("Name:")
        print(i['name'])
        print("Sex:")
        print(i['sex'])
        print("-------------------------------")
        #i['url'] = row['url']
        #i['name'] = row['name']
        #i['description'] = row['description']
        return i

7. Run the spider from cmd
scrapy crawl mycsvspider --nolog
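To see what the headers and delimiter settings in step 6 do, here is a minimal sketch using only the standard library csv module. It is not Scrapy's actual implementation; it only approximates how each line of the file becomes the row dict passed to parse_row. The sample data and the helper name iter_rows are made up for illustration, and the real mydata.csv may look different.

```python
import csv
import io

# Hypothetical sample data; the real mydata.csv may contain different rows.
SAMPLE = (
    "Alice,female,Beijing,alice@example.com\n"
    "Bob,male,Shanghai,bob@example.com\n"
)


def iter_rows(text, headers, delimiter=","):
    """Yield one dict per CSV line, keyed by the given headers.

    iter_rows is an illustrative helper, not part of Scrapy.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for fields in reader:
        # Skip lines whose field count does not match the headers.
        if len(fields) != len(headers):
            continue
        yield dict(zip(headers, fields))


rows = list(iter_rows(SAMPLE, ["name", "sex", "addr", "email"]))
for row in rows:
    # Same field access as in parse_row: row['name'], row['sex']
    print(row["name"], row["sex"])
```

Changing delimiter to '\t' would split tab-separated files the same way, which is what the commented-out lines in the generated template hint at.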