Scrapy: crawling the dress category on Dangdang
Source: Internet  Editor: 程序博客网  Time: 2024/04/25 18:41
scrapy startproject dangdang
scrapy genspider -t basic cao dangdang.com
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class DangdangItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    # Product name
    name = scrapy.Field()
    # Product price
    price = scrapy.Field()
    # Product link URL
    link = scrapy.Field()
# -*- coding: utf-8 -*-
import scrapy
from dangdang.items import DangdangItem
from scrapy.http import Request


class CaoSpider(scrapy.Spider):
    name = "cao"
    allowed_domains = ["dangdang.com"]
    start_urls = (
        'http://category.dangdang.com/pg1-cid4008149.html',
    )

    def parse(self, response):
        item = DangdangItem()
        # Each field holds a list with one entry per product on the page
        item['name'] = response.xpath("//p[@class='name']/a[@name='itemlist-title']/text()").extract()
        item['price'] = response.xpath("//p[@class='price']/span[@class='price_n']/text()").extract()
        item['link'] = response.xpath('//a[@class="pic"]/@href').extract()
        # Print what was found; zip stops at the shortest list, so a page
        # with fewer than 100 products does not raise IndexError
        for name, price, link in zip(item['name'], item['price'], item['link']):
            print(name + '----' + price + '---' + link)
        # Return the item once extraction is done
        yield item
        # Queue listing pages 1 to 9 (repeated requests are skipped by
        # Scrapy's default duplicate filter)
        for i in range(1, 10):
            url = "http://category.dangdang.com/pg" + str(i) + "-cid4008149.html"
            yield Request(url, callback=self.parse)
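The spider above stores three parallel lists in a single item per listing page. The following is a minimal sketch, not the author's code, of how the page URLs are built and how the three lists could be paired into one record per product. It assumes the lists line up by position, which the page markup does not guarantee. The helper names `page_url` and `pair_fields` are hypothetical, and the sample values are made up.

```python
def page_url(page, cid="4008149"):
    """Build a listing URL for a page number, following the spider's pattern."""
    return "http://category.dangdang.com/pg{}-cid{}.html".format(page, cid)


def pair_fields(names, prices, links):
    """Pair parallel lists into per-product dicts.

    zip() stops at the shortest list, so a missing price or link
    cannot cause an IndexError.
    """
    return [
        {"name": name.strip(), "price": price.strip(), "link": link}
        for name, price, link in zip(names, prices, links)
    ]


if __name__ == "__main__":
    # Made-up values standing in for the lists returned by .extract()
    names = [" Dress A ", "Dress B"]
    prices = ["99.00", "129.00", "59.00"]  # one extra entry on purpose
    links = ["http://example.com/1.html", "http://example.com/2.html"]

    for row in pair_fields(names, prices, links):
        print(row["name"] + "----" + row["price"] + "---" + row["link"])

    print([page_url(i) for i in range(1, 4)])
```

Inside `parse`, the same pairing could be used to yield one `DangdangItem` per product instead of one per page; whether that is preferable depends on how the data is consumed later.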
pipelines.py: store the items in a local JSON file
# Edit settings.py
ITEM_PIPELINES = {
    'dangdang.pipelines.DangdangPipeline': 300,
}
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

import codecs
import json


class DangdangPipeline(object):
    def __init__(self):
        self.file = codecs.open("./data.json", 'wb', encoding="utf-8")

    def process_item(self, item, spider):
        i = json.dumps(dict(item), ensure_ascii=False)
        line = i + '\n'
        self.file.write(line)
        return item

    def close_spider(self, spider):
        self.file.close()
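The pipeline writes one JSON object per line, so `data.json` is not a single JSON document, and calling `json.load` on the whole file fails once it holds more than one item. The sketch below shows one way to read such a file back line by line. The function names `read_items` and `write_sample` are hypothetical, and the sample record is made up to mimic the shape of an item.

```python
import json
import os
import tempfile


def read_items(path):
    """Read a file holding one JSON object per line and return a list of dicts."""
    items = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                items.append(json.loads(line))
    return items


def write_sample(path, items):
    """Write items in the same layout as the pipeline: one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Made-up record with the same shape as a scraped item
    sample = [{"name": ["连衣裙"], "price": ["99.00"], "link": ["http://example.com/1.html"]}]
    path = os.path.join(tempfile.mkdtemp(), "data.json")
    write_sample(path, sample)
    for item in read_items(path):
        print(item["name"], item["price"], item["link"])
```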