[Python] Scraping product information from JD.com list pages (selenium)

Source: Internet  Editor: 程序博客网  Date: 2024/05/12 23:35

Analysis

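  • URL: https://search.jd.com/Search?keyword=%E6%89%8B%E6%9C%BA&enc=utf-8&qrst=1&rt=1&stop=1&vt=2&wq=%E6%89%8B%E6%9C%BA&cid2=653&cid3=655&page=1&s=1&click=0
  • Each page lists 60 products. The page loads the first 30 up front and loads the remaining 30 only when you scroll down.
  • So we use selenium to simulate scrolling in a real browser, then pass the page source to bs4 to parse it and extract the data.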
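To make the search URL above easier to read, here is a minimal sketch that decodes its query string using only the standard library. The helper names `query_params` and `with_params` are hypothetical, introduced just for illustration. Note that the source does not describe how JD interprets `page` or `s`, so any value you override there is an assumption to verify in the browser.

```python
# -*- coding: utf-8 -*-
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

URL = ("https://search.jd.com/Search?keyword=%E6%89%8B%E6%9C%BA&enc=utf-8"
       "&qrst=1&rt=1&stop=1&vt=2&wq=%E6%89%8B%E6%9C%BA&cid2=653&cid3=655"
       "&page=1&s=1&click=0")


def query_params(url):
    """Return the query string as a dict, with percent-encoding decoded."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}


def with_params(url, **overrides):
    """Return a copy of url with the given query parameters replaced."""
    parts = urlsplit(url)
    params = query_params(url)
    params.update({k: str(v) for k, v in overrides.items()})
    return urlunsplit(parts._replace(query=urlencode(params)))


params = query_params(URL)
# The percent-encoded keyword is the UTF-8 encoding of the search term.
keyword = params["keyword"]
# Build a URL for a different search term; urlencode re-encodes it as UTF-8.
other_url = with_params(URL, keyword="笔记本", wq="笔记本")
```

Here `keyword` decodes to 手机 (mobile phone), which is the search term the script scrapes.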

Code

# -*- coding: utf-8 -*-
import time

from selenium import webdriver
from bs4 import BeautifulSoup

url = "https://search.jd.com/Search?keyword=%E6%89%8B%E6%9C%BA&enc=utf-8&qrst=1&rt=1&stop=1&vt=2&wq=%E6%89%8B%E6%9C%BA&cid2=653&cid3=655&page=1&s=1&click=0"

driver = webdriver.Firefox()
driver.implicitly_wait(3)
driver.get(url)

# Simulate scrolling to the bottom so the remaining 30 items get loaded
for _ in range(4):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)

# Pass the fully loaded page source to bs4 for parsing
soup = BeautifulSoup(driver.page_source, "html.parser")

# Extract the information (product name, price) from each item
goods_info = soup.select(".gl-item")
for info in goods_info:
    title = info.select(".p-name.p-name-type-2 a")[0].text.strip()
    price = info.select(".p-price")[0].text.strip()
    print(title)
    print(price)

# quit() closes the window and ends the WebDriver session
driver.quit()
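The script scrolls a fixed four times with a one-second pause, which may be too few on a slow connection or more than needed on a fast one. As a possible alternative (not the author's method), the sketch below keeps scrolling until `document.body.scrollHeight` stops growing. The function name `scroll_until_stable` and its parameters are hypothetical. It assumes only that `driver` has an `execute_script` method, as selenium WebDriver objects do, and that the page height stops changing once all items are loaded.

```python
import time


def scroll_until_stable(driver, max_rounds=10, pause=1.0):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed (at most max_rounds).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        # Give the lazily loaded items time to arrive before measuring again.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended, so assume loading is done
        last_height = new_height
    return rounds
```

In the script, a call such as `scroll_until_stable(driver)` would take the place of the fixed `for` loop. The `max_rounds` cap keeps it from looping forever on a page whose height never settles.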

Result
