Installing selenium + chromedriver and basic usage

Source: Internet   Editor: 程序博客网   Time: 2024/05/16 19:34


I. Install selenium

pip install selenium
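Older Selenium 3 tutorials call `webdriver.Chrome(executable_path=...)` and `driver.find_element_by_id(...)`. Selenium 4, which is what `pip install selenium` installs now, removed the `find_element_by_*` methods in 4.3 and the `executable_path` argument in 4.10. The following is a minimal sketch, using only the standard library, that checks which major version you got. `major_of` and `selenium_major_version` are hypothetical helper names, not part of Selenium.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_of(version_string: str) -> int:
    """Return the major component of a version string such as '4.21.0'."""
    return int(version_string.split(".")[0])


def selenium_major_version() -> Optional[int]:
    """Return the installed Selenium major version, or None if selenium is not installed."""
    try:
        return major_of(version("selenium"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = selenium_major_version()
    if major is None:
        print("selenium is not installed; run: pip install selenium")
    elif major >= 4:
        print("Selenium 4: pass Service(...) and use find_element(By.ID, ...)")
    else:
        print("Selenium 3: executable_path= and find_element_by_id(...) still work")
```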

II. Install chromedriver

https://sites.google.com/a/chromium.org/chromedriver/downloads
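ChromeDriver only works with the Chrome release that has the same major version, so the download must match the installed browser. Since Chrome 115, new ChromeDriver builds are published through the Chrome for Testing availability dashboard instead of the page above. Selenium 4.6 and later also ship Selenium Manager, which can fetch a matching driver automatically.

The sketch below compares the two versions. It assumes you paste in the text printed by `chromedriver --version` and the version shown on chrome://version. The helper names and the sample strings are hypothetical.

```python
import re
from typing import Optional


def major_version(version_output: str) -> Optional[int]:
    """Extract the major version from text like 'ChromeDriver 124.0.6367.91'."""
    # Chrome and ChromeDriver versions have four dot-separated numeric parts
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """True when Chrome and ChromeDriver report the same major version."""
    chrome = major_version(chrome_output)
    driver = major_version(driver_output)
    return chrome is not None and chrome == driver


if __name__ == "__main__":
    # Illustrative strings only; substitute what your own machine prints
    print(versions_match("Google Chrome 124.0.6367.91", "ChromeDriver 124.0.6367.91"))
```

If the majors differ, download the driver whose major version equals Chrome's.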

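The download is a zip archive. Unzipped, it contains a single executable (chromedriver.exe on Windows). Selenium has to be able to find it, and there are two ways to arrange that: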

1. Pass the path to the driver manually every time:

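from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# Pass the driver path manually (raw string, so the backslashes are not escape sequences)
path = r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver\chromedriver.exe"
# Selenium 4 removed executable_path from webdriver.Chrome(); pass a Service instead
driver = webdriver.Chrome(service=Service(executable_path=path))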

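2. Add the directory that contains chromedriver to the PATH environment variable. webdriver.Chrome() can then be called without a path.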
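To confirm that option 2 worked, you can ask Python whether chromedriver is visible on PATH. This is a minimal sketch using only the standard library. `locate_chromedriver` is a hypothetical helper name, and it performs the same kind of PATH lookup Selenium does rather than calling Selenium itself.

```python
import shutil
from typing import Optional


def locate_chromedriver(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the first chromedriver found on the search path, or None.

    search_path is a PATH-style string (directories joined by os.pathsep);
    when it is None, the current PATH environment variable is used.
    """
    # On Windows, shutil.which also tries the PATHEXT suffixes, so chromedriver.exe matches
    return shutil.which("chromedriver", path=search_path)


if __name__ == "__main__":
    found = locate_chromedriver()
    if found:
        # With the driver on PATH, webdriver.Chrome() needs no path argument
        print("chromedriver found at", found)
    else:
        print("chromedriver is not on PATH: add its directory, or pass the path manually")
```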


III. A simple example: scraping streamer and program data from Huomao TV (火猫TV)

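from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from pandas import DataFrame
import time

# Pass the driver path manually (raw string, so the backslashes are not escape sequences)
path = r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver\chromedriver.exe"
driver = webdriver.Chrome(service=Service(executable_path=path))
url = "https://www.huomao.com/channel/lol"

# Open the channel page
driver.get(url)

# Click the "load more" button (id="getmore") at the bottom of the page 6 times;
# each click appends more streams, and driver.page_source reflects the updated page
for _ in range(6):
    driver.find_element(By.ID, "getmore").click()
    time.sleep(1)  # give the new items time to load

# Parse the fully loaded page
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()  # the browser is no longer needed once the HTML has been captured

page_all = soup.find("div", attrs={"id": "channellist"})
pages = page_all.find_all("div", attrs={"class": "list-smallbox"})

name = []
title = []
watching = []
for page in pages:
    try:
        this_title = page.find("div", attrs={"class": "title-box"}).find("em").string.strip()
        temp = page.find_all("p")
        this_name = temp[1].find("span").string.strip()
        this_watching = temp[1].find_all("span")[1].string.strip()
    except (AttributeError, IndexError):
        # Skip cards that are missing any of the expected elements
        continue
    # Append only after all three fields were extracted, so the lists stay aligned
    title.append(this_title)
    name.append(this_name)
    watching.append(this_watching)

# Columns: streamer name, program title, current viewer count
result = DataFrame({
    "主播名": name,
    "节目名": title,
    "在线观看人数": watching,
})
# to_excel creates the file if it does not exist (writing .xlsx needs an engine such as openpyxl)
result.to_excel("E:\\resultLol.xlsx", sheet_name="Sheet1")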
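The fixed six-iteration loop raises an exception if the "load more" button disappears before the sixth click, for example when the channel has fewer streams. The following is a minimal sketch of a more forgiving loop. `click_load_more` is a hypothetical helper that takes the click action as a callable, so the stopping logic can be tried without a browser.

```python
import time
from typing import Callable, Tuple, Type


def click_load_more(
    click: Callable[[], None],
    stop_on: Tuple[Type[BaseException], ...],
    times: int = 6,
    pause: float = 1.0,
) -> int:
    """Call click() up to `times` times, sleeping `pause` seconds after each success.

    Stops early when click() raises one of the exception types in `stop_on`
    (e.g. the button is gone because everything has loaded).
    Returns the number of successful clicks.
    """
    done = 0
    for _ in range(times):
        try:
            click()
        except stop_on:
            break  # nothing more to load
        done += 1
        time.sleep(pause)
    return done
```

With Selenium you might call it as `click_load_more(lambda: driver.find_element(By.ID, "getmore").click(), stop_on=(NoSuchElementException, ElementNotInteractableException))`, importing both exceptions from `selenium.common.exceptions`. Selenium's exceptions do not derive from Python's built-in lookup errors, so `stop_on` is passed explicitly.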




