Python Web Scraping Primer 3: Selenium + Python

Source: Internet  Editor: 程序博客网  Date: 2024/05/17 03:23

1. Introduction

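Selenium is a tool for automating web browsers, originally built for testing web applications. Because it drives a real browser, it can also scrape pages whose content is rendered by JavaScript.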

2. Installation

Environment:

  • Win7
  • Python 2.7.13
  • pip

Install with pip:

    pip install selenium

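Alternatively, download the source package from https://pypi.python.org/pypi/selenium/2.42.1, extract it, and run the following command in the extracted directory:

    python setup.py install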

Chinese documentation:
http://selenium-python-zh.readthedocs.io/en/latest/index.html

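If you see the error Message: 'geckodriver' executable needs to be in PATH, download the driver from:
https://github.com/mozilla/geckodriver/releases
and place it in the Python installation directory (or any other directory listed in PATH).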
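
Before starting a browser, it can help to confirm that the driver really is reachable through PATH. The following is a minimal sketch that scans PATH by hand using only the standard library; the helper name find_on_path is made up for this illustration and is not part of Selenium.

```python
import os


def find_on_path(name, path=None):
    """Return the full path of an executable called `name`, or None.

    `path` defaults to the PATH environment variable. On Windows the
    ".exe" suffix is also tried, since drivers ship as e.g. geckodriver.exe.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    candidates = [name]
    if os.name == "nt" and not name.lower().endswith(".exe"):
        candidates.append(name + ".exe")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for candidate in candidates:
            full = os.path.join(directory, candidate)
            if os.path.isfile(full) and os.access(full, os.X_OK):
                return full
    return None


# Report where (or whether) each driver can be found.
for driver_name in ("geckodriver", "chromedriver"):
    location = find_on_path(driver_name)
    print("%s: %s" % (driver_name, location or "not found on PATH"))
```

If a driver is reported as not found, copying it into one of the directories listed in PATH should resolve the error quoted above.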

chromedriver:
https://code.google.com/p/chromedriver/downloads/list
http://chromedriver.storage.googleapis.com/index.html

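Note that Chrome 45 removed NPAPI plugin support, so NPAPI-based Flash no longer works in Chrome 45 and later. To test sites that depend on it, use a Chrome version below 45.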

chromedriver version    Supported Chrome versions
v2.30                   v58-60
v2.29                   v56-58
v2.28                   v55-57
v2.27                   v54-56
v2.26                   v53-55
v2.25                   v53-55
v2.24                   v52-54
v2.23                   v51-53
v2.22                   v49-52
v2.21                   v46-50
v2.20                   v43-48
v2.19                   v43-47
v2.18                   v43-46
v2.17                   v42-43
v2.16                   v42-45
v2.15                   v40-43
v2.14                   v39-42
v2.13                   v38-41
v2.12                   v36-40
v2.11                   v36-40
v2.10                   v33-36
v2.9                    v31-34
v2.8                    v30-33
v2.7                    v30-33
v2.6                    v29-32
v2.5                    v29-32
v2.4                    v29-32
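
Picking a driver from the table above amounts to a range lookup. Here is a small sketch of that lookup, using only the newest rows of the table (v2.24 to v2.30) as data; the names COMPATIBILITY and drivers_for_chrome are made up for this illustration.

```python
# (chromedriver version, lowest Chrome major, highest Chrome major),
# copied from the newest rows of the table above.
COMPATIBILITY = [
    ("2.30", 58, 60),
    ("2.29", 56, 58),
    ("2.28", 55, 57),
    ("2.27", 54, 56),
    ("2.26", 53, 55),
    ("2.25", 53, 55),
    ("2.24", 52, 54),
]


def drivers_for_chrome(chrome_major):
    """Return the chromedriver versions that list this Chrome major version, newest first."""
    return [ver for ver, low, high in COMPATIBILITY if low <= chrome_major <= high]


print(drivers_for_chrome(58))  # ['2.30', '2.29']
print(drivers_for_chrome(55))  # ['2.28', '2.27', '2.26', '2.25']
```

An empty list only means the Chrome version is outside the rows copied into this sketch.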

3. Common methods

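    ## Assumes `driver` is an existing WebDriver instance, e.g. webdriver.Firefox()
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions
    from selenium.webdriver.support.ui import WebDriverWait

    ## Maximize the browser window
    driver.maximize_window()

    ## Set the window to 480 pixels wide and 800 pixels high
    driver.set_window_size(480, 800)

    ## Go back in the browser history
    driver.back()

    ## Go forward in the browser history
    driver.forward()

    ## Execute JavaScript
    js = 'selectTheme("red")'
    driver.execute_script(js)

    ## Find an element and click it
    driver.find_element_by_id('menudoc').click()

    ## Explicit wait: block for up to 10 seconds until the element is clickable
    WebDriverWait(driver, 10).until(
        expected_conditions.element_to_be_clickable((By.ID, "submit")))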
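
The explicit wait is the least obvious call in this list. Conceptually, WebDriverWait polls a condition (every 0.5 seconds by default) until it returns a truthy value, and raises TimeoutException if the time limit is reached first. The sketch below imitates that loop in plain Python so it runs without a browser; it is a simplified illustration, not Selenium's actual implementation, and wait_until and ready_on_third_check are hypothetical names.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises RuntimeError once `timeout` seconds pass.
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


# Hypothetical stand-in for "is the element clickable yet?":
# it becomes true on the third check.
attempts = {"count": 0}


def ready_on_third_check():
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_until(ready_on_third_check, timeout=2.0, poll_interval=0.01))  # True
```

Compared with a fixed time.sleep(), polling returns as soon as the condition holds and fails loudly when it never does.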

Cookie operations

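    # add_cookie() only works once a page on the target domain has been loaded
    driver.add_cookie({'name': 'key-neeeeew', 'value': 'value-neeeewwwww'})

    # Print the name and value of every cookie, including the one added above
    for cookie in driver.get_cookies():
        print("%s -> %s" % (cookie['name'], cookie['value']))

    # Delete all cookies; get_cookies() then returns an empty list
    driver.delete_all_cookies()
    cookies = driver.get_cookies()
    print(cookies)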
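
When scraping, a common next step is to log in through the browser and then reuse the session cookies in a plain HTTP client. get_cookies() returns a list of dicts with at least 'name' and 'value' keys, so the conversion is short. The sketch below works on made-up sample data in that shape; the helper names are hypothetical.

```python
def cookies_to_dict(cookies):
    """Collapse a get_cookies()-style list into a {name: value} dict."""
    return dict((c["name"], c["value"]) for c in cookies)


def cookies_to_header(cookies):
    """Build the value of an HTTP Cookie header from the same list."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in cookies)


# Made-up sample in the shape returned by driver.get_cookies().
sample = [
    {"name": "key-neeeeew", "value": "value-neeeewwwww", "path": "/"},
    {"name": "session", "value": "abc123", "path": "/", "secure": False},
]

print(cookies_to_dict(sample))
print(cookies_to_header(sample))  # key-neeeeew=value-neeeewwwww; session=abc123
```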

Text fields

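    # Locate the field, clear any existing text, then type into it
    password_field = driver.find_element_by_name('password')
    password_field.clear()
    password_field.send_keys('demo')

    # Read an attribute of the element, here the input type
    print(password_field.get_attribute('type'))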

4. Examples

Example 1: Getting started

    # -*- coding:utf-8 -*-
    ## Import the WebDriver package
    from selenium import webdriver

    ## Create the browser object
    browser = webdriver.Firefox()

    ## Open the Baidu home page
    browser.get('https://www.baidu.com/')

Example 2

    # -*- coding: UTF-8 -*-
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys

    browser = webdriver.Firefox()
    browser.get('https://www.baidu.com')
    assert u'百度一下,你就知道' in browser.title
    elem = browser.find_element_by_name('wd')
    elem.send_keys('seleniumhq' + Keys.RETURN)
    browser.quit()

Example 3

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # Override the User-Agent that PhantomJS sends
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0 ")

    obj = webdriver.PhantomJS(desired_capabilities=dcap)
    obj.implicitly_wait(5)         # wait up to 5 seconds when locating elements
    obj.set_page_load_timeout(5)   # give up on page loads after 5 seconds
    obj.maximize_window()
    try:
        obj.get('https://www.baidu.com')
        obj.save_screenshot("11.png")
        print(obj.find_element_by_id('cp').text)
    except Exception as e:
        print(e)
    obj.quit()

Example 4

    from selenium import webdriver

    driver = webdriver.PhantomJS()
    driver.maximize_window()
    driver.get('http://wenshu.court.gov.cn/list/list/')

    # Note: the page source is captured here, before the search is submitted
    data = driver.page_source

    driver.find_element_by_id('search_form_input_homepage').send_keys("Nirvana")
    driver.find_element_by_id('search_button_homepage').click()
    driver.get_screenshot_as_file('show.png')
    print(data)
    driver.quit()
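
Once page_source is in hand it is an ordinary HTML string, so extracting data from it no longer needs the browser. As a rough illustration, the sketch below pulls every link out of a made-up HTML string using only the standard library parser; LinkCollector and extract_links are hypothetical names, and in practice a dedicated parsing library may be more convenient.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all links in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Made-up HTML standing in for driver.page_source.
sample_source = (
    '<html><body><a href="/list/1">one</a><p>text</p>'
    '<a href="http://example.com/2">two</a><a>no href</a></body></html>'
)
print(extract_links(sample_source))  # ['/list/1', 'http://example.com/2']
```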

Example 5

    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.keys import Keys
    import time

    browser = webdriver.Firefox()  # Get local session of firefox
    browser.get("http://www.yahoo.com")  # Load page
    assert "Yahoo!" in browser.title
    elem = browser.find_element_by_name("p")  # Find the query box
    elem.send_keys("seleniumhq" + Keys.RETURN)
    time.sleep(0.2)  # Let the page load, will be added to the API
    try:
        browser.find_element_by_xpath("//a[contains(@href,'http://seleniumhq.org')]")
    except NoSuchElementException:
        assert 0, "can't find seleniumhq"
    browser.close()

Screenshot example

    # -*- coding: utf-8 -*-
    from selenium import webdriver
    import time


    def capture(url, save_fn="capture.png"):
        browser = webdriver.Firefox()  # Get local session of firefox
        browser.set_window_size(1200, 900)
        browser.get(url)  # Load page
        # Scroll down in steps so lazily loaded content renders,
        # then mark completion by appending "scroll-done" to the title
        browser.execute_script("""
            (function () {
                var y = 0;
                var step = 100;
                window.scroll(0, 0);

                function f() {
                    if (y < document.body.scrollHeight) {
                        y += step;
                        window.scroll(0, y);
                        setTimeout(f, 50);
                    } else {
                        window.scroll(0, 0);
                        document.title += "scroll-done";
                    }
                }

                setTimeout(f, 1000);
            })();
        """)

        # Wait up to 30 seconds for the scrolling script to finish
        for i in range(30):
            if "scroll-done" in browser.title:
                break
            time.sleep(1)

        browser.save_screenshot(save_fn)
        browser.close()


    if __name__ == "__main__":
        capture("http://www.sohu.com")

So far this has only been found to capture a single screen (the visible viewport), not the whole page.
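
One workaround that is sometimes used is to resize the window to the full height of the page before taking the screenshot. The sketch below shows the idea using only execute_script, set_window_size and save_screenshot. Whether a browser honors a window taller than the physical screen depends on the browser and driver (headless modes generally do), so treat this as an untested assumption for any particular setup. FakeDriver is a hypothetical stand-in so the example runs without a browser; with Selenium, pass a real WebDriver instead.

```python
def capture_full_page(driver, save_fn, width=1200):
    """Resize the window to the page height, then take a screenshot."""
    height = driver.execute_script(
        "return Math.max(document.body.scrollHeight,"
        " document.documentElement.scrollHeight);"
    )
    driver.set_window_size(width, height)
    return driver.save_screenshot(save_fn)


class FakeDriver(object):
    """Hypothetical stand-in that records calls instead of driving a browser."""

    def __init__(self, page_height):
        self.page_height = page_height
        self.calls = []

    def execute_script(self, script):
        self.calls.append(("execute_script", script))
        return self.page_height

    def set_window_size(self, width, height):
        self.calls.append(("set_window_size", width, height))

    def save_screenshot(self, filename):
        self.calls.append(("save_screenshot", filename))
        return True


fake = FakeDriver(page_height=5400)
capture_full_page(fake, "capture.png")
print(fake.calls[1:])  # [('set_window_size', 1200, 5400), ('save_screenshot', 'capture.png')]
```
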
Maximizing the window

    driver.set_window_size(1024, 600)
    driver.maximize_window()
