Mechanize Examples
Source: Internet  Editor: 程序博客网  Date: 2024/06/06 01:58
This list is based on my (German) article "Web scraping mit Ruby/Mechanize": http://sixserv.org/2009/05/27/webscripting-mit-ruby-und-mechanize/

#00 Initialization
require 'rubygems'
require 'mechanize'

# Mechanize 1.0 and later use Mechanize.new; older releases used WWW::Mechanize.new
agent = Mechanize.new
agent.set_proxy('localhost', '8000')
agent.user_agent = 'Custom User-Agent'
agent.user_agent_alias = 'Linux Mozilla' # overrides the user_agent set on the previous line
agent.open_timeout = 3
agent.read_timeout = 4
agent.keep_alive = false
agent.max_history = 0 # reduce memory if you make lots of requests

#01 manual get requests
url = 'http://apoc.sixserv.org/requestinfo/'
page = agent.get url
# or ...
page = agent.get(url, {"name" => "value", "key" => "val"})

#02 manual post submits
url = 'http://apoc.sixserv.org/requestinfo/'
page = agent.post(url, {"name" => "value", "key" => "val"})
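
The parameter hash passed to agent.get and agent.post in #01 and #02 ends up in different places: for GET it is appended to the URL as a query string, for POST it is sent as a form-encoded request body. A minimal sketch using only Ruby's standard uri library, not Mechanize itself. The helper names get_url and post_body are hypothetical and only approximate the encoding that Mechanize performs internally:

```ruby
require 'uri'

# Hypothetical helper: the URL a GET request with these parameters would target.
def get_url(url, params)
  uri = URI.parse(url)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Hypothetical helper: the application/x-www-form-urlencoded body of a POST.
def post_body(params)
  URI.encode_www_form(params)
end

params = { "name" => "value", "key" => "val" }
puts get_url('http://apoc.sixserv.org/requestinfo/', params)
puts post_body(params)
```

The GET variant prints the URL with "?name=value&key=val" appended; the POST variant prints only "name=value&key=val", which travels in the request body while the URL stays unchanged.
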

#03 form post submits
page = agent.get 'https://twitter.com/login'
login_form = page.form_with(:action => 'https://twitter.com/sessions')
login_form['session[username_or_email]'] = '[Username]'
login_form['session[password]'] = '[Password]'
page = agent.submit login_form

#04 link and history navigation
page = agent.get 'http://www.heise.de/'
page = agent.click(page.link_with(:text => /Telepolis/))
page = agent.click(page.link_with(:href => /artikel/))
# back needs stored history, so it has nothing to return to if max_history is 0
agent.back
agent.back
puts page.body # page still refers to the last page fetched above

#05 exceptions
begin
  page = agent.get 'http://apoc.sixserv.org/diese/seite/gibt/es/nicht/'
rescue Mechanize::ResponseCodeError => e
  puts "ResponseCodeError - Code: #{e.response_code}"
end

#06 referer
# Mechanize 2.x signature: get(uri, parameters, referer); the options-hash form of older releases is gone
page = agent.get('http://apoc.sixserv.org/requestinfo/', [], 'http://google.com/this/is/a/custom/referer')
puts page.body

#07 request header manipulation
# Mechanize 2.x calls each hook with the agent and the request; older releases passed a params hash
agent.pre_connect_hooks << lambda do |_agent, request|
  request['X-Requested-With'] = 'XMLHttpRequest'
end

#08 response header
page = agent.head 'http://sixserv.org'
server_version = page.header['server']
puts "Server: #{server_version}"
if page.header.key? 'x-powered-by'
  php_version = page.header['x-powered-by']
  puts "X-Powered-By: #{php_version}"
end
# redirection urls:
agent.redirect_ok = false
page = agent.get 'http://www.sixserv.org/'
puts page.header['location']

#09 content parsing
# XPath / CSS selector:
page = agent.get 'http://xkcd.com/'
img = page.search '/html/body/div/div[2]/div/div[2]/div/div/img'
puts img
# Regular expression:
page = agent.get 'http://example.com/'
page.body.match(/<h3>([^<]+)<\/h3>/)
puts "Heading 3: #{$1}"

#10 "with" method examples
# *_with: form, link, base, frame or iframe
# get the first link including "foo" inside url:
page.link_with(:href => /foo/)
# all links with text 'more'
page.links_with(:text => 'more')
# get the form with the name 'foo'
page.form_with('foo') # or form_with(:name => 'foo')
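
The regular expression approach from #09 can be tried without a network connection. A minimal sketch, assuming a hard-coded HTML string as a stand-in for page.body; the helper name h3_headings is hypothetical. Regular expressions only cope with simple, predictable markup, so for nested elements page.search is likely the more robust choice:

```ruby
# Hypothetical helper: returns the text of every <h3> element that
# contains no nested tags. Attributes on the opening tag are allowed.
def h3_headings(html)
  html.scan(/<h3[^>]*>([^<]+)<\/h3>/i).flatten.map(&:strip)
end

# Stand-in for page.body
html = <<~HTML
  <html><body>
    <h3>First heading</h3>
    <p>Some text</p>
    <h3 class="sub"> Second heading </h3>
  </body></html>
HTML

h3_headings(html).each { |heading| puts "Heading 3: #{heading}" }
```

Unlike the single match in #09, scan collects every occurrence, so the helper returns an array rather than relying on $1.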