Learn Web.Crawling of Perl
Source: Internet  Editor: 程序博客网  Date: 2024/06/03 10:58
#####
#Overview of Web.Crawling related modules.
#Note: the code below is for overview only and is not meant to be executed as-is.
#####
#!/usr/bin/perl

#####
#HTTP::Thin
#####
use 5.12.1;
use HTTP::Request::Common;
use HTTP::Thin;

say HTTP::Thin->new()->request(GET 'http://example.com')->as_string;

#####
#HTTP::Tiny
#####
use HTTP::Tiny;

my $response = HTTP::Tiny->new->get('http://example.com/');
die "Failed!\n" unless $response->{success};
print "$response->{status} $response->{reason}\n";

#a header value is an array ref when the header appears more than once
while (my ($k, $v) = each %{$response->{headers}}) {
    for (ref $v eq 'ARRAY' ? @$v : $v) {
        print "$k: $_\n";
    }
}
print $response->{content} if length $response->{content};

#new
$http = HTTP::Tiny->new( %attributes );
#valid attributes include:
#-agent
#-cookie_jar
#-default_headers
#-local_address
#-keep_alive
#-max_redirect
#-max_size
#-https_proxy
#-proxy
#-no_proxy
#-timeout
#-verify_SSL
#-SSL_options

#get / head / put / post / delete
$response = $http->get($url);
$response = $http->get($url, \%options);
$response = $http->head($url);

#post_form
$response = $http->post_form($url, $form_data);
$response = $http->post_form($url, $form_data, \%options);

#request
$response = $http->request($method, $url);
$response = $http->request($method, $url, \%options);
#a "user:password" part in the URL is used for Basic authorization
$http->request('GET', 'http://user:pwd hk.mars@aol.com');
#or, with reserved characters in the credentials percent-escaped
$http->request('GET', 'http://mars%40:pwd hk.mars@aol.com');

#www_form_urlencode
$params = $http->www_form_urlencode( $data );
$response = $http->get("http://example.com/query?$params");

#SSL support (fragment: passed as an attribute to new)
SSL_options => {
    SSL_ca_file => $file_path,
}

#proxy support

#####
#WWW::Mechanize
#
#Stateful programmatic web browsing, used for automating interaction with websites.
#####
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get( $url );

$mech->follow_link( n => 3 );
$mech->follow_link( text_regex => qr/download this/i );
$mech->follow_link( url => 'http://host.com/index.html' );

$mech->submit_form(
    form_number => 3,
    fields      => {
        username => 'banana',
        password => 'lost-and-alone',
    }
);

$mech->submit_form(
    form_name => 'search',
    fields    => {
        query => 'pot of gold',
    },
    button    => 'search now'
);

#testing web applications
use Test::More;
like( $mech->content(), qr/$expected/, "Got expected content" );

#page traversal
$mech->back();

#finer control over the page
$mech->find_link( n => $number );
$mech->form_number( $number );
$mech->form_name( $name );
$mech->field( $name, $value );
$mech->set_fields( %field_values );
$mech->set_visible( @criteria );
$mech->click( $button );

#WWW::Mechanize is a subclass of LWP::UserAgent, eg:
$mech->add_header( $name => $value );

#method groups:
#page-fetching methods
#status methods
#content-handling methods
#link methods
#image methods
#form methods
#field methods
#miscellaneous methods
#overridden LWP::UserAgent methods
#inherited unchanged LWP::UserAgent methods

#With these modules in hand, it is easy to implement a spider project for future integration use.
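To make the closing remark about a spider project concrete, here is a minimal sketch of a breadth-first crawler built on HTTP::Tiny. It is only an illustration under stated assumptions: the helpers `extract_links`, `crawl` and `http_fetch` are hypothetical names that do not come from the modules above, links are pulled out with a regular expression instead of a real HTML parser, and only absolute http(s) links are followed (relative links are ignored).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: return the unique absolute http(s) links found in
# an HTML string. The regex is a simplification; relative links are skipped.
sub extract_links {
    my ($html) = @_;
    my @links = $html =~ /<a\s[^>]*?href\s*=\s*["'](https?:\/\/[^"'#\s]+)/gi;
    my %seen;
    return grep { !$seen{$_}++ } @links;
}

# Hypothetical helper: breadth-first crawl starting at $start_url, stopping
# after $max_pages pages were fetched. $fetch is a code ref that takes a URL
# and returns the page body or undef, so the network layer can be replaced.
sub crawl {
    my ($start_url, $max_pages, $fetch) = @_;
    my @queue = ($start_url);
    my (%visited, @order);
    while (@queue && @order < $max_pages) {
        my $url = shift @queue;
        next if $visited{$url}++;          # skip URLs already handled
        my $html = $fetch->($url);
        next unless defined $html;         # fetch failed, move on
        push @order, $url;
        push @queue, grep { !$visited{$_} } extract_links($html);
    }
    return @order;
}

# Fetcher based on HTTP::Tiny: returns the content on success, undef otherwise.
sub http_fetch {
    my ($url) = @_;
    my $response = HTTP::Tiny->new( timeout => 10 )->get($url);
    return $response->{success} ? $response->{content} : undef;
}

# Usage (needs network access):
# print "$_\n" for crawl('http://example.com/', 5, \&http_fetch);
```

Passing the fetcher as a code ref keeps the crawl logic independent of the HTTP client, so swapping in WWW::Mechanize or a stub for offline testing would not require touching `crawl` itself.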
Mars