Scraping Fantasy Westward Journey (梦幻西游) Cangbaoge (藏宝阁) data: getting past the CAPTCHA with PHP

Source: Internet  Editor: 程序博客网  Time: 2024/05/01 07:23
public function login_cbg(){
    $cookieVerify = dirname(__FILE__)."/cookie.cookie";
    $cookieSuccess = dirname(__FILE__)."/cookie_2.cookie";
    if(!$_POST){
        // Step 1: request the login page and save the cookie it sets
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, "http://xyq.cbg.163.com/cgi-bin/login.py?next_url=%2Fcgi-bin%2Fequipquery.py%3Fact%3Dbuy_show_by_ordersn%26ordersn%3D22_1458634981_24342109%26server_id%3D9&server_id=9&act=show_anon_auth_page");
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieVerify);
        $rs = curl_exec($ch);
        curl_close($ch);

        // Step 2: fetch the CAPTCHA image with that cookie.
        // The cookie is required; without it the CAPTCHA will not match the session.
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, "http://xyq.cbg.163.com/cgi-bin/create_validate_image.py?");
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieVerify);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieVerify);
        $rs = curl_exec($ch);
        // Save the CAPTCHA image locally; fetching it a second time may make the code fail verification
        @file_put_contents("verifycode.jpg", $rs);
        curl_close($ch);
        // Form for typing in the CAPTCHA by hand
        echo "<form action=\"\" method=\"post\"><input type=\"text\" name=\"vcode\"><img src=\"verifycode.jpg\" /><br><input type=\"submit\" value=\"ok\"></form>";
    }else{
        // Step 3: log in by submitting the CAPTCHA
        $ch = curl_init();
        // Username / password (placeholders; the anonymous-auth POST below does not send them)
        $user = "abc123";
        $pass = "123456";
        $verify = $_POST["vcode"];
        $url = "http://xyq.cbg.163.com/cgi-bin/login.py";
        $next_url = "/cgi-bin/equipquery.py?act=buy_show_by_ordersn&ordersn=22_1458634981_24342109&server_id=9";

        // Return the response as a string instead of printing it
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        // Send the same cookie that was used to fetch the CAPTCHA
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieVerify);
        curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_setopt($ch, CURLOPT_REFERER, "http://xyq.cbg.163.com/cgi-bin/login.py?next_url=%2Fcgi-bin%2Fequipquery.py%3Fact%3Dbuy_show_by_ordersn%26ordersn%3D22_1458634981_24342109%26server_id%3D9&server_id=9&act=show_anon_auth_page");
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
        curl_setopt($ch, CURLOPT_POST, true);
        $fields_post = array("act" => 'do_anon_auth',
            "next_url" => $next_url,
            "server_id" => 9,
            "image_value" => $verify);
        // CURLOPT_HTTPHEADER expects a list of "Name: value" strings, not an associative array
        $headers_login = array("User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.108 Safari/537.36");
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers_login);
        // Save the post-login cookies to a separate file
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieSuccess);
        // http_build_query() builds the key=value&key=value body, so no manual concatenation is needed
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields_post));
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        $result = curl_exec($ch);
        // Normalise the response to UTF-8 (the source encoding is detected from the listed candidates)
        $result = mb_convert_encoding($result, 'utf-8', 'utf-8,ASCII,GBK,GB2312');
        curl_close($ch);
        $this->write_file($result);
        //var_dump($result);
    }
}


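public function write_file($content){
    // Write the fetched page into the views directory as mh.php
    $path = $_SERVER['DOCUMENT_ROOT'];
    $path = $path.'/application/views/contents';
    if(!file_exists($path)){
        // Create missing parent directories as well
        mkdir($path, 0755, true);
    }
    $myfile = fopen($path.'/mh.php', 'w');
    fwrite($myfile, $content);
    fclose($myfile);
}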

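How it works: the cookie obtained on the first visit to the login page carries the session information that the CAPTCHA is tied to. Fetch the CAPTCHA with that cookie, type the code in by hand, then POST it to the site with the same cookie, and the login succeeds. (HTTP is stateless, so the cookie is what links the three requests together.)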
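
When the CAPTCHA keeps failing, it can help to check that the first request really stored a cookie. The following is a minimal sketch, not part of the original code: `read_cookie_jar` is a hypothetical helper name, and it assumes the file written by CURLOPT_COOKIEJAR is in the tab-separated Netscape cookie file format that curl uses. The cookie names in the sample are invented for illustration.

```php
<?php
// Parse the text of a cookie jar written by CURLOPT_COOKIEJAR.
// Returns an array of cookie name => cookie value.
function read_cookie_jar(string $jarText): array
{
    $cookies = array();
    foreach (preg_split('/\r\n|\r|\n/', $jarText) as $line) {
        if (strpos($line, '#HttpOnly_') === 0) {
            // curl marks HttpOnly cookies with this prefix; they are real entries
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        // Fields: domain, subdomain flag, path, secure flag, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) < 7) {
            continue; // not a cookie line
        }
        $cookies[$fields[5]] = $fields[6];
    }
    return $cookies;
}

// Sample jar content; "sid" and "token" are made-up cookie names.
$sample = "# Netscape HTTP Cookie File\n"
        . "xyq.cbg.163.com\tFALSE\t/\tFALSE\t0\tsid\tabc123\n"
        . "#HttpOnly_.163.com\tTRUE\t/\tFALSE\t0\ttoken\txyz\n";
print_r(read_cookie_jar($sample));
```

In the login code this could be called as `read_cookie_jar(file_get_contents($cookieVerify))` right after the first request; an empty result would suggest the cookie was never saved.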


After that, rendering the mh.php view file shows the whole fetched page. As for why the page displays incorrectly:

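Some values returned inside the JS files are relative paths, so the resulting GET requests go to the wrong place. For example, if my local site is configured as www.first.com, relative paths are resolved against my local domain by default, and the GET requests fail to fetch the data. The fix is to configure the local domain to be the same as the Cangbaoge domain, which saves a lot of changes.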
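
An alternative to remapping the local domain, sketched here as an assumption rather than as the author's method, is to insert a `<base>` tag into the saved HTML so the browser resolves relative URLs against the original site. `inject_base_href` is a hypothetical helper name. This mainly helps static assets such as scripts, stylesheets and images; AJAX requests made by the page's scripts would then be cross-origin and may still be blocked by the browser, which is one reason the same-domain approach above can work better.

```php
<?php
// Insert <base href="..."> right after the opening <head> tag so that
// relative URLs resolve against $baseUrl instead of the local host.
function inject_base_href(string $html, string $baseUrl): string
{
    $tag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';
    // Patch only the first <head ...> tag, case-insensitively.
    $patched = preg_replace_callback(
        '/<head\b[^>]*>/i',
        function ($m) use ($tag) {
            return $m[0] . $tag;
        },
        $html,
        1,
        $count
    );
    // No <head> found: fall back to prepending the tag.
    return $count > 0 ? $patched : $tag . $html;
}
```

In `login_cbg()` this could be applied just before saving, for example `$this->write_file(inject_base_href($result, "http://xyq.cbg.163.com/"))`, provided the response headers are not included in `$result`.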

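There was also a garbled-text (mojibake) problem. I read the JS code for many days without finding an error. In the end I copied the JS files to my local machine and used the local copies, which fixed the garbled Chinese text. I still have not worked out why.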
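
One possible explanation, which is a guess and not something the original confirms: the page is converted to UTF-8 by `mb_convert_encoding`, but scripts loaded from the remote site may still be GBK-encoded, and a browser can fall back to the page's encoding when a script declares none. Saving the scripts locally may have re-encoded them. Under that assumption, the sketch below converts downloaded content to UTF-8 and updates any charset declaration to match; `gbk_to_utf8` and `fix_charset_decl` are hypothetical helper names, and the mbstring extension is required.

```php
<?php
// Convert GBK-encoded bytes to UTF-8; input that is already valid UTF-8 is returned unchanged.
function gbk_to_utf8(string $raw): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    return mb_convert_encoding($raw, 'UTF-8', 'GBK');
}

// Rewrite charset=gbk / charset=gb2312 declarations so they match the converted bytes.
function fix_charset_decl(string $html): string
{
    return preg_replace(
        '/(charset\s*=\s*["\']?)\s*(?:gbk|gb2312)/i',
        '${1}utf-8',
        $html
    );
}
```

A downloaded script could then be saved with `file_put_contents($localPath, gbk_to_utf8($scriptBytes))`, where both variable names are placeholders.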


   
