Web crawling: using packet capture and requesting data with form parameters

Source: Internet · Editor: 程序博客网 · Time: 2024/06/05 21:03

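Recently, someone copied my blog posts and uploaded them directly to platforms such as Baidu Wenku.
This is an original blog post, intended only for technical learning. Copying it and uploading it to Baidu Wenku or similar platforms without permission is prohibited. If you repost it, please include the address (link) of this post.
If you need the source code, please contact me.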

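On some websites, packet capture reveals the real address the data comes from, yet requesting that address with HttpClient returns empty data. What can be done? The site below is used as a worked example.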

Website address: https://las.cnas.org.cn/LAS/publish/lab/keyBranchListView.jsp?baseInfoId=3ee5aa672cbf44d0a2d9906b2bae70c5
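The lab is identified by the baseInfoId query parameter in this URL, and in the program further down the same string is sent as the asstId form value. When crawling many labs, that value can be read from each page URL. Below is a minimal sketch using only the JDK (Java 10 or later); the class and method names are hypothetical, and whether asstId always equals baseInfoId for other labs is an assumption this post does not confirm.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class AsstIdFromPage {
    // Returns the decoded value of one query parameter, or empty if it is absent.
    static Optional<String> queryParam(String url, String name) {
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (URLDecoder.decode(key, StandardCharsets.UTF_8).equals(name)) {
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                return Optional.of(URLDecoder.decode(value, StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String page = "https://las.cnas.org.cn/LAS/publish/lab/keyBranchListView.jsp"
                + "?baseInfoId=3ee5aa672cbf44d0a2d9906b2bae70c5";
        // Assumption: asstId == baseInfoId, as it is for this one lab.
        String asstId = queryParam(page, "baseInfoId").orElseThrow();
        System.out.println(asstId); // 3ee5aa672cbf44d0a2d9906b2bae70c5
    }
}
```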

A screenshot of the data:
[Screenshot]

Packet capture shows that the data is returned as JSON and reveals the real request address, as in this screenshot:
[Screenshot]

The real request address is: https://las.cnas.org.cn/LAS/publish/queryPublishKeyBranch.action?

Requesting this address on its own returns empty data, as in this screenshot:
[Screenshot]
The returned data:

{"pageCount":0,"remark":null,"addpost":null,"isModify":null,"mainActivityOther":null,"mainactivity":null,"labfeature":null,"remarkEn":null,"sizePerPage":0,"asstId":null,"addcode":null,"startIndex":0,"mainActivityOtherEn":null,"nameCn":null,"primaryRecommend":null,"branchId":null,"currPage":0,"statementPrefix":"getPageKeyBranch","totalSize":0,"labFeatureList":null,"keyNum":null,"data":[],"addEn":null,"addCn":null,"postCode":null,"labFeatureJson":null,"provider":[],"limit":0}
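The empty response above and the successful one further down differ in totalSize (0 versus 1), so a crawler can flag this failure mode automatically instead of silently storing empty pages. Below is a minimal sketch using the JDK's regex classes; the class and method names are hypothetical, and checking only totalSize is an assumption based on these two samples. A JSON library such as Jackson or Gson would be more robust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyResponseCheck {
    // Matches the top-level counter, e.g. "totalSize":0 or "totalSize":1.
    private static final Pattern TOTAL_SIZE = Pattern.compile("\"totalSize\":(\\d+)");

    // True when the response reports no records. A response without the
    // field at all is also treated as empty.
    public static boolean isEmptyResult(String json) {
        Matcher m = TOTAL_SIZE.matcher(json);
        return !m.find() || Integer.parseInt(m.group(1)) == 0;
    }

    public static void main(String[] args) {
        String empty = "{\"pageCount\":0,\"currPage\":0,\"totalSize\":0,\"data\":[],\"limit\":0}";
        String filled = "{\"pageCount\":1,\"currPage\":1,\"totalSize\":1,\"data\":[{\"postCode\":\"201203\"}],\"limit\":0}";
        System.out.println(isEmptyResult(empty));  // true
        System.out.println(isEmptyResult(filled)); // false
    }
}
```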

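To solve this, go back to the captured request: it also carries a form parameter (asstId). Based on this analysis, the program can be written as follows: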

```java
package navi.main;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

/**
 * @author: Qian Yang, School of Management, Hefei University of Technology
 * @email: 1563178220@qq.com
 */
public class Test {
    public static void main(String[] args) throws Exception {
        String newUrl = "https://las.cnas.org.cn/LAS/publish/queryPublishKeyBranch.action?";
        HttpPost post = new HttpPost(newUrl);

        // Request headers copied from the capture: optional, not the critical part.
        // The JSESSIONID is the one from the captured browser session.
        post.addHeader("Cookie", "JSESSIONID=0000qty6OnqsYHgBdc3VKzr4zbI:1a5s8ura0");
        post.addHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        post.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
        post.addHeader("Host", "las.cnas.org.cn");
        post.addHeader("Accept", "*/*");
        post.addHeader("Accept-Language", "zh-CN,zh;q=0.8");
        post.addHeader("X-Requested-With", "XMLHttpRequest");
        post.addHeader("Referer", "https://las.cnas.org.cn/LAS/publish/lab/keyBranchListView.jsp?baseInfoId=3ee5aa672cbf44d0a2d9906b2bae70c5");
        post.addHeader("Origin", "https://las.cnas.org.cn");

        // Form parameter: this is the key part and cannot be omitted.
        List<NameValuePair> list = new ArrayList<NameValuePair>();
        list.add(new BasicNameValuePair("asstId", "3ee5aa672cbf44d0a2d9906b2bae70c5"));
        post.setEntity(new UrlEncodedFormEntity(list, StandardCharsets.UTF_8));

        // try-with-resources closes both the response and the client.
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse httpResponse = client.execute(post)) {
            // Decode as UTF-8 unless the server declares another charset.
            String responseString = EntityUtils.toString(httpResponse.getEntity(), StandardCharsets.UTF_8);
            System.out.println(responseString);
        }
    }
}
```
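The program above relies on Apache HttpClient. For comparison, here is a minimal sketch of the same form POST using only the JDK's java.net.http client (Java 11 or later), which is a swapped-in client and not the one used in this post. The class and method names are hypothetical. Only Content-Type and X-Requested-With are set, on the assumption (in line with the comment in the program) that the other headers are not essential. The request is built but not sent; the commented lines in main show how it could be sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormPostSketch {
    // Same endpoint as in the program above.
    static final String ENDPOINT = "https://las.cnas.org.cn/LAS/publish/queryPublishKeyBranch.action?";

    // Encodes name/value pairs as an application/x-www-form-urlencoded body,
    // e.g. {asstId=abc} becomes "asstId=abc".
    static String encodeForm(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Builds, but does not send, a POST carrying asstId in the form body.
    static HttpRequest buildRequest(String asstId) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("asstId", asstId);
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8")
                .header("X-Requested-With", "XMLHttpRequest")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(params)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("3ee5aa672cbf44d0a2d9906b2bae70c5");
        System.out.println(request.method() + " " + request.uri());
        // Sending it (requires network access):
        // String json = java.net.http.HttpClient.newHttpClient()
        //         .send(request, java.net.http.HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Keeping the body encoding in its own method makes it easy to add further form fields later without touching the request-building code.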

The data returned by the Test program is as follows:

{"pageCount":1,"remark":null,"addpost":null,"isModify":null,"mainActivityOther":null,"mainactivity":null,"labfeature":null,"remarkEn":null,"sizePerPage":1,"asstId":"3ee5aa672cbf44d0a2d9906b2bae70c5","addcode":null,"startIndex":0,"mainActivityOtherEn":null,"nameCn":null,"primaryRecommend":null,"branchId":null,"currPage":1,"statementPrefix":"getPageKeyBranch","totalSize":1,"labFeatureList":null,"keyNum":null,"data":[{"remark":null,"addpost":null,"isModify":null,"keyNum":1,"labFeatureList":[{"baseInfoId":null,"branchId":null,"createBy":null,"createTs":null,"feature":"101001","id":"1974ed78b9a8409ba1ddd9dbc349098c","isModify":null,"labfeatureId":"1974ed78b9a8409ba1ddd9dbc349098c","other":null,"otherEn":null,"sourceId":null,"sqlUpdateType":null,"updateBy":null,"updateTs":null}],"mainactivity":"177001, 177003, 177004, 177005","mainActivityOther":null,"remarkEn":null,"labfeature":null,"addEn":"Bioassay and Safety Assessment Building, No.1500, Zhangheng Road, Zhangjiang Hi-Tech Park, Pudong New District, Shanghai, China","addCn":"上海市浦东新区张江高科技园区张衡路1500号生物与安全检测楼","postCode":"201203","addcode":null,"asstId":null,"mainActivityOtherEn":null,"labFeatureJson":"[{\"feature\":\"101001\"}]","nameCn":"上海市检测中心生物与安全检测实验室","primaryRecommend":null,"branchId":"3b00ef1f777247e1a2abd6e4b51ea1a8"}],"addEn":null,"addCn":null,"postCode":null,"labFeatureJson":null,"provider":[],"limit":0}
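The records sit in the data array, where nameCn, addCn and postCode hold the lab name, the Chinese address and the postal code. Below is a minimal regex-based sketch for pulling such string fields out of the response without extra dependencies; the class and method names are hypothetical. It is not a real JSON parser (escape sequences are returned as-is, and matches from every nesting level are collected), so a production crawler would more likely use a JSON library such as Jackson or Gson.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldExtractor {
    // Collects every quoted string value of the given key. Null values
    // (e.g. the top-level "nameCn":null) are skipped because they are unquoted.
    static List<String> stringValues(String json, String key) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(key) + "\":\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher m = p.matcher(json);
        List<String> values = new ArrayList<>();
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened copy of the response above.
        String json = "{\"nameCn\":null,\"data\":[{\"postCode\":\"201203\","
                + "\"addCn\":\"上海市浦东新区张江高科技园区张衡路1500号生物与安全检测楼\","
                + "\"nameCn\":\"上海市检测中心生物与安全检测实验室\"}],\"postCode\":null}";
        System.out.println(stringValues(json, "nameCn"));
        System.out.println(stringValues(json, "postCode")); // [201203]
    }
}
```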