Web Crawler Packet Capture and Form-Based Requests
Source: Internet · Editor: 程序博客网 · Posted: 2024/06/05 21:03
Recently, someone copied this blog and uploaded it directly to platforms such as Baidu Wenku.
This is an original post, intended for technical learning only. Copying it to Baidu Wenku or similar platforms without permission is prohibited. If you repost it, please credit this blog with a link.
If you need the source code, please contact me.
On some sites, packet capture reveals the real address the data comes from, yet an HttpClient request to that address returns nothing. What then? The following walkthrough uses this site as an example.
The page address is: https://las.cnas.org.cn/LAS/publish/lab/keyBranchListView.jsp?baseInfoId=3ee5aa672cbf44d0a2d9906b2bae70c5
Here is a screenshot of the data:
Packet capture shows that the data is returned as JSON, and reveals the real request address. Screenshot below:
The real request address is: https://las.cnas.org.cn/LAS/publish/queryPublishKeyBranch.action?
Requesting that address on its own, however, returns empty data, as the screenshot below shows:
The data:
{"pageCount":0,"remark":null,"addpost":null,"isModify":null,"mainActivityOther":null,"mainactivity":null,"labfeature":null,"remarkEn":null,"sizePerPage":0,"asstId":null,"addcode":null,"startIndex":0,"mainActivityOtherEn":null,"nameCn":null,"primaryRecommend":null,"branchId":null,"currPage":0,"statementPrefix":"getPageKeyBranch","totalSize":0,"labFeatureList":null,"keyNum":null,"data":[],"addEn":null,"addCn":null,"postCode":null,"labFeatureJson":null,"provider":[],"limit":0}
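A response like this is easy to recognize in code: the server answered normally, but "totalSize" is 0 and "data" is an empty array. A minimal check is sketched below (plain substring matching on those two fields; the class and method names are illustrative, and a real crawler would use a JSON library):

```java
public class EmptyResultCheck {
    // True when a pagination response carries no rows:
    // the server answers normally, but "totalSize" is 0 and "data" is [].
    static boolean isEmptyResult(String json) {
        return json.contains("\"totalSize\":0") && json.contains("\"data\":[]");
    }

    public static void main(String[] args) {
        String empty = "{\"pageCount\":0,\"totalSize\":0,\"data\":[],\"limit\":0}";
        String full  = "{\"pageCount\":1,\"totalSize\":1,\"data\":[{\"keyNum\":1}],\"limit\":0}";
        System.out.println(isEmptyResult(empty)); // prints true
        System.out.println(isEmptyResult(full));  // prints false
    }
}
```

When a crawler sees this signature, the cause is usually a missing parameter, cookie, or Referer, which is exactly what re-examining the capture turns up.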
To get to the bottom of this, go back to the capture page: there is also a form parameter being posted with the request. Based on that finding, the program can be written as follows:
package navi.main;

import java.util.ArrayList;
import java.util.List;

import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicHeader;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

/**
 * @author Qian Yang, School of Management, Hefei University of Technology
 * @email 1563178220@qq.com
 */
public class Test {
    public static void main(String[] args) throws Exception {
        DefaultHttpClient client = new DefaultHttpClient();
        String newUrl = "https://las.cnas.org.cn/LAS/publish/queryPublishKeyBranch.action?";
        HttpPost post = new HttpPost(newUrl);
        // Request headers copied from the capture; optional, not the key part
        post.addHeader(new BasicHeader("Cookie", "JSESSIONID=0000qty6OnqsYHgBdc3VKzr4zbI:1a5s8ura0"));
        post.addHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        post.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
        post.addHeader("Host", "las.cnas.org.cn");
        post.addHeader("Accept", "*/*");
        post.addHeader("Accept-Language", "zh-CN,zh;q=0.8");
        post.addHeader("X-Requested-With", "XMLHttpRequest");
        post.addHeader("Referer", "https://las.cnas.org.cn/LAS/publish/lab/keyBranchListView.jsp?baseInfoId=3ee5aa672cbf44d0a2d9906b2bae70c5");
        post.addHeader("Origin", "https://las.cnas.org.cn");
        // The form parameter: this is the essential, non-negotiable part
        List<NameValuePair> list = new ArrayList<NameValuePair>();
        list.add(new BasicNameValuePair("asstId", "3ee5aa672cbf44d0a2d9906b2bae70c5"));
        post.setEntity(new UrlEncodedFormEntity(list));
        HttpResponse httpResponse = client.execute(post);
        String responseString = EntityUtils.toString(httpResponse.getEntity());
        System.out.println(responseString);
    }
}
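The UrlEncodedFormEntity above serializes the parameter list into a plain application/x-www-form-urlencoded body. The stdlib alone can show the exact bytes that go on the wire (a sketch; encodePair is an illustrative helper, and the parameter value is the one seen in the capture):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormBody {
    // Encode one name/value pair the way an
    // application/x-www-form-urlencoded POST body is built.
    static String encodePair(String name, String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // The single parameter spotted in the packet capture.
        System.out.println(encodePair("asstId", "3ee5aa672cbf44d0a2d9906b2bae70c5"));
        // prints asstId=3ee5aa672cbf44d0a2d9906b2bae70c5
    }
}
```

Since the parameter here is plain hex, the encoded body happens to look identical to the raw pair; values with spaces or special characters would come out percent-encoded.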
The program returns the following data:
{"pageCount":1,"remark":null,"addpost":null,"isModify":null,"mainActivityOther":null,"mainactivity":null,"labfeature":null,"remarkEn":null,"sizePerPage":1,"asstId":"3ee5aa672cbf44d0a2d9906b2bae70c5","addcode":null,"startIndex":0,"mainActivityOtherEn":null,"nameCn":null,"primaryRecommend":null,"branchId":null,"currPage":1,"statementPrefix":"getPageKeyBranch","totalSize":1,"labFeatureList":null,"keyNum":null,"data":[{"remark":null,"addpost":null,"isModify":null,"keyNum":1,"labFeatureList":[{"baseInfoId":null,"branchId":null,"createBy":null,"createTs":null,"feature":"101001","id":"1974ed78b9a8409ba1ddd9dbc349098c","isModify":null,"labfeatureId":"1974ed78b9a8409ba1ddd9dbc349098c","other":null,"otherEn":null,"sourceId":null,"sqlUpdateType":null,"updateBy":null,"updateTs":null}],"mainactivity":"177001, 177003, 177004, 177005","mainActivityOther":null,"remarkEn":null,"labfeature":null,"addEn":"Bioassay and Safety Assessment Building, No.1500, Zhangheng Road, Zhangjiang Hi-Tech Park, Pudong New District, Shanghai, China","addCn":"上海市浦东新区张江高科技园区张衡路1500号生物与安全检测楼","postCode":"201203","addcode":null,"asstId":null,"mainActivityOtherEn":null,"labFeatureJson":"[{\"feature\":\"101001\"}]","nameCn":"上海市检测中心生物与安全检测实验室","primaryRecommend":null,"branchId":"3b00ef1f777247e1a2abd6e4b51ea1a8"}],"addEn":null,"addCn":null,"postCode":null,"labFeatureJson":null,"provider":[],"limit":0}
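To pull individual fields such as nameCn or postCode out of a response like this, a quick regular expression is enough for flat, well-formed JSON (the extract helper below is illustrative; for anything nested, a proper JSON library such as Gson or Jackson is the safer choice):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldExtract {
    // Return the first quoted value of a JSON field by name,
    // or null if the field is absent or not a quoted string.
    static String extract(String json, String field) {
        Matcher m = Pattern.compile("\"" + field + "\":\"([^\"]*)\"").matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String json = "{\"nameCn\":\"上海市检测中心生物与安全检测实验室\",\"postCode\":\"201203\"}";
        System.out.println(extract(json, "postCode")); // prints 201203
    }
}
```

Note that in the real response above, nameCn appears twice: null at the top level and populated inside the data array. The pattern only matches quoted values, so it skips the null and finds the populated one.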