Retrieving Data from a URL Interface

Source: Internet | Published by: Network Systems Integration Engineer | Edited by: 程序博客网 | Time: 2024/04/30 17:32

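Getting data from a URL

A POST request is sent from Java, and the data behind the URL is read from the response.

The interface I was given cannot be accessed directly; such requests are intercepted.

To get the data, add the key string that was supplied when the interface was published (a key-value pair) to the request header.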

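How to use HttpClient:

Sending a request and receiving a response with HttpClient is simple and generally takes the following steps.

1. Create an HttpClient object.

2. Create an instance of the request method and specify the request URL. Create an HttpGet object to send a GET request, or an HttpPost object to send a POST request.

3. If request parameters need to be sent, call the setParams(HttpParams params) method shared by HttpGet and HttpPost to add them; for an HttpPost object, setEntity(HttpEntity entity) can also be called to set the request parameters.

4. Call execute(HttpUriRequest request) on the HttpClient object to send the request. This method returns an HttpResponse.

5. Call getAllHeaders(), getHeaders(String name) and similar methods on the HttpResponse to read the server's response headers; call getEntity() on the HttpResponse to obtain the HttpEntity object, which wraps the server's response content. The program reads the response content through this object.

6. Release the connection. The connection must be released whether or not the request succeeded.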
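The steps above can be sketched with nothing but the JDK. This is a minimal sketch under stated assumptions, not the Apache API the steps name: it swaps in `java.net.http.HttpClient` (Java 11 or later) so that it runs without extra jars. The class name `FormPostSketch`, the header name `X-Api-Key`, and the field names are hypothetical placeholders for whatever the interface publisher actually supplies.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

class FormPostSketch {

    // Encode form fields as application/x-www-form-urlencoded text.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Steps 2 and 3: create the POST request, attach the access header and the form body.
    // headerName/headerValue stand for the key-value pair issued with the interface (assumed).
    static HttpRequest buildRequest(String url, String headerName, String headerValue,
                                    Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header(headerName, headerValue)
                .header("Content-Type", "application/x-www-form-urlencoded;charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    // Steps 1, 4 and 5: create the client, send the request, check the status, read the body.
    // Step 6 has no explicit call here: the JDK client manages its own connections.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected status: " + response.statusCode());
        }
        return response.body();
    }
}
```

A call would look like `FormPostSketch.send(FormPostSketch.buildRequest(url, "X-Api-Key", key, Map.of("type", "1")))`. Building the request separately from sending it keeps the header and body logic checkable without a network connection.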

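Reference: http://blog.csdn.net/sunhuaqiang1/article/details/51751581

The code below uses the commons-httpclient jar provided by Apache. Note that this is the older HttpClient 3.x library: its request class is PostMethod and requests are sent with executeMethod, whereas the HttpGet/HttpPost names in the steps above come from the later HttpClient 4.x API. The dependency in the pom: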

<dependency>
    <groupId>commons-httpclient</groupId>
    <artifactId>commons-httpclient</artifactId>
    <!-- The original omits the version; 3.1 is assumed here -->
    <version>3.1</version>
</dependency>

A site for looking up the dependency coordinates of all kinds of jars: http://mvnrepository.com

Sample code:

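import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.PostMethod;

// "log" is assumed to be a logger field defined elsewhere in the enclosing class.
public String transRequest(String url, String type, String message) {
    // Response body; stays empty if the request fails
    String result = "";
    // HTTP client object
    HttpClient httpClient = new HttpClient();
    // POST method bound to the target URL
    PostMethod postMethod = new PostMethod(url);
    try {
        // Set the HTTP header (the standard header name is "Content-Type", with a hyphen)
        postMethod.setRequestHeader("Content-Type",
                "application/x-www-form-urlencoded;charset=UTF-8");
        // Fill in the value of each form field
        NameValuePair[] data = { new NameValuePair("type", type),
                new NameValuePair("message", message) };
        // Put the form values into the request body
        postMethod.setRequestBody(data);
        // Status code of the response; 0 means no response was received
        int statusCode = 0;
        try {
            // Send the request to the URL
            statusCode = httpClient.executeMethod(postMethod);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // 200 means the request succeeded
        if (statusCode == HttpStatus.SC_OK) {
            try {
                result = postMethod.getResponseBodyAsString();
            } catch (IOException e) {
                e.printStackTrace();
            }
        } else {
            log.error("Request returned status: " + statusCode);
        }
    } catch (Exception e) {
        log.error(e.getMessage(), e);
    } finally {
        // Release the connection
        postMethod.releaseConnection();
        httpClient.getHttpConnectionManager().closeIdleConnections(0);
    }
    return result;
}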

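This method returns the data sent back in response to the POST request.

In real development, however, a key string must be added to the header to keep the data secure; without it, access is intercepted.

The implementation is as follows: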

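import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.PostMethod;

// headerName and headerValue hold the key-value pair supplied by the interface publisher;
// they are assumed to be defined elsewhere in the enclosing class.
public String getURLInfo() {
    String result = "";
    SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
    String nowTime = sdf.format(new Date());
    String url = "http://10.161.**.**:**/ checkDate=" + nowTime;
    // Create the HTTP client and the POST method bound to the URL
    HttpClient httpClient = new HttpClient();
    PostMethod postMethod = new PostMethod(url);
    try {
        // Add the key string to the header so the request is not intercepted
        postMethod.setRequestHeader(headerName, headerValue);
        int statusCode = 0;
        try {
            // Send the request; the return value is the status code of the response
            statusCode = httpClient.executeMethod(postMethod);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // A status code of 200 means the request succeeded
        if (statusCode == HttpStatus.SC_OK) {
            try {
                result = postMethod.getResponseBodyAsString();
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else {
            System.out.println("Request failed, status code: " + statusCode);
        }
    } catch (Exception e) {
        System.out.println(e.getMessage());
    } finally {
        // Release the connection
        postMethod.releaseConnection();
        httpClient.getHttpConnectionManager().closeIdleConnections(0);
    }
    return result;
}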

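This yields the data behind the URL as a string, usually in JSON format. What happens next is up to you: take substrings of it or convert it to another type as needed.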

Differences between org.json and net.sf.json

How net.sf.json.JSONObject and org.json.JSONObject differ.

1. Creating a JSON object

String str = "{\"code\":\"0000\", \"msg\":{\"availableBalance\":31503079.02}}";

org.json.JSONObject:

JSONObject json = new JSONObject(str);

net.sf.json.JSONObject:

JSONObject json = JSONObject.fromObject(str);

net.sf.json.JSONObject has no new JSONObject(String) constructor.

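2. Parsing JSON

The first approach reads fields directly with the JSON object's getXXX() methods.

net.sf.json.JSONObject: the type of the field does not strictly have to match the type of getXXX().

org.json.JSONObject: the type of the field must match the type of getXXX().

e.g.

JSONObject msgObj = json.getJSONObject("msg");

String availableBalance = msgObj.getString("availableBalance");

With org.json.JSONObject the getString call throws an error; msgObj.getDouble("availableBalance") works instead and does not lose precision. With net.sf.json.JSONObject the getString call succeeds, but precision is lost. If the value is quoted in the source, as in String str = "{\"code\":\"0000\", \"msg\":{\"availableBalance\":\"31503079.02\"}}";

then no precision is lost.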
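The effect described above can be reproduced without either JSON library. The following is a minimal JDK-only sketch; it assumes the changed value comes from the number passing through a floating-point type on its way to a string, which is a guess about the library's behaviour rather than something the text confirms. The class name `BalancePrecisionSketch` is hypothetical.

```java
import java.math.BigDecimal;

class BalancePrecisionSketch {

    // Parse a monetary amount straight from its decimal text, never via double or float.
    static BigDecimal parseAmount(String text) {
        return new BigDecimal(text);
    }

    public static void main(String[] args) {
        String raw = "31503079.02";
        // Through a double: Double.toString switches to scientific notation
        // for magnitudes of 10^7 and above, so the text no longer matches the input.
        System.out.println(Double.toString(Double.parseDouble(raw)));
        // Through a float: 24 bits of mantissa cannot hold all ten digits.
        System.out.println(Float.toString(Float.parseFloat(raw)));
        // Through BigDecimal: every digit of the original text is preserved.
        System.out.println(parseAmount(raw).toPlainString());
    }
}
```

This is why quoting the amount in the JSON, or reading it into a `BigDecimal`, is the safer choice for balances.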

The second approach converts the JSON object directly into an entity object.

public class BalanceDto {
    private String availableBalance;

    public String getAvailableBalance() {
        return availableBalance;
    }

    public void setAvailableBalance(String availableBalance) {
        this.availableBalance = availableBalance;
    }

    public String toString() {
        return "availableBalance " + availableBalance;
    }
}

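org.json.JSONObject:

BalanceDto balanceDto = (BalanceDto) JSONObject.stringToValue(msgObj.toString());

This statement compiles but fails at run time. stringToValue only turns text into a simple value (a Boolean, a Number, JSONObject.NULL, or the original String), so the cast to BalanceDto throws a ClassCastException; it does not populate a bean.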

net.sf.json.JSONObject:

// toBean takes a JSONObject, so pass the nested object instead of its string form
JSONObject msgObj = json.getJSONObject("msg");

BalanceDto balanceDto = (BalanceDto) JSONObject.toBean(msgObj, BalanceDto.class);

3. Getting an array from JSON

JSONArray subArray = json.getJSONArray("msg");

net.sf.json.JSONObject:

int leng = subArray.size();

org.json.JSONObject:

int leng = subArray.length();

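Common HTTP status codes

1xx: Informational -- the request has been received and processing continues

2xx: Success -- the request was successfully received, understood, and accepted

3xx: Redirection -- further action is required to complete the request

4xx: Client error -- the request has a syntax error or cannot be fulfilled

5xx: Server error -- the server failed to fulfil a valid request

200 OK // the client request succeeded

400 Bad Request // the client request has a syntax error and the server cannot understand it

401 Unauthorized // the request is not authorized; this status code must be used together with the WWW-Authenticate header field

403 Forbidden // the server received the request but refuses to serve it

404 Not Found // the requested resource does not exist, e.g. a wrong URL was entered

500 Internal Server Error // the server hit an unexpected error

503 Service Unavailable // the server cannot handle the client's request right now; it may recover after a while

More status codes and their meanings: http://www.runoob.com/http/http-status-codes.html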
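The five classes above map directly onto the first digit of the status code. As a minimal sketch (the class and method names are hypothetical, and treating 503 as the only code worth retrying is an assumption drawn from the "may recover after a while" note), a caller could classify the value returned by executeMethod like this:

```java
class StatusClassSketch {

    // Map a status code to its class by its hundreds digit.
    static String describe(int statusCode) {
        switch (statusCode / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown"; // e.g. 0 when no response was received
        }
    }

    // Assumption: only 503 signals a temporary condition that may clear on its own.
    static boolean worthRetrying(int statusCode) {
        return statusCode == 503;
    }
}
```

Logging `describe(statusCode)` next to the raw number makes it easier to tell an authorization problem (a 4xx, such as a missing header key) from a server-side failure (a 5xx).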

Downloading the images at a URL

This approach cannot target particular images, but it does download the images it finds.

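package com.fly.test;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.junit.Test;

import java.io.*;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;

/**
 * @Description : Downloads the images found at a URL
 * @Create by FLY on 2017-11-02 14:56
 */
public class DemoDownloadPicture {

    // Page URLs seen so far, used to avoid visiting a page twice
    String ALL_URL_STR = "";
    // Image sources seen so far, used to avoid downloading an image twice
    String ALL_SRC_STR = "";
    // Counter for naming links that have no text
    int nonameId = 1;
    // Number of images downloaded so far
    int record = 0;
    // Counter for naming images that have no file name
    int noPicname = 0;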

    @Test
    public void start() {
        // Address of the site to crawl
        String urlStr = "http://cpu.baidu.com/wap/1022/1329713/detail/4970306611472452/news?blockId=2998&foward=block";
        String html = getHTML(urlStr);
        // Directory where images are stored; it is created if it does not exist
        getURL(html, 0, "E://crawler//pic");
    }

    // Fetch the page at urlStr and return its HTML as one string
    public String getHTML(String urlStr) {
        StringBuilder html = new StringBuilder();
        BufferedReader buffer = null;
        try {
            URL url = new URL(urlStr);
            URLConnection conn = url.openConnection();
            conn.connect();
            buffer = new BufferedReader(new InputStreamReader(conn.getInputStream(), Charset.forName("UTF-8")));
            String line = null;
            while ((line = buffer.readLine()) != null) {
                html.append(line);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (buffer != null) {
                try {
                    buffer.close();
                } catch (Exception e) {
                    throw new RuntimeException("Error closing the stream");
                }
            }
        }
        return html.toString();
    }

    // Download the images in html, then follow its links; tmp is the recursion depth
    public void getURL(String html, int tmp, String fileName) {
        if (tmp > 5 || html == null || html.length() == 0) {
            System.out.println("--------end-------");
            return;
        }
        if (record > 1000) {
            System.out.println("--------more than 1000 images-----");
            return;
        }
        System.out.println("------start----------");
        String urlMain = "http://cpu.baidu.com/wap/1022/1329713/detail/4970306611472452/news?blockId=2998&foward=block";
        String urlPicMain = "http:";
        // Parse the page content
        Document doc = Jsoup.parse(html);
        // Collect the image links and download the images
        Elements imglinks = doc.select("img[src]");
        int picnum = 0;
        String dirFileName = "";
        for (Element imglink : imglinks) {
            String src = imglink.attr("src");
            if (src == null || "".equals(src) || src.length() < 3) {
                continue;
            }
            if (!ALL_SRC_STR.contains(src)) {
                ALL_SRC_STR += src + " ## ";
                // Add the scheme to protocol-relative sources such as //host/a.png
                if (!src.startsWith("http")) {
                    src = urlPicMain + src;
                }
                if (picnum == 0) {
                    // Create the target directory once per page
                    dirFileName = makedir(fileName);
                    picnum++;
                }
                record++;
                downloadPicture(src, dirFileName);
            }
        }
        Elements links = doc.select("a");
        for (Element link : links) {
            String href = link.attr("href");
            String text = link.text();
            // Skip empty or implausibly short links (the original tested "> 3", which skipped almost every link)
            if (href == null || "".equals(href) || href.length() < 3) {
                continue;
            }
            if (text == null || "".equals(text)) {
                text = "noName" + nonameId++;
            }
            if (!href.startsWith("http")) {
                href = urlMain + href;
            }
            // distinct
            if (!ALL_URL_STR.contains(href)) {
                ALL_URL_STR += href + " ## ";
                System.out.println("***********");
                System.out.println("Found a new URL: " + text + "--->" + href);
                // Pass tmp + 1 so the depth limit above takes effect (tmp++ passed the old value)
                getURL(getHTML(href), tmp + 1, text);
            }
        }
    }

    // Download the image at src into the directory fileName
    public void downloadPicture(String src, String fileName) {
        InputStream is = null;
        OutputStream os = null;
        try {
            String imageName = src.substring(src.lastIndexOf("/") + 1);
            if (imageName.length() == 0) {
                imageName = "" + noPicname++;
            }
            // Append ".png" only when the name has no extension or an implausibly long one
            int index = imageName.lastIndexOf(".");
            if (index == -1 || imageName.length() - index - 1 > 5) {
                imageName += ".png";
            }
            // Connect to the URL
            URL url = new URL(src);
            URLConnection uri = url.openConnection();
            is = uri.getInputStream();
            os = new FileOutputStream(new File(fileName, imageName));
            byte[] buf = new byte[1024];
            int length = 0;
            while ((length = is.read(buf, 0, buf.length)) != -1) {
                os.write(buf, 0, length);
            }
            System.out.println(src + " downloaded successfully=====");
        } catch (Exception e) {
            System.out.println(src + " download failed=====");
        } finally {
            // Close both streams whether or not the download succeeded
            try {
                if (os != null) {
                    os.close();
                }
                if (is != null) {
                    is.close();
                }
            } catch (IOException e) {
                System.out.println("Error while closing the streams");
            }
        }
    }

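    // Create the directory for a page's images and return its path
    public String makedir(String filesName) {
        String baseDir = "E://crawler//pic";
        // start() passes the base directory itself, so avoid prefixing it a second time
        String filePath = filesName.startsWith(baseDir) ? filesName : baseDir + "//" + filesName;
        File file = new File(filePath);
        if (!file.exists() && !file.isDirectory()) {
            file.mkdirs(); // create the directory
            if (file.exists() && file.isDirectory()) {
                System.out.println("Directory created");
                return filePath;
            } else {
                System.out.println("Directory could not be created");
                return baseDir;
            }
        } else {
            System.out.println(filesName + " already exists");
            return filePath;
        }
    }
}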
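The crawler above builds absolute addresses by string concatenation. As an alternative sketch, not part of the original code, `java.net.URI.resolve` applies the standard reference-resolution rules and handles protocol-relative, root-relative, and page-relative links. The class name `LinkResolveSketch` and the example.com addresses are hypothetical.

```java
import java.net.URI;

class LinkResolveSketch {

    // Resolve a link found in a page against that page's own address.
    // URI.create throws IllegalArgumentException for text that is not a valid URI,
    // so real crawler code would need to catch that for malformed links.
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/news/page.html";
        System.out.println(resolve(page, "//img.example.com/a.png")); // protocol-relative
        System.out.println(resolve(page, "/static/b.jpg"));           // root-relative
        System.out.println(resolve(page, "c.gif"));                   // page-relative
        System.out.println(resolve(page, "https://cdn.example.com/d.png")); // already absolute
    }
}
```

Resolving against the address of the page being parsed, instead of one fixed start URL, keeps links correct after the crawler has followed a link to another page.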

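Getting only the images you need

To fetch specific images from a URL, collect the image URLs from the page source and assemble them into JSON; the JSON file is then imported when parsing. To be continued...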
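Since this part is unfinished, the following is only one possible reading of it, sketched with the JDK alone: pull the `src` values out of the page source, keep those matching a chosen suffix, and write them as a JSON array. The class name, the suffix filter, and the regular expression are all assumptions; a regex is a rough stand-in for a real HTML parser such as the jsoup selector used earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ImageSrcFilterSketch {

    // Matches the quoted src attribute of an <img> tag (rough; not a full HTML parser).
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Collect distinct image sources whose address ends with the wanted suffix, e.g. ".jpg".
    static List<String> imageSources(String html, String requiredSuffix) {
        List<String> result = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            String src = m.group(1);
            if (src.toLowerCase(Locale.ROOT).endsWith(requiredSuffix) && !result.contains(src)) {
                result.add(src);
            }
        }
        return result;
    }

    // Render the list as a JSON array of strings; only quotes and backslashes are escaped,
    // which is enough for ordinary URLs but not for arbitrary text.
    static String toJsonArray(List<String> values) {
        return values.stream()
                .map(v -> "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(",", "[", "]"));
    }
}
```

The resulting string could be written to a file and read back later by whichever JSON library the project already uses.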
