Java Basics: Fetching Web Page Content with HttpClient
Published: 2006.04.24 05:18  Source: unknown  Author: oneworld (reposted)
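HTTP is currently the most important protocol on the Internet, and many applications and services depend on it.
Although the java.net package includes basic HTTP support, many advanced and more complex features cannot be implemented with it, which is a pity.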
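For comparison, here is a minimal sketch of what that basic java.net support looks like. It is not part of the original article: the class name JavaNetFetchSketch, the helper names fetch and demo, the /hello path, and the throwaway local server (built on the JDK's com.sun.net.httpserver) are all assumptions made so the example runs without network access.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class JavaNetFetchSketch {

    // Fetches a page with plain java.net and decodes the body with the given charset.
    static String fetch(String url, String charset) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(20000); // milliseconds
        conn.setReadTimeout(10000);    // milliseconds
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), charset))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append("\n");
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    // Starts a throwaway local server (hypothetical, for illustration only),
    // fetches one page from it, and returns the body.
    static String demo() {
        try {
            HttpServer server =
                    HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/hello", exchange -> {
                byte[] bytes = "hello from java.net".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, bytes.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(bytes);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                return fetch("http://127.0.0.1:" + port + "/hello", "UTF-8");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

A single GET like this is easy enough; the sketch does not attempt connection pooling or per-host connection limits, which is the kind of feature a dedicated client library covers.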
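HttpClient, one of Apache's open-source projects, provides powerful client-side support for HTTP-based operations; at the time of writing the latest version is 3.0RC3.
The following example briefly shows how to use HttpClient: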
--------------------------------------------------------------------------------
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.util.*;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HostConfiguration;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpConnection;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.methods.PostMethod;
/**
* @author steven
*/
public class HttpClientExample {
    // Shared ConnectionManager and the parameters applied to it
    private static MultiThreadedHttpConnectionManager manager =
            new MultiThreadedHttpConnectionManager();
    private static int connectionTimeOut = 20000; // milliseconds
    private static int socketTimeOut = 10000;     // milliseconds
    private static int maxConnectionPerHost = 5;
    private static int maxTotalConnections = 40;
    // Flag recording whether initialization has completed
    private static boolean initialed = false;

    // Initializes the ConnectionManager
    public static void SetPara() {
        manager.getParams().setConnectionTimeout(connectionTimeOut);
        manager.getParams().setSoTimeout(socketTimeOut);
        manager.getParams()
                .setDefaultMaxConnectionsPerHost(maxConnectionPerHost);
        manager.getParams().setMaxTotalConnections(maxTotalConnections);
        initialed = true;
    }

    // Fetches the page content with the GET method
    public static String getGetResponseWithHttpClient(String url, String encode) {
        HttpClient client = new HttpClient(manager);
        // Configure the ConnectionManager once, on first use
        if (!initialed) {
            HttpClientExample.SetPara();
        }
        GetMethod get = new GetMethod(url);
        get.setFollowRedirects(true);
        String result = null;
        StringBuffer resultBuffer = new StringBuffer();
        try {
            client.executeMethod(get);
            // When nothing is known about the target page,
            // getResponseBodyAsString() is not recommended
            // String strGetResponseBody = get.getResponseBodyAsString();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(
                            get.getResponseBodyAsStream(),
                            get.getResponseCharSet()));
            String inputLine = null;
            while ((inputLine = in.readLine()) != null) {
                resultBuffer.append(inputLine);
                resultBuffer.append("\n");
            }
            in.close();
            // ISO-8859-1 is the default charset when the response declares none,
            // so re-decode the text with the charset requested by the caller
            result = HttpClientExample.ConverterStringCode(resultBuffer.toString(),
                    get.getResponseCharSet(),
                    encode);
        } catch (Exception e) {
            e.printStackTrace();
            result = "";
        } finally {
            get.releaseConnection();
        }
        return result;
    }

    // Fetches the page content with the POST method (no request body)
    public static String getPostResponseWithHttpClient(String url,
            String encode) {
        HttpClient client = new HttpClient(manager);
        // Configure the ConnectionManager once, on first use
        if (!initialed) {
            HttpClientExample.SetPara();
        }
        PostMethod post = new PostMethod(url);
        post.setFollowRedirects(false);
        StringBuffer resultBuffer = new StringBuffer();
        String result = null;
        try {
            client.executeMethod(post);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(
                            post.getResponseBodyAsStream(),
                            post.getResponseCharSet()));
            String inputLine = null;
            while ((inputLine = in.readLine()) != null) {
                resultBuffer.append(inputLine);
                resultBuffer.append("\n");
            }
            in.close();
            // ISO-8859-1 is the default charset when the response declares none,
            // so re-decode the text with the charset requested by the caller
            result = HttpClientExample.ConverterStringCode(resultBuffer.toString(),
                    post.getResponseCharSet(),
                    encode);
        } catch (Exception e) {
            e.printStackTrace();
            result = "";
        } finally {
            post.releaseConnection();
        }
        return result;
    }

    // Fetches the page content with the POST method, sending the given
    // name/value pairs as the request body
    public static String getPostResponseWithHttpClient(String url,
            String encode,
            NameValuePair[] nameValuePair) {
        HttpClient client = new HttpClient(manager);
        // Configure the ConnectionManager once, on first use
        if (!initialed) {
            HttpClientExample.SetPara();
        }
        PostMethod post = new PostMethod(url);
        post.setRequestBody(nameValuePair);
        post.setFollowRedirects(false);
        String result = null;
        StringBuffer resultBuffer = new StringBuffer();
        try {
            client.executeMethod(post);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(
                            post.getResponseBodyAsStream(),
                            post.getResponseCharSet()));
            String inputLine = null;
            while ((inputLine = in.readLine()) != null) {
                resultBuffer.append(inputLine);
                resultBuffer.append("\n");
            }
            in.close();
            // ISO-8859-1 is the default charset when the response declares none,
            // so re-decode the text with the charset requested by the caller
            result = HttpClientExample.ConverterStringCode(resultBuffer.toString(),
                    post.getResponseCharSet(),
                    encode);
        } catch (Exception e) {
            e.printStackTrace();
            result = "";
        } finally {
            post.releaseConnection();
        }
        return result;
    }

    // Re-decodes text: turns source back into bytes using srcEncode,
    // then decodes those bytes as destEncode
    private static String ConverterStringCode(String source, String srcEncode, String destEncode) {
        if (source != null) {
            try {
                return new String(source.getBytes(srcEncode), destEncode);
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
                return "";
            }
        } else {
            return "";
        }
    }
}
--------------------------------------------------------------------------------
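After that, the target page can be fetched with the following code (the URL must include the scheme, e.g. http://):
String source = HttpClientExample.getGetResponseWithHttpClient("http://www.sina.com.cn", "GBK");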
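The "GBK" argument is the target encoding handed to ConverterStringCode. The re-decoding step it relies on can be sketched in isolation with the JDK alone. The class name RedecodeSketch, the method name redecode, and the sample text are assumptions for illustration, not part of the listing above.

```java
import java.nio.charset.Charset;

public class RedecodeSketch {

    // Same idea as ConverterStringCode: turn the string back into the bytes it
    // was decoded from, then decode those bytes with the charset the page
    // really used.
    static String redecode(String text, String wrongCharset, String rightCharset) {
        byte[] raw = text.getBytes(Charset.forName(wrongCharset));
        return new String(raw, Charset.forName(rightCharset));
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        // Bytes a GBK-encoded page would send over the wire
        byte[] wire = original.getBytes(Charset.forName("GBK"));
        // The same bytes decoded with the ISO-8859-1 default: garbled text
        String garbled = new String(wire, Charset.forName("ISO-8859-1"));
        String repaired = redecode(garbled, "ISO-8859-1", "GBK");

        System.out.println(garbled.equals(original));  // false
        System.out.println(repaired.equals(original)); // true
    }
}
```

The round trip works because ISO-8859-1 maps every byte value to exactly one character, so no information is lost in the first, wrong decode. If the first decode used a charset that rejects some byte sequences (UTF-8, for example), the invalid bytes are replaced and the original text cannot be recovered this way.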
Note that by default the User-Agent header of requests sent by HttpClient is Jakarta Commons-HttpClient 3.0RC1. To change it (for example, to Mozilla/4.0), the following statement must be run before making the call:
System.getProperties().setProperty("httpclient.useragent", "Mozilla/4.0");