Simulating a Browser with HttpClient

Source: Internet. Editor: 程序博客网. Time: 2024/06/16 12:53

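To make HttpClient imitate a browser when it requests a page, you set request headers. We already used one header method in the earlier post "The Mystery of getContentLength() Returning -1". When we open a browser and visit a site, the browser does a lot of work on our behalf, such as telling the server the browser version and the user's preferred language. HttpClient can simulate all of this.

A simple example: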

package test.ffm83.commons.httpClient;

import org.apache.commons.lang.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * A simple use of HttpClient.
 * Based on the 4.x release line.
 * @author 范芳铭
 */
public class EasyHeaderGet {

    public static void main(String[] args) throws Exception {
        _getResonseHeader("http://www.ctrip.com/");
    }

    private static void _getResonseHeader(String url) throws Exception {
        System.out.println(StringUtils.center(url + " getResonseHeader", 50, "-"));
        CloseableHttpClient httpclient = HttpClients.createDefault();
        try {
            HttpGet httpget = new HttpGet(url);
            // addHeader appends a header; setHeader replaces any existing header of the same name.
            httpget.addHeader("Accept", "text/html");
            httpget.addHeader("Accept-Charset", "utf-8");

            httpget.addHeader("Accept-Language", "zh-CN,zh"); // declare a Chinese locale
            //httpget.addHeader("Accept-Language", "en-US,en"); // pretend to be in an English locale
            httpget.setHeader("Accept-Encoding", "identity"); // ask for an uncompressed response
            //httpget.addHeader("Accept-Encoding", "gzip"); // ask for a compressed response
            httpget.addHeader("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.160 Safari/537.22");

            CloseableHttpResponse response = httpclient.execute(httpget);
            try {
                HttpEntity entity = response.getEntity();
                System.out.println(response.getStatusLine());
                String webValue = EntityUtils.toString(entity);
                webValue = webValue.substring(0, webValue.length() / 200);
                System.out.println(webValue); // print only a small leading part of the page
            } finally {
                response.close(); // release the connection
            }
        } finally {
            httpclient.close();
        }
    }
}

The output is as follows:

------http://www.ctrip.com/ getResonseHeader------
HTTP/1.1 200 OK
<!DOCTYPE html><html><head><meta http-equiv="X-UA-Compatible" content="IE=edge"/>
            <meta http-equiv="Content-Type" content="text/html;charset=gb2312" />
           <meta name="description" content="携程旅行网是中国领先的在线旅行服务公司,向超过9000万会员提供酒店预订、酒店点评及特价酒店查询、机票预订、飞机票查询、时刻表、票价查询、航班查询、度假预订、商旅管理、为您的出行提供全方位旅行服务。" />

 

Now make one small change. Replace this line:

httpget.addHeader("Accept-Language", "zh-CN,zh"); // declare a Chinese locale

with:

httpget.addHeader("Accept-Language", "en-US,en"); // pretend to be in an English locale

 

Run it again and the output shows the English content.
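The switch from Chinese to English happens on the server side: the server reads the Accept-Language header and chooses which page to return. The sketch below reproduces that effect locally so it can be run without network access or extra libraries. It is only an illustration under stated assumptions: it uses the JDK's built-in java.net.http.HttpClient and com.sun.net.httpserver (Java 11 or later) instead of Apache HttpClient, and the class name AcceptLanguageDemo and the pickPage rule are hypothetical. A real site such as ctrip.com applies its own, unknown selection logic.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class AcceptLanguageDemo {

    // Hypothetical server-side rule: Chinese page when the header starts with "zh", English otherwise.
    static String pickPage(String acceptLanguage) {
        boolean wantsChinese = acceptLanguage != null
                && acceptLanguage.toLowerCase(Locale.ROOT).startsWith("zh");
        return wantsChinese ? "<html lang=\"zh\"></html>" : "<html lang=\"en\"></html>";
    }

    // Starts a throwaway local server, sends one GET with the given header, and returns the body.
    static String fetch(String acceptLanguage) throws IOException, InterruptedException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            // The server sees exactly the header value the client chose to send.
            String lang = exchange.getRequestHeaders().getFirst("Accept-Language");
            byte[] body = pickPage(lang).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .header("Accept-Language", acceptLanguage)
                    .GET()
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
            return response.body();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch("zh-CN,zh")); // served the "zh" page
        System.out.println(fetch("en-US,en")); // served the "en" page
    }
}
```

The same request line and URL produce two different bodies, and the only thing that differs is the header value, which is what the Accept-Language change in the example above relies on.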
