Java CookBook Learning Day1st--HTTP (by Tim O'Brien)


 1. Introduction

  The Hypertext Transfer Protocol (HTTP) is ubiquitous; this protocol is at the core of important technologies such as the World Wide Web (WWW), the Simple Object Access Protocol (SOAP), XML databases, content management systems, WebDAV, and, most importantly, iTunes. Much of the world's business is accomplished over HTTP in some form or another, and if this protocol were to suddenly vanish, a sizable portion of the world economy would vanish with it. Given this ubiquity, it is likely that you've had occasion to write a Java program that needed to fetch a web page or interact with a servlet or CGI script. While the J2SE contains some rudimentary tools to communicate via HTTP, the feature set of URLConnection is somewhat limited. This chapter introduces Apache HttpClient, a set of utilities that simplifies the task of retrieving and sending information over HTTP.

  Apache HttpClient grew up in the Jakarta Commons, and until April 2004 it was a Commons project. It subsequently graduated to a full Jakarta project, and as Jakarta was dismantled, it became an Apache top-level project. This chapter refers to HttpClient as Apache HttpClient, but you should be aware that most of the documentation and support still refer to the project as Commons HttpClient.

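  Blogger's note:

     In short: HTTP is everywhere and it matters. Any program that talks to a server will most likely do so over HTTP. J2SE ships some basic tools for communicating over HTTP, but they are limited, so it is worth learning Apache HttpClient to make the job simpler.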

  1.1. Obtaining HttpClient

  1.1.1. Problem

  You need to use Apache HttpClient to write a program to interact with a server over HTTP.

  1.1.2. Solution

  To use Commons HttpClient 3.1 in a Maven 2 project, add the following dependency to your project's pom.xml:

  Example 1.1 Adding a dependency on Commons HttpClient

<dependency>
  <groupId>commons-httpclient</groupId>
  <artifactId>commons-httpclient</artifactId>
  <version>3.1</version>
</dependency>

  When you depend on a library in Maven 2, all you need to do is add the dependency's groupId, artifactId, and version to your project's dependencies. Once you do this, Maven 2 will download the dependency and make it available on your project's classpath.

  Blogger's note: Add the Apache HttpClient dependency information to the project's pom.xml, and Maven will download the library automatically and put it on your project's classpath.

  1.1.3. Discussion

  Apache HttpClient is a client library for the HTTP protocol. HttpClient is very feature-rich, supporting all HTTP methods defined in RFC 2616 (Hypertext Transfer Protocol, HTTP/1.1). Apache HttpClient supports GET, POST, PUT, OPTIONS, HEAD, and DELETE using the command pattern; HttpMethod objects are instantiated and executed by an HttpClient object that manages server interaction and state across multiple requests. HttpClient has support for various authentication mechanisms, including Basic, Digest, and NTLM authentication. HttpClient supports both HTTP and HTTPS, and the library can manage the complexity involved in using both schemes with an HTTP proxy. HttpClient can manage the state of a session, automatically keeping track of cookies across multiple requests.

  Blogger's note: Apache HttpClient is powerful. It supports GET, POST, PUT, OPTIONS, HEAD, and DELETE requests (in practice the first two are by far the most common). The usual pattern is to instantiate an HttpClient, instantiate an HttpMethod, and then have the HttpClient execute the HttpMethod to talk to the server. HttpClient supports both HTTP and HTTPS, and it can issue both kinds of request through an HTTP proxy. It also manages session state, automatically tracking cookies across multiple requests.

  1.2. Performing an HTTP GET

  1.2.1. Problem 

  You need to retrieve information with the HTTP GET method.

  Blogger's note: Use the HTTP GET method to retrieve information.

  1.2.2. Solution

  Create an instance of HttpClient and use it to execute a GetMethod object. Once the method has been executed, the response body can be accessed as an InputStream, byte[], or String. The following example gets the contents of http://www.discursive.com/jccook/ and retrieves the response body as a String:

  Blogger's note: After the GetMethod object has been executed, the result can be read as an input stream, a byte array, or a string.

import java.io.IOException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;

String url = "http://www.discursive.com/jccook/";
HttpClient client = new HttpClient( );
// Blogger's note: pass the request URL when creating the method object
HttpMethod method = new GetMethod( url );

try {
    client.executeMethod( method );

    // Only print the body when the server answered 200 OK
    if( method.getStatusCode( ) == HttpStatus.SC_OK ) {
        String response = method.getResponseBodyAsString( );
        System.out.println( response );
    }
} catch( HttpException he ) {
    System.out.println( "HTTP Problem: " + he.getMessage( ) );
} catch( IOException ioe ) {
    System.out.println( "IO Exception: " + ioe.getMessage( ) );
} finally {
    // Always free the connection, whatever happened above
    method.releaseConnection( );
    method.recycle( );
}

  This code retrieves the content of http://www.discursive.com/jccook using the HTTP GET method. If the response code is HttpStatus.SC_OK or 200, the response is printed to the console.

<html><head><title>JCCook Example</title></head><body><h1>Hello World!</h1></body></html>

  1.2.3. Discussion

  Note the exception handling involved in this example. Performing a simple HTTP GET called for two catch blocks: HttpException and IOException. An HttpException is thrown if there is a problem relating to the HTTP protocol, and an IOException is thrown if there is a problem with the network. Examples in this chapter omit the rigorous exception handling from the previous example; you can assume that every call to executeMethod( ) is surrounded by the appropriate try/catch block.

  GetMethod is an implementation of the HttpMethod interface, which is executed by HttpClient. The lifecycle of any HttpMethod implementation is straightforward: an HttpMethod is created, executed by an instance of HttpClient, and, once the response has been examined, the connection is released and the method is recycled. When an HttpMethod object is recycled by a call to recycle( ), it is a signal to the system that this specific HttpMethod instance can be used again. releaseConnection( ) instructs HttpClient to release the connection that is associated with an HttpMethod instance. No matter what happens during the execution of a method, releaseConnection( ) must be called to free network resources.

  Blogger's note: If an HTTP GET fails, a network problem surfaces as an IOException and an HTTP protocol problem as an HttpException, so the call to executeMethod( ) should be wrapped in try/catch. GetMethod implements the HttpMethod interface. After an HttpMethod instance has been executed by an HttpClient instance and the response has been examined, the connection is released with releaseConnection( ) and the HttpMethod is made reusable with recycle( ).

Once a method has been executed, you can get the response status code from method.getStatusCode( ). This method returns an int, which will correspond to one of the public static final variables on HttpStatus. Some of the more common status codes on HttpStatus are SC_OK(200), SC_NOT_FOUND(404), SC_INTERNAL_SERVER_ERROR(500), SC_MOVED_TEMPORARILY(302), and SC_UNAUTHORIZED(401). For a full list of HTTP status codes, see the Javadoc for HttpStatus. When a server sends back a bad HTTP status, it is sometimes accompanied by a short message. This message can be read by calling method.getStatusText( ).

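   Blogger's note: method.getStatusCode( ) returns the response status code, which tells you whether the response was retrieved successfully: 200 means the request succeeded, 404 means the resource was not found, 500 means the server hit an internal error, 302 means the resource has moved (a redirect: the new URL is returned in the Location header of the response, and a browser will issue a new request to that URL), and 401 means the request was not authorized. method.getStatusText( ) returns the message that accompanies the status. The general rule:

  1XX  Informational: the request was received and processing continues

  2XX  Success: the request was received, understood, and accepted

  3XX  Redirection: further action is needed to complete the request

  4XX  Client error: the request has bad syntax or cannot be fulfilled

  5XX  Server error: the server failed to fulfill a valid request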
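  The grouping of status codes by their leading digit can be sketched with plain JDK code. This is only an illustration: StatusClassSketch and describe( ) are hypothetical names, not part of Commons HttpClient, and the integer passed in stands for whatever method.getStatusCode( ) returned.

```java
// Hypothetical helper for illustration; not part of Commons HttpClient.
final class StatusClassSketch {

    // Maps a status code to the class named by its first digit.
    static String describe(int statusCode) {
        switch (statusCode / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client error";
            case 5: return "Server error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        // The sample codes are the ones mentioned in the text above.
        int[] samples = {200, 302, 401, 404, 500};
        for (int code : samples) {
            System.out.println(code + " -> " + describe(code));
        }
    }
}
```

  A check like this is handy when a program only cares whether a request succeeded, was redirected, or failed, rather than about the exact code.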

  1.3. Sending Parameters in a Query String

  1.3.1. Problem

  You need to send query parameters in a URL.

  Blogger's note: Add request parameters to the URL.

  1.3.2. Solution

  Set the query string using the setQueryString( ) method on an instance of HttpMethod. Use URIUtil to encode any text included in a URL. The following example puts two parameters on the query string:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.util.URIUtil;

HttpClient client = new HttpClient( );
String url = "http://www.discursive.com/cgi-bin/jccook/param_list.cgi";

HttpMethod method = new GetMethod( url );

// Set the query string with setQueryString( )
method.setQueryString( URIUtil.encodeQuery( "test1=O Reilly&blah=Whoop" ) );
System.out.println( "With Query String: " + method.getURI( ) );

client.executeMethod( method );

System.out.println( "Response:\n " + method.getResponseBodyAsString( ) );
method.releaseConnection( );


  The param_list.cgi CGI script echoes all parameters received, and from the following output, you can see how URIUtil encodes the first parameter:

With Query String: http://www.discursive.com/cgi-bin/jccook/param_list.cgi?test1=O%20Reilly&blah=Whoop
Response:
These are the parameters I received:
test1:O Reilly
blah:Whoop


  Tip: The question mark is understood, and you do not need to supply a leading question mark to the setQueryString( ) method of HttpMethod, as it will be automatically added when the method is executed by an instance of HttpClient.

   Blogger's note: setQueryString( ) adds the question mark automatically; there is no need to add it by hand.

  1.3.3. Discussion

  In the previous example, method.setQueryString() is used to set the entire query string at once, but there is another alternative: setting the query string with an array of NameValuePair objects. When a NameValuePair[ ] is passed to method.setQueryString( ), the HttpMethod then takes each pair and creates a series of parameters delimited by an ampersand. The approach encourages cleaner code because you are not concatenating strings to pass multiple parameters. The following example sets the same parameters used in the previous example, using NameValuePair objects:

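  Blogger's note: With NameValuePair objects you set the parameter pairs one at a time and then pass the NameValuePair[ ] array to setQueryString( ), which joins the pairs with the & character. This approach makes the program easier to read.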

// Set query string with name value pair objects
HttpMethod method = new GetMethod( url );
NameValuePair pair = new NameValuePair( "test1", URIUtil.encodeQuery( "O Reilly" ) );
NameValuePair pair2 = new NameValuePair( "blah", URIUtil.encodeQuery( "Whoop" ) );
NameValuePair[] pairs = new NameValuePair[] { pair, pair2 };
method.setQueryString( pairs );
System.out.println( "With NameValuePairs: " + method.getURI( ) );

client.executeMethod( method );

System.out.println( "Response:\n " + method.getResponseBodyAsString( ) );
method.releaseConnection( );
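  To make the "series of parameters delimited by an ampersand" concrete, here is a minimal sketch of the joining step using only the JDK. It is not how HttpClient is implemented: QueryStringSketch, build( ), and encode( ) are hypothetical names, and java.net.URLEncoder applies form encoding, so a space becomes "+" rather than the "%20" that URIUtil produced earlier.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Commons HttpClient.
final class QueryStringSketch {

    // Form-encodes one name or value as UTF-8 (a space becomes '+').
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Joins name=value pairs with '&'; no leading '?' is added.
    static String build(Map<String, String> params) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(encode(entry.getKey()))
                 .append('=')
                 .append(encode(entry.getValue()));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("test1", "O Reilly");
        params.put("blah", "Whoop");
        System.out.println(build(params)); // test1=O+Reilly&blah=Whoop
    }
}
```

  Building the string from pairs keeps the encoding of each name and value separate from the "&" and "=" delimiters, which is the main reason the pair-based approach is less error-prone than concatenating strings by hand.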

  According to RFC 1738 (Uniform Resource Locators (URL) specification), URLs can only contain alphanumeric characters, [0-9a-zA-Z], and a few other special characters. If you need to send a parameter with an unacceptable character in a URL, you will need to encode your string according to the standard defined in RFC 1738. URIUtil exposes a method encodeQuery( ) that can be used to encode the value "O Reilly" in the previous example. The following code demonstrates the use of URIUtil to encode strings for inclusion in a URL:

   Blogger's note: RFC 1738 limits URLs to alphanumeric characters [0-9a-zA-Z] and a small set of special characters, so any parameter containing other characters must first be encoded with URIUtil.encodeQuery( ).

String encoded1 = URIUtil.encodeQuery( "<test>=O' Connell" );
String encoded2 = URIUtil.encodeQuery( "one:two=thr ee#" );

String decoded = URIUtil.decode( "Hello%20World%3F" );

System.out.println( "Encoded: " + encoded1 );
System.out.println( "Encoded: " + encoded2 );
System.out.println( "Decoded: " + decoded );

  This simple example encodes two strings and decodes an encoded string using URIUtil. The output shows the result of each transformation:

Encoded: %3ctest%e3=O'Connell
Encoded: one%3atwo=thr%20ee#23
Decoded: Hello World?
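  The idea behind percent-encoding can be sketched by hand: every byte outside a safe set is written as "%" followed by two hexadecimal digits. The sketch below is an assumption-laden illustration, not URIUtil's algorithm. PercentEncodingSketch is a hypothetical class, and it uses the conservative "unreserved" set from RFC 3986 (letters, digits, "-", "_", ".", "~"), so it escapes more characters than URIUtil.encodeQuery( ) does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; stricter than URIUtil.encodeQuery( ).
final class PercentEncodingSketch {

    // Characters that are copied through unchanged (RFC 3986 "unreserved").
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~";

    // Encodes the UTF-8 bytes of text, escaping everything not in UNRESERVED.
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int value = b & 0xFF;
            char c = (char) value;
            if (UNRESERVED.indexOf(c) >= 0) {
                out.append(c);
            } else {
                out.append('%').append(String.format("%02X", value));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("O Reilly"));        // O%20Reilly
        System.out.println(encode("one:two=thr ee#")); // one%3Atwo%3Dthr%20ee%23
    }
}
```

  Because this sketch also escapes "=" and ":", it is suited to encoding a single value, not a whole query string whose delimiters must survive.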


  1.4. Retrieving Content with a Conditional GET

  1.4.1. Problem

  You need to retrieve the same content more than once, and you would like to have the server only send the content if it has changed since the last request.

  Blogger's note: You need to fetch the same content several times and want the server to send it again only if it has changed since the last request.

  1.4.2. Solution

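  Create a GetMethod and set the If-None-Match and If-Modified-Since headers; these two headers will instruct the server to refrain from sending content if the content has not been altered since the last request. Example 11-1 makes three separate requests for the same URL (http://www.apache.org), and, because the content remains static, it is only sent in the response body of the first request.

  Blogger's note:

   If-Modified-Since

  Purpose: sends the last-modified time of the page cached by the browser to the server, which compares it with the actual Last-Modified time of the file on the server. If the times match, the server returns 304 and the client uses its locally cached copy. If they differ, the server returns 200 with the new content; the client discards the old file, caches the new one, and displays it in the browser.

   If-None-Match

  Purpose: If-None-Match works together with ETag. The server adds an ETag to the HTTP response. When the user requests the resource again, the ETag value is sent in the If-None-Match header of the HTTP request. If the server determines that the resource's ETag has not changed (the resource has not been updated), it returns a 304 status telling the client to use its cached copy. Otherwise it returns a 200 status with the new resource and its ETag. This mechanism improves site performance.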
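  The If-Modified-Since header carries a timestamp in the HTTP date format, for example "Wed, 14 Apr 2004 05:53:38 GMT". When the value does not come from an earlier Last-Modified response header, it has to be formatted by hand. The following sketch shows one way to do that with java.time; HttpDateSketch and format( ) are hypothetical names, and the pattern assumes the fixed-length, always-GMT form of the date.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Commons HttpClient.
final class HttpDateSketch {

    // English day and month names, two-digit day, always expressed in GMT.
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    // Formats a point in time as an HTTP date string.
    static String format(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    public static void main(String[] args) {
        Instant lastSeen = Instant.parse("2004-04-14T05:53:38Z");
        System.out.println(format(lastSeen)); // Wed, 14 Apr 2004 05:53:38 GMT
    }
}
```

  Fixing the locale to Locale.US matters here: with the default locale, day and month names could come out in another language and the server would not recognize the header.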


  Example 11-1. Requesting information with a conditional GET

import java.io.IOException;

import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;

public class ConditionalGetExample {

    public static void main(String[] args) throws HttpException, IOException {
        ConditionalGetExample example = new ConditionalGetExample( );
        example.start( );
    }

    // Validators remembered from the most recent full response
    String entityTag = "";
    String lastModified = "";

    public void start( ) throws HttpException, IOException {
        HttpClient client = new HttpClient( );
        HttpMethod method = new GetMethod( "http://www.apache.org" );

        for( int i = 0; i < 3; i++ ) {
            setHeaders( method );
            client.executeMethod( method );
            processResults( method );
            method.releaseConnection( );
            method.recycle( );
        }
    }

    // Copy the stored validators into the conditional request headers
    private void setHeaders( HttpMethod method ) {
        method.setRequestHeader( new Header( "If-None-Match", entityTag ) );
        method.setRequestHeader( new Header( "If-Modified-Since", lastModified ) );
    }

    private void processResults( HttpMethod method ) throws HttpException {
        if( method.getStatusCode( ) == HttpStatus.SC_NOT_MODIFIED ) {
            System.out.println( "Content not modified since last request" );
        } else {
            entityTag = retrieveHeader( method, "ETag" );
            lastModified = retrieveHeader( method, "Last-Modified" );
            System.out.println( "Get Method retrieved content." );
            System.out.println( "Entity Tag: " + entityTag );
            System.out.println( "Last Modified: " + lastModified );
        }
    }

    // Return the full value of the named response header, or "" if it is absent
    private String retrieveHeader( HttpMethod method, String name ) throws HttpException {
        Header header = method.getResponseHeader( name );
        String value = "";
        if( header != null ) {
            value = header.getValue( );
        }
        return value;
    }
}

  Example 11-1 requests the same page three times; the first request retrieves the content, and the second and third requests contain the headers for a conditional HTTP GET. Because the Apache homepage remains unchanged throughout this example, the content of http://www.apache.org is retrieved only once. This example uses the headers in the first response to populate the If-None-Match and If-Modified-Since headers for the second and third requests:

Request 1. Get Method retrieved content.
Entity Tag: "2da794a-2d0d-998ebc80"
Last Modified: Wed, 14 Apr 2004 05:53:38 GMT
Request 2. Content not modified since last request
Request 3. Content not modified since last request

  1.4.3. Discussion

  This first request is similar to a regular HTTP GET, and if you examine the contents of the first request and response, you will see that the server's response contains two headers:

Last-Modified: Wed, 05 May 2004 02:51:59 GMT
ETag: "a06d2-76-829811c0"

  Example 11-1 takes the values of these two response headers and stores them in the entityTag and lastModified variables. When the next request is made, the values of these two variables are used to populate the conditional headers If-None-Match and If-Modified-Since in the setHeaders( ) method. These request headers are present in the second request for the same resource:

GET / HTTP/1.1
If-None-Match: "2da7807-31a8-e1eeb400"
If-Modified-Since: Tue, 11 May 2004 23:57:04 GMT
User-Agent: Jakarta Commons-HttpClient/3.0final
Host: www.apache.org

  When the server receives these conditional headers, it will check to see if the resource has been modified. If the resource has been modified, the resource will have a different modified date and different entity tag value. If the resource has not been modified, the server will respond to HttpClient with a 304 Not Modified response code:

HTTP/1.1 304 Not Modified
Date: Sat, 15 May 2004 16:59:23 GMT
Server: Apache/2.0.49-dev (Unix)
ETag: "2da7807-31a8-e1eeb400"
Expires: Sun, 16 May 2004 16:59:23 GMT
Cache-Control: max-age=86400
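  The check a server performs when it receives these conditional headers can be sketched as follows. This is a simplified model under stated assumptions, not the logic of any particular server: ConditionalGetDecisionSketch and statusFor( ) are hypothetical names, entity tags are compared as exact strings, and If-None-Match is given precedence over If-Modified-Since when both are present.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical model of the server-side check; for illustration only.
final class ConditionalGetDecisionSketch {

    // Returns 304 if the client's cached copy is still valid, otherwise 200.
    static int statusFor(String ifNoneMatch, String ifModifiedSince,
                         String currentEtag, ZonedDateTime lastModified) {
        // An entity tag, when supplied, decides the outcome on its own.
        if (ifNoneMatch != null && !ifNoneMatch.isEmpty()) {
            return ifNoneMatch.equals(currentEtag) ? 304 : 200;
        }
        // Otherwise fall back to comparing modification times.
        if (ifModifiedSince != null && !ifModifiedSince.isEmpty()) {
            ZonedDateTime since = ZonedDateTime.parse(
                ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME);
            return lastModified.isAfter(since) ? 200 : 304;
        }
        // No conditional headers: an ordinary GET.
        return 200;
    }

    public static void main(String[] args) {
        String etag = "\"2da7807-31a8-e1eeb400\"";
        ZonedDateTime modified = ZonedDateTime.parse(
            "Tue, 11 May 2004 23:57:04 GMT", DateTimeFormatter.RFC_1123_DATE_TIME);

        System.out.println(statusFor(etag, null, etag, modified));      // 304
        System.out.println(statusFor("\"stale\"", null, etag, modified)); // 200
        System.out.println(statusFor(null, null, etag, modified));      // 200
    }
}
```

  The same comparison explains the client output shown earlier: once the stored entity tag matches the server's current one, every later request comes back as 304 with no body.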

  The ETag header is known as an entity tag, and it is similar to a hash code for a resource served by a particular web server. Different servers have different algorithms for creating an ETag header; for example, the Apache HTTPD server has a configuration directive, which allows you to base an ETag on a configurable set of file attributes, such as size, i-node number, and modified time. Consider an ETag to be a unique identifier for a particular version of a resource; if this header changes, the corresponding resource has been altered.
