Java Network Programming: Chapter 7, URLConnection

Source: Internet. Published by: 道亨软件下载. Edited by: 程序博客网. Time: 2024/06/06 20:08

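The URLConnection class

URLConnection is an abstract class. Compared with the URL class, it differs mainly in the following ways:

1. It gives more control over the communication with the server, such as inspecting and setting headers and using HTTP methods like PUT and POST.

2. URLConnection is part of Java's protocol handler mechanism, which also includes URLStreamHandler. The mechanism separates the details of a protocol from the processing of the data.

3. URLConnection can also write data to the server.

The URLConnection constructor is protected, so it is accessible only to URLConnection and its subclasses. Unless you are subclassing URLConnection to handle a new URL type, obtain instances through a URL. At runtime, the required object is normally created according to the protocol in use, with the URLConnection instantiated through forName() and newInstance() of java.lang.Class. The connect() method of the abstract class URLConnection has no implementation, so subclasses must implement it to establish the connection to the server (how this is done depends on the type of service).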
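
To make the subclassing point concrete, here is a minimal sketch of a URLConnection subclass. The class name InMemoryConnection, the roundTrip() helper and the example.invalid URL are assumptions made for illustration; the class serves a fixed string and never opens a socket. It assumes Java 11 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical subclass for illustration: serves a fixed string, no network involved.
public class InMemoryConnection extends URLConnection {
    private final byte[] body;

    public InMemoryConnection(URL url, String text) {
        super(url); // the protected constructor is reachable from a subclass
        this.body = text.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void connect() {
        // A real handler would open a socket here; this sketch only records the state.
        connected = true;
    }

    @Override
    public InputStream getInputStream() {
        if (!connected) {
            connect(); // connect implicitly when the data is first needed
        }
        return new ByteArrayInputStream(body);
    }

    @Override
    public String getContentType() {
        return "text/plain; charset=UTF-8";
    }

    /** Wraps the text in a connection and reads it back through getInputStream(). */
    public static String roundTrip(String text) {
        try {
            URL url = URI.create("http://example.invalid/demo").toURL();
            URLConnection connection = new InMemoryConnection(url, text);
            try (InputStream in = connection.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello from memory"));
    }
}
```

A real protocol handler would also need a URLStreamHandler that returns this connection from openConnection(); that part is left out here.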

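General steps for opening a URLConnection

1. Construct a URL object.

2. Call its openConnection() method to obtain a URLConnection object.

3. Configure the URLConnection (optional).

4. Read the header fields (optional).

5. Get the input stream and read data.

6. Get the output stream and write data.

7. Close the connection.

Note: when a URLConnection is first constructed, the actual connection is not open, that is, there is no socket connection yet; connect() must be called to establish it. In general connect() is called implicitly: methods that need an open connection, such as getInputStream() and getContent(), call connect() themselves if the connection has not been established.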
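
The lazy connection described in the note can be observed with a small experiment. The sketch below is not from the original text: it assumes Java 11 or later and uses the JDK's built-in com.sun.net.httpserver.HttpServer as a throwaway local server, and the class and method names are made up for illustration. It counts the requests the server has seen right after openConnection() and again after the body has been read.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyConnectDemo {
    /** Returns the request count seen by the server before and after the body is read. */
    public static int[] countRequests() {
        AtomicInteger requests = new AtomicInteger();
        try {
            // Throwaway local server so the sketch needs no external host.
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                requests.incrementAndGet();
                byte[] body = "ok".getBytes(StandardCharsets.US_ASCII);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                URL url = URI.create("http://127.0.0.1:" + port + "/").toURL();
                URLConnection connection = url.openConnection(Proxy.NO_PROXY);
                int before = requests.get(); // openConnection() has not sent anything
                try (InputStream in = connection.getInputStream()) { // implicit connect()
                    in.readAllBytes();
                }
                return new int[] {before, requests.get()};
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = countRequests();
        System.out.println("requests before read: " + counts[0] + ", after read: " + counts[1]);
    }
}
```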

Reading data from a server

1. Construct a URL object.

2. Obtain a URLConnection object.

3. Call the URLConnection's getInputStream() method.

4. Read the data with the usual stream API.

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class SourceViewer2 {
     public static void main(String[] args) {
           String url = "https://www.baidu.com";
           try {
                // Construct the URL
                URL u = new URL(url);
                // Obtain the URLConnection object
                URLConnection connection = u.openConnection();
                // Open the input stream
                try (InputStream raw = connection.getInputStream()) { // closed automatically
                     // Read the data with the usual stream API
                     Reader reader = new InputStreamReader(raw);
                     int c;
                     while((c = reader.read()) != -1) {
                           System.out.print((char) c);
                     }
                }
           } catch (MalformedURLException e) {
                System.out.println(e);
           } catch (IOException e) {
                System.out.println(e);
           }
     }
}

Reading specific header fields

getContentType() returns the full MIME content type of the response, or null if no type is available.

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class EncodingAwareSourceViewer {
     public static void main(String[] args) {
           String url = "https://www.baidu.com";

           try {
                String defaultEncoding = "ISO-8859-1";
                URL u = new URL(url);
                URLConnection connection = u.openConnection();
                // Get the MIME content type of the response; it may be null
                String contentType = connection.getContentType();
                int encodingStart = (contentType == null) ? -1 : contentType.indexOf("charset=");
                if(encodingStart != -1) {
                     // Use the charset declared after "charset="
                     defaultEncoding = contentType.substring(encodingStart+8);
                     System.out.println(defaultEncoding);
                }
                InputStream in = new BufferedInputStream(connection.getInputStream());
                Reader reader = new InputStreamReader(in, defaultEncoding);
                int c;
                while((c = reader.read()) != -1) {
                     System.out.print((char) c);
                }
                reader.close();
           } catch (MalformedURLException e) {
                System.out.println(e);
           } catch (UnsupportedEncodingException e) {
                System.out.println(e);
           } catch (IOException e) {
                System.out.println(e);
           }
     }

}
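
The charset lookup in the listing above takes everything after "charset=", which can go wrong if the header carries further parameters or quotes. As a hedged alternative, the helper below (CharsetSniffer and charsetOf are names invented for this sketch) parses the Content-Type value parameter by parameter and falls back to a default when the header is missing or names an unknown charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSniffer {
    /** Returns the charset named in a Content-Type value, or the fallback if none is usable. */
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback; // getContentType() returns null when the header is absent
        }
        for (String part : contentType.split(";")) {
            String parameter = part.trim();
            if (parameter.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = parameter.substring(8).trim();
                // The value may be quoted, e.g. charset="UTF-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

The returned Charset can be passed straight to the InputStreamReader constructor, which avoids handling UnsupportedEncodingException.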

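getContentLength() returns the size of the content in bytes, or -1 if the length is unknown. Note that getContentLength() can also return -1 when the content is so large that its length does not fit in an int; in that case use getContentLengthLong(), added in Java 7, which returns a long. The length can be used to decide how much data to read before closing the connection.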

import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class BinarySaver {
     public static void main(String[] args) throws IOException {

           String url = "http://img3.kwcdn.kuwo.cn/star/KuwoArtPic/2013/7/1396927592034_w.jpg";

           try {
                URL u = new URL(url);
                saveBinaryFile(u);
           } catch (MalformedURLException e) {
                System.out.println(e);
           } catch (IOException e) {
                System.out.println(e);
           }

     }

     public static void saveBinaryFile(URL url) throws IOException {
           URLConnection connection = url.openConnection();
           String contentType = connection.getContentType();
           int contentLength = connection.getContentLength();
           // Check the content type to make sure we have a binary file of known length
           if (contentType == null || contentType.startsWith("text/") || contentLength == -1) {
                System.out.println(contentType);
                System.out.println(contentLength);
                throw new IOException("this is not a binary file");
           }

           try (InputStream raw = connection.getInputStream()) {
                InputStream in = new BufferedInputStream(raw);
                byte[] data = new byte[contentLength];
                int offset = 0;
                while (offset < contentLength) {
                     int byteRead = in.read(data, offset, data.length - offset);
                     if (byteRead == -1) {
                           break;
                     }
                     offset += byteRead;
                }
                if (offset != contentLength) {
                     throw new IOException("Only read " + offset + " bytes, expected " + contentLength + " bytes");
                }
                String fileName = url.getFile();
                fileName = fileName.substring(fileName.lastIndexOf("/") + 1);
                try (FileOutputStream fout = new FileOutputStream(fileName)) {
                     fout.write(data);
                     fout.flush();
                }
                System.out.println("Done");
           }
     }
}

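getContentEncoding() returns a string naming the content encoding, or null if there is none.

getDate() returns the server-side time at which the document was sent, or 0 if there is no Date field.

getExpiration() returns the time at which the document expires; once expired, it should be removed from the cache and fetched from the server again. It returns 0 if the field is absent.

getLastModified() returns the date the document was last modified, or 0 if the field is absent.

Note: all of the times above are GMT, expressed as milliseconds since midnight, January 1, 1970.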
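
Since these three methods return raw millisecond values with 0 meaning "field absent", a small formatting helper can make them easier to read. The sketch below is an illustration only: HeaderDates and format() are invented names, and java.time is used here in place of java.util.Date.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class HeaderDates {
    /** Formats a value returned by getDate(), getExpiration() or getLastModified(). */
    public static String format(long epochMillis) {
        if (epochMillis == 0) {
            return "(not present)"; // 0 signals a missing header field
        }
        // Render in GMT using the HTTP-style RFC 1123 date format.
        return DateTimeFormatter.RFC_1123_DATE_TIME
                .format(Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        System.out.println(format(0L));
        System.out.println(format(784887151000L));
    }
}
```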

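HTTP header fields related to language and encoding

Accept-Charset: the browser declares the character sets it can accept.

Accept-Encoding: the browser declares the content encodings it can accept. This usually refers to compression methods: whether they are supported and which ones.

Accept-Language: the browser declares the languages it can accept. A single language such as Chinese can have many character sets, for example GB2312, GBK and BIG5.

Content-Type: the web server tells the browser the type and character set of the object in the response.

Content-Encoding: the web server tells the browser which compression method was applied to the object in the response.

Content-Language: the web server tells the browser the language of the object in the response.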

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Date;

public class HeaderViewer {
     public static void main(String[] args) {
           String url = "https://www.baidu.com";

           try {
                URL u = new URL(url);
                URLConnection connection  = u.openConnection();
                System.out.println("Content-Type " + connection.getContentType());
                if (connection.getContentEncoding() != null) {
                     System.out.println("Content-Encoding " + connection.getContentEncoding());
                }
                if (connection.getDate() != 0) {
                     System.out.println("Date " + new Date(connection.getDate()));
                }
                if (connection.getLastModified() != 0) {
                     System.out.println("Last-Modified " + new Date(connection.getLastModified()));
                }
                if (connection.getExpiration() != 0) {
                     System.out.println("Expiration " + new Date(connection.getExpiration()));
                }
                if (connection.getContentLength() != -1) {
                     System.out.println("Content-Length " + connection.getContentLength());
                }
           } catch (MalformedURLException e) {
                System.out.println(e);
           } catch (IOException e) {
                System.out.println(e);
           }
     }
}

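Reading all header fields

Use getHeaderField() to get the values and getHeaderFieldKey() to get the keys. Note that getHeaderField() does not always return a valid value, so the result must be checked for null.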

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class AllHeaders {
     public static void main(String[] args) {
           String url = "https://www.baidu.com";

           try {
                URL u = new URL(url);
                URLConnection connection = u.openConnection();
                for (int i = 1; ; i++) {
                     String header = connection.getHeaderField(i);
                     // getHeaderField() may return null, so check before using the value
                     if (header == null) {
                           break;
                     }
                     System.out.println(connection.getHeaderFieldKey(i) + " : " + header);
                }
           } catch (MalformedURLException e) {
                System.out.println(e);
           } catch (IOException e) {
                System.out.println(e);
           }
     }
}

Caching

http://www.cnblogs.com/_franky/archive/2011/11/23/2260109.html

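Common cache directives

public: all content may be cached (both clients and proxy servers may cache it).

private: content is cached only in a private cache (only the client may cache it; proxy servers may not).

no-cache: the cache must first confirm with the server whether the response has changed before using it to satisfy a later request for the same URL. So if a suitable validation token (ETag) exists, no-cache causes a round trip to validate the cached response, and the download is avoided if the resource has not changed.

no-store: no content is stored in any cache or in temporary Internet files.

must-revalidate/proxy-revalidate: if the cached content is stale, the request must be sent to the server/proxy for revalidation.

max-age=xxx (xxx is numeric): the cached content expires after xxx seconds. This option is available only in HTTP 1.1 and takes precedence when used together with Last-Modified.

max-age and s-maxage

max-age (request): forces the responding cache to check freshness against this value, comparing it with the cache's own Age value and the request time. If max-age is exceeded, the cache must validate with the origin server so that a fresh response is returned. It is essentially similar to the traditional Expires header, but Expires compares against a specific date, so the result may be wrong if the cache's own clock is inaccurate; max-age does not have this problem. max-age also takes precedence over Expires.

s-maxage (response): the only difference from max-age is that s-maxage applies only to shared caches, not to single-user caches such as the user agent's local cache. In addition, s-maxage takes precedence over max-age.

Browser behavior for each cache-directive value

public. New browser window: page served from the cache. Enter in the original window: page served from the cache. Refresh: request resent to the server. Back button: page served from the cache.

private. New browser window: request resent to the server. Enter in the original window: request resent to the server the first time, page served from the cache afterwards. Refresh: request resent to the server. Back button: page served from the cache.

no-cache/no-store. New browser window: request resent to the server. Enter in the original window: request resent to the server. Refresh: request resent to the server. Back button: request resent to the server.

must-revalidate/proxy-revalidate. New browser window: request resent to the server. Enter in the original window: request resent to the server the first time, page served from the cache afterwards. Refresh: request resent to the server. Back button: page served from the cache.

max-age=xxx (xxx is numeric). New browser window: request resent to the server after xxx seconds. Enter in the original window: request resent to the server after xxx seconds. Refresh: request resent to the server. Back button: request resent to the server after xxx seconds.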

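By default Java does not cache. To enable caching, install the system-wide cache used by the URL class; the virtual machine supports only a single shared cache. This is done by installing a subclass of ResponseCache, which works with CacheResponse and CacheRequest.

ResponseCache has two abstract methods, used to store data in the cache and to retrieve data from it:

1. public abstract CacheResponse get(URI uri, String requestMethod, Map<String, List<String>> requestHeaders) throws IOException

2. public abstract CacheRequest put(URI uri, URLConnection connection) throws IOException

CacheRequest is an abstract class that represents a channel for storing a resource in the ResponseCache. An instance provides an OutputStream that the protocol handler can use to write the resource into the cache. Two methods must be implemented:

1. public abstract OutputStream getBody() throws IOException

2. public abstract void abort(): deletes any data the request has stored in the cache

CacheResponse is an abstract class that represents a channel for retrieving a resource from the ResponseCache. It provides an InputStream that returns the body, along with a getHeaders() method that returns the associated response headers. Two methods must be implemented:

1. public abstract Map<String, List<String>> getHeaders() throws IOException

2. public abstract InputStream getBody() throws IOException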
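
How the three classes fit together may be easier to see in code. What follows is a deliberately naive sketch, assuming Java 11 or later: MemoryResponseCache and roundTrip() are invented names, everything is kept in memory, and the cache ignores Cache-Control, Expires and every other freshness rule discussed above. It is not suitable as a real HTTP cache.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.CacheRequest;
import java.net.CacheResponse;
import java.net.ResponseCache;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: in-memory storage, no expiry, no Cache-Control handling.
public class MemoryResponseCache extends ResponseCache {
    private static final class Entry {
        final Map<String, List<String>> headers;
        final byte[] body;

        Entry(Map<String, List<String>> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    private final Map<URI, Entry> entries = new ConcurrentHashMap<>();

    @Override
    public CacheResponse get(URI uri, String requestMethod, Map<String, List<String>> requestHeaders) {
        Entry entry = "GET".equals(requestMethod) ? entries.get(uri) : null;
        if (entry == null) {
            return null; // cache miss: the protocol handler goes to the server
        }
        return new CacheResponse() {
            @Override public Map<String, List<String>> getHeaders() { return entry.headers; }
            @Override public InputStream getBody() { return new ByteArrayInputStream(entry.body); }
        };
    }

    @Override
    public CacheRequest put(URI uri, URLConnection connection) {
        Map<String, List<String>> headers = connection.getHeaderFields();
        boolean[] aborted = {false};
        ByteArrayOutputStream buffer = new ByteArrayOutputStream() {
            @Override public void close() {
                if (!aborted[0]) {
                    // Store the entry only once the body has been written completely.
                    entries.put(uri, new Entry(headers, toByteArray()));
                }
            }
        };
        return new CacheRequest() {
            @Override public OutputStream getBody() { return buffer; }
            @Override public void abort() { aborted[0] = true; entries.remove(uri); }
        };
    }

    /** Stores a body through put() and reads it back through get(), with no network I/O. */
    public static String roundTrip(String body) {
        try {
            URI uri = URI.create("http://example.invalid/cached");
            URLConnection stub = new URLConnection(uri.toURL()) {
                @Override public void connect() { connected = true; }
            };
            MemoryResponseCache cache = new MemoryResponseCache();
            try (OutputStream out = cache.put(uri, stub).getBody()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            try (InputStream in = cache.get(uri, "GET", Map.of()).getBody()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Such a cache would be installed once per virtual machine with ResponseCache.setDefault(new MemoryResponseCache()).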

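Configuring the connection

URLConnection fields and their default values

protected URL url

protected boolean doInput = true

protected boolean doOutput = false: indicates whether this URLConnection allows output. For an http URL, setting it to true changes the request method from GET to POST.

protected boolean allowUserInteraction = defaultAllowUserInteraction

protected boolean useCaches = defaultUseCaches: when a cache is installed, controls whether it is used; enabled by default. defaultUseCaches is set through setDefaultUseCaches().

protected long ifModifiedSince = 0: used to check whether the client's copy of a document is out of date. The value is the date on which the client last fetched the document. When the document is requested, the server returns 304 without sending the document if it has not changed since then, and the client loads it from its cache; otherwise the server sends the document again.

protected boolean connected = false

public URL getURL(): returns the URL this URLConnection connects to

public void setDoInput(boolean doInput)

public boolean getDoInput()

public void setDoOutput(boolean doOutput)

public boolean getDoOutput()

public void setAllowUserInteraction(boolean allowUserInteraction)

public boolean getAllowUserInteraction()

public void setUseCaches(boolean useCaches)

public boolean getUseCaches()

public void setIfModifiedSince(long ifModifiedSince)

public long getIfModifiedSince()

Note: the setters above may only be called before the URLConnection connects; otherwise they throw IllegalStateException. Java provides no methods for reading or setting the connected field directly, but any method that causes the URLConnection to connect or disconnect sets the field to true or false accordingly. Many URLConnection methods read this value, so a protocol handler written by subclassing URLConnection must maintain it correctly.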
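
The "configure before connecting" rule can be checked offline with a stub. In the sketch below, the class name, the method name and the example.invalid URL are assumptions; the anonymous subclass does nothing except maintain the connected flag, which is enough for the inherited setters to start rejecting calls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class ConfigureBeforeConnect {
    /** Returns true if a setter called after connect() is rejected with IllegalStateException. */
    public static boolean lateSetterIsRejected() {
        URLConnection connection;
        try {
            // Stub subclass so the sketch runs offline; connect() only maintains the flag.
            connection = new URLConnection(URI.create("http://example.invalid/").toURL()) {
                @Override
                public void connect() {
                    connected = true;
                }
            };
            connection.setUseCaches(false);    // allowed: not connected yet
            connection.setIfModifiedSince(0L); // allowed: not connected yet
            connection.connect();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            connection.setDoOutput(true);      // too late: already connected
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("late setter rejected: " + lateSetterIsRejected());
    }
}
```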

public boolean getDefaultUseCaches()

public void setDefaultUseCaches(boolean defaultUseCaches)

public static void setDefaultAllowUserInteraction(boolean defaultAllowUserInteraction)

public static boolean getDefaultAllowUserInteraction()

public static FileNameMap getFileNameMap()

public static void setFileNameMap(FileNameMap map)

Note: the methods above can be called at any time, but they only affect URLConnection objects opened after the default has been set.

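Timeouts: values are in milliseconds; 0 means never time out, and a negative value throws IllegalArgumentException.

public void setConnectTimeout(int timeout)

public int getConnectTimeout()

Set/get how long the socket waits for the server to respond when connecting; a SocketTimeoutException is thrown on timeout.

public void setReadTimeout(int timeout)

public int getReadTimeout()

Set/get how long the input stream waits for data to arrive.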
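
As a brief illustration of the timeout setters, the helper below opens a connection object without connecting and applies both timeouts. TimeoutConfig, open() and the example.invalid URL are invented for this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class TimeoutConfig {
    /** Opens (but does not connect) a URLConnection with the given timeouts in milliseconds. */
    public static URLConnection open(String url, int connectMillis, int readMillis) {
        try {
            URLConnection connection = URI.create(url).toURL().openConnection();
            connection.setConnectTimeout(connectMillis); // negative: IllegalArgumentException
            connection.setReadTimeout(readMillis);       // 0 would mean wait indefinitely
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection connection = open("http://example.invalid/", 3000, 5000);
        System.out.println(connection.getConnectTimeout() + " ms / " + connection.getReadTimeout() + " ms");
    }
}
```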

Configuring the client's HTTP request headers

Setting

public void setRequestProperty(String name, String value)

public void addRequestProperty(String name, String value)

Reading

public String getRequestProperty(String name)

public Map<String, List<String>> getRequestProperties()
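
The difference between the two setters is that setRequestProperty() replaces any existing value for the name, while addRequestProperty() keeps the existing values and appends another. A small sketch follows; the class name, the header values and the example.invalid URL are chosen only for illustration, and nothing is sent over the network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class RequestHeaders {
    /** Opens (without connecting) a connection with a few request headers configured. */
    public static URLConnection configure(String url) {
        try {
            URLConnection connection = URI.create(url).toURL().openConnection();
            connection.setRequestProperty("Accept-Language", "en");
            connection.setRequestProperty("Accept-Language", "zh-CN"); // replaces "en"
            connection.addRequestProperty("Accept", "text/html");
            connection.addRequestProperty("Accept", "text/plain");     // kept alongside text/html
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection connection = configure("http://example.invalid/");
        System.out.println(connection.getRequestProperty("Accept-Language"));
        System.out.println(connection.getRequestProperties().get("Accept"));
    }
}
```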

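Writing data to a server

Data is written through getOutputStream(). Because a URLConnection does not allow output by default, call setDoOutput(true) before requesting the stream. Once you have the OutputStream, write to it as you would to any other output stream.

General steps for submitting a form

1. Decide which name-value pairs to send.

2. Write the server-side program that receives and processes the request.

3. Build the query string, using the appropriate encoding.

4. Open a URLConnection to the URL that will receive the request.

5. Call setDoOutput(true).

6. Write the query string to the URLConnection's OutputStream.

7. Close the OutputStream.

8. Read the response from the InputStream.

Note: GET should be used only for safe operations that can be bookmarked or linked to; POST is for operations that are not safe.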

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

import ch5.QueryString;

public class FormPoster {
     private URL url;
     private QueryString query = new QueryString();

     public FormPoster(URL url) {
           if (!url.getProtocol().toLowerCase().startsWith("http")) {
                throw new IllegalArgumentException("posting only for http urls");
           }
           this.url = url;
     }

     public void add(String name, String value) {
           query.add(name, value);
     }

     public URL getURL() {
           return this.url;
     }

     public InputStream post() throws IOException {
           URLConnection uc = url.openConnection();
           uc.setDoOutput(true);

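           // The stream is closed automatically here; if it is not closed, no data comes back.
           // URLConnection buffers the output in order to compute the Content-Length field, so all of
           // the data has to be written to the buffer first. Only when the output stream is closed can
           // the total length be computed, the Content-Length field set, and the data finally sent.
           try (OutputStreamWriter out = new OutputStreamWriter(uc.getOutputStream(), "UTF-8")) {
                /**
                 * The POST line, Content-Length and Content-Type are sent by URLConnection;
                 * we only need to send the data.
                 */
                out.write(query.toString());
                // Blank line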
                out.write("\r\n\r\n");
                out.flush();
           }
           return uc.getInputStream();
     }

     public static void main(String[] args) {
           URL url;
           String str = "http://www.cafeaulait.org/books/jnp4/postquery.phtml";

           try {
                url = new URL(str);
           } catch (MalformedURLException e) {
                System.out.println(e);
                return;
           }

           FormPoster poster = new FormPoster(url);
           poster.add("name", "wyc");
           poster.add("email", "123");

           try (InputStream in = poster.post()) {
                Reader r = new InputStreamReader(in);
                int c;
                while ((c = r.read()) != -1) {
                     System.out.print((char) c);
                }
           } catch (IOException e) {
                System.out.println(e);
           }

     }
}
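
FormPoster relies on ch5.QueryString, which is not shown in this chapter. For readers who do not have it at hand, the class below is a hypothetical stand-in (the name SimpleQueryString is made up) that builds an application/x-www-form-urlencoded string with URLEncoder. It assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the ch5.QueryString helper used by FormPoster.
public class SimpleQueryString {
    private final StringBuilder query = new StringBuilder();

    /** Appends one URL-encoded name=value pair, separated from earlier pairs by '&'. */
    public synchronized SimpleQueryString add(String name, String value) {
        if (query.length() > 0) {
            query.append('&');
        }
        query.append(URLEncoder.encode(name, StandardCharsets.UTF_8))
             .append('=')
             .append(URLEncoder.encode(value, StandardCharsets.UTF_8));
        return this;
    }

    @Override
    public synchronized String toString() {
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(new SimpleQueryString().add("name", "w y c").add("email", "a@b.c"));
    }
}
```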

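Security considerations for URLConnection

When attempting to connect to a URL, call public Permission getPermission() throws IOException to find out which permission is required. It returns null if no permission is needed; otherwise it returns the appropriate subclass of java.security.Permission.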

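Guessing MIME types

public static String guessContentTypeFromName(String name)

Guesses the content type from the file extension in the URL.

public static String guessContentTypeFromStream(InputStream in)

Guesses the content type by looking at the first few bytes of the stream. The InputStream must support marking so that it can be reset to the start of the stream.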
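
A short sketch of how the two methods might be combined: try the leading bytes first and fall back to the file name. MimeGuess and guess() are illustrative names; ByteArrayInputStream is used because it supports mark and reset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class MimeGuess {
    /** Guesses from the leading bytes first, then from the file name; may return null. */
    public static String guess(byte[] content, String fileName) {
        // guessContentTypeFromStream needs a stream that supports mark/reset.
        try (InputStream in = new ByteArrayInputStream(content)) {
            String type = URLConnection.guessContentTypeFromStream(in);
            return type != null ? type : URLConnection.guessContentTypeFromName(fileName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] gifHeader = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guess(gifHeader, "picture.dat"));
        System.out.println(guess(new byte[0], "index.html"));
    }
}
```

Both methods are only heuristics, so a Content-Type header sent by the server should normally take priority when one is present.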

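HttpURLConnection

HttpURLConnection is an abstract subclass of URLConnection. Because its constructor is protected, you obtain one by constructing a URL from an http URL and casting the URLConnection returned by openConnection() to HttpURLConnection.

Request methods supported by HttpURLConnection

GET is the default; the request method can be changed with setRequestMethod().

HEAD: returns the response headers without the entity body; mainly used for queries and debugging.

DELETE: deletes the file at the URL. Note that servers respond to DELETE in different ways: the file may be deleted, moved to a recycle bin or marked unreadable, or the request may be refused, depending on the server configuration.

PUT: stores a file on the server. This method lets the client place a document in the abstract hierarchy of the site without it having to map to the actual local file system, whereas FTP requires working with the local file system. It depends on the server configuration and is usually not supported.

OPTIONS: asks which options are supported for a URL. If the requested URL is *, the request applies to the server as a whole rather than to one particular URL on it. Note that the Allow field in the response only lists the methods the server understands; the server will not necessarily carry them out.

TRACE: sends the request headers to the server; mainly used to see what modifications proxy servers between the client and the server make.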
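
setRequestMethod() validates the method name locally and throws ProtocolException for a name it does not know, so the check can be tried without contacting a server. In the sketch below, RequestMethods, accepts() and the example.invalid URL are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URI;

public class RequestMethods {
    /** Returns true if HttpURLConnection accepts the given request method name. */
    public static boolean accepts(String method) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create("http://example.invalid/").toURL().openConnection();
            connection.setRequestMethod(method); // checked locally; nothing is sent
            return method.equals(connection.getRequestMethod());
        } catch (ProtocolException e) {
            return false; // not a method HttpURLConnection knows
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String method : new String[] {"GET", "HEAD", "OPTIONS", "BREW"}) {
            System.out.println(method + " accepted: " + accepts(method));
        }
    }
}
```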

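Disconnecting from the server

Because HTTP 1.1 Keep-Alive allows multiple requests and responses to be sent over a single TCP socket, the server does not automatically close the connection just because it has sent the last byte to the client. There may be an idle timeout of around 5 seconds, but normally the client closes the connection. HttpURLConnection supports Keep-Alive transparently unless it is explicitly turned off. When the client has finished communicating with the server, it should call public abstract void disconnect() to close the socket. Note that closing the socket closes any streams that are still open, whereas closing a stream does not close the socket.

Handling server responses

"Response" here mainly means the status line of the HTTP response: the response code and the status message.

public int getResponseCode() throws IOException: returns the response code

public String getResponseMessage() throws IOException: returns the response message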

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class SourceViewer3 {

     public static void main(String[] args) {
           String str = "https://www.baidu.com";

           try {
                URL u = new URL(str);
                HttpURLConnection uc = (HttpURLConnection) u.openConnection();
                int code = uc.getResponseCode();
                String message = uc.getResponseMessage();
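                // HttpURLConnection has no method that returns the protocol version from the status
                // line directly, but URLConnection's getHeaderField(0) returns the complete status line
                System.out.println("HTTP/1.X " + code + " " + message);
                // Start from 1: line 0 is the status line, which is not a key-value pair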
                for(int j = 1; ; j++ ) {
                     String header = uc.getHeaderField(j);
                     String key = uc.getHeaderFieldKey(j);
                     if (header == null || key == null) {
                           break;
                     }
                     System.out.println(key + ":" + header);
                }
                System.out.println();
                try (InputStream in = new BufferedInputStream(uc.getInputStream())) {
                     Reader r = new InputStreamReader(in);
                     int c;
                     while((c = r.read()) != -1) {
                           System.out.print((char) c);
                     }
                }
           } catch (MalformedURLException e) {
                System.out.println(e);
           } catch (IOException e) {
                System.out.println(e);
           }
     }

}

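Error conditions

When the server encounters an error, it returns not only an error code but also additional information. In that case you can call public InputStream getErrorStream() in the catch block (it returns null if no error occurred).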

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class SourceViewer4 {
     public static void main(String[] args) {
           String str = "https://www.baidu.com";

           try {
                URL u = new URL(str);
                HttpURLConnection uc = (HttpURLConnection)u.openConnection();

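                try (InputStream in = new BufferedInputStream(uc.getInputStream())) {
                     // Normal case
                     printFromStream(in);
                } catch (IOException e) {
                     // Error case: getErrorStream() returns null when there is no error body
                     InputStream error = uc.getErrorStream();
                     if (error != null) {
                           printFromStream(error);
                     }
                }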
           } catch (MalformedURLException e) {
                System.out.println(e);
           } catch (IOException e) {
                System.out.println(e);
           }
     }

     public static void printFromStream(InputStream in) throws IOException {
                try (InputStream buffer = new BufferedInputStream(in)) {
                     Reader r = new InputStreamReader(buffer);
                     int c;
                     while ((c = r.read()) != -1) {
                           System.out.print((char) c);
                     }
                }
     }

}
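
An alternative to catching the IOException is to look at the response code first and then pick the stream. The sketch below is an assumption-laden illustration, not the original author's approach: it requires Java 11 or later, starts the JDK's built-in com.sun.net.httpserver.HttpServer locally so that a 404 with a body can be produced on demand, and uses invented class and method names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ErrorBodyDemo {
    /** Requests a page the local server answers with 404 and returns "<code> <body>". */
    public static String fetchMissingPage() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] body = "no such page".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(404, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                URL url = URI.create("http://127.0.0.1:" + port + "/missing").toURL();
                HttpURLConnection connection = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
                try {
                    int code = connection.getResponseCode();
                    // For 4xx and 5xx responses the body is on the error stream.
                    InputStream stream = code >= 400
                            ? connection.getErrorStream()
                            : connection.getInputStream();
                    if (stream == null) {
                        return code + " (no body)";
                    }
                    try (InputStream in = stream) {
                        return code + " " + new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    }
                } finally {
                    connection.disconnect(); // done with this server
                }
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchMissingPage());
    }
}
```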

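Redirects

A 3xx response code means that the requested resource is not at its original location but can be found elsewhere. Browsers generally load the document from the new location automatically, although this carries some risk. HttpURLConnection follows redirects by default, but you can use

public static boolean getFollowRedirects()

public static void setFollowRedirects(boolean follow)

to read or set whether redirects are followed. These are static methods and affect the behavior of every HttpURLConnection object created after the call; if the security manager does not allow the change, the set method throws a SecurityException. HttpURLConnection also provides per-instance redirect configuration:

public boolean getInstanceFollowRedirects()

public void setInstanceFollowRedirects(boolean followRedirects)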
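
Because the static setter changes behavior for the whole virtual machine, the per-instance setter is often the safer choice. A minimal sketch, with RedirectPolicy, openWithoutRedirects() and the example.invalid URL as illustrative assumptions:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class RedirectPolicy {
    /** Opens (without connecting) a connection that will not follow redirects. */
    public static HttpURLConnection openWithoutRedirects(String url) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Affects only this instance; the global default stays untouched.
            connection.setInstanceFollowRedirects(false);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection connection = openWithoutRedirects("http://example.invalid/");
        System.out.println("instance follows redirects: " + connection.getInstanceFollowRedirects());
        System.out.println("global default: " + HttpURLConnection.getFollowRedirects());
    }
}
```

With redirects disabled, a 3xx response is returned to the caller as is, and the new location can be read from the Location header field.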

Proxies

public abstract boolean usingProxy() reports whether the connection is going through a proxy server.

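Streaming mode

The problem is that the Content-Length field in the HTTP header comes before the message body, so the length of the body is not yet known when the length field has to be written.

Java's solution is to buffer everything written to the OutputStream obtained from a URLConnection until the stream is closed, and only then send the data.

The drawback of this approach is that posting a very long form makes the buffering burden heavy. To be able to send the first byte without waiting for the last one, there are two solutions:

1. If the size of the data is known in advance, call one of the methods below; Java then streams the data over the network immediately. Fixed-length streaming mode is transparent to the server.

public void setFixedLengthStreamingMode(int contentLength)

public void setFixedLengthStreamingMode(long contentLength)

2. Use chunked transfer encoding, which splits the content into several parts, each with its own length. The chunk size must be passed in before connecting.

public void setChunkedStreamingMode(int chunkLength)

A drawback shared by both approaches: they get in the way of redirects and authentication. When a redirect or authentication is required, an HttpRetryException is thrown, so avoid these modes unless they are really needed.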
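
The two streaming modes can be compared against a local server that reports how the request body was framed. This is a sketch under stated assumptions: Java 11 or later, the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in server, and invented names (StreamingPost, post()). The 4096-byte chunk size is an arbitrary choice.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StreamingPost {
    /** Posts the payload and returns what the local server saw: "<framing> <bytes received>". */
    public static String post(byte[] payload, boolean chunked) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                int received = exchange.getRequestBody().readAllBytes().length;
                String encoding = exchange.getRequestHeaders().getFirst("Transfer-Encoding");
                String framing = encoding != null ? encoding : "content-length";
                byte[] reply = (framing + " " + received).getBytes(StandardCharsets.US_ASCII);
                exchange.sendResponseHeaders(200, reply.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(reply);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                URL url = URI.create("http://127.0.0.1:" + port + "/").toURL();
                HttpURLConnection connection = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
                connection.setDoOutput(true); // switches the request to POST
                if (chunked) {
                    connection.setChunkedStreamingMode(4096);               // length not needed up front
                } else {
                    connection.setFixedLengthStreamingMode(payload.length); // length known in advance
                }
                try (OutputStream out = connection.getOutputStream()) {
                    out.write(payload); // sent as it is written, not buffered until close
                }
                try (InputStream in = connection.getInputStream()) {
                    return new String(in.readAllBytes(), StandardCharsets.US_ASCII);
                }
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(post(payload, false));
        System.out.println(post(payload, true));
    }
}
```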

0 0