HTTP/1.1: Understanding chunked encoding and decoding it over a socket (Java)

Source: Internet. Editor: 程序博客网. Time: 2024/06/15 18:54

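Lately I have been catching up on Android networking, practicing file downloads and uploads with both a raw socket and HttpURLConnection. HttpURLConnection gave me almost no trouble, but it wraps so much that knowing how to use it hardly counts as understanding networking. Going deeper inevitably means working with sockets. And with sockets I ran into chunked encoding for the first time, which gave me quite a headache.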

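An HTTP/1.1 message header can carry a Content-Length field, which states the size of the message body in bytes. That usually works fine, but the server often cannot know the size of the data in advance, and that is when chunked encoding comes in.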

Chunked means the server transfers the data in blocks. A response header that announces chunked encoding typically looks like this:

HTTP/1.1 200 OK
Date: Wed, 01 Mar 2017 02:31:20 GMT
Server: Apache/2.4.23 (Win64) PHP/5.6.25
X-Powered-By: PHP/5.6.25
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

As you can see, the Content-Length that would normally be there is gone, replaced by Transfer-Encoding.
My server script looks like this:

<?php
/**
 * Created by PhpStorm.
 * User: zu
 * Date: 2017/2/22
 * Time: 14:50
 */

/** Server script for downloading a file. */

// Check whether the POST request carries a file_name variable; if so, serve that file.
if (empty($_POST["file_name"])) {
    echo "NO_FILE_NAME\n";
    print_r($_POST);
    exit();
}

/* The files are stored on a Windows system where file names are GB2312-encoded,
   so convert the name first and check that the parent folder exists. */
$path = iconv("utf-8", "GB2312", "F:\\NetEaseMusic\\download\\");
if (!file_exists($path)) {
    echo "Folder does not exist\n";
    print_r($path);
    exit();
}

$file_map = array();
$files = scandir($path);
for ($i = 0; $i < count($files); $i++) {
    $file_map[iconv("GB2312", "utf-8", $files[$i])] = $files[$i];
}
if (!array_key_exists($_POST["file_name"], $file_map)) {
    echo "FILE_KEY_NOT_FOUND\n";
    print_r($file_map);
    exit();
}

/* Build the file path and convert its encoding so the file can be located. */
$path = iconv("utf-8", "GB2312", "F:\\NetEaseMusic\\download\\" . $_POST["file_name"]);
//$path = "F:\\NetEaseMusic\\download\\".$file_map[$_POST["file_name"]];
if (!file_exists($path)) {
    echo "FILE_NOT_FOUND\n";
    echo "F:\\NetEaseMusic\\download\\" . $_POST["file_name"] . "\n";
    print($path);
    exit();
}

/* Resumable download: if the client sent a valid Range header, start sending
   from that offset instead of from the beginning. */
$file_size = filesize($path);
$begin = 0;
$end = 0;
if (isset($_SERVER["HTTP_RANGE"])) {
    $temp = $_SERVER["HTTP_RANGE"];
    if (preg_match('/bytes=\h*(\d+)-(\d*)[\D.*]?/i', $temp, $matcher)) {
        // $matcher[0] is the whole match; the captured groups start at index 1.
        $begin = intval($matcher[1]);
        $end = intval($matcher[2]);
    }
}
if ($begin == 0) {
    header("HTTP/1.1 200 OK");
} else {
    header("HTTP/1.1 206 Partial Content");
}
//header("Content-type: application/octet-stream");
//header("content-length: ".$file_size);
//header("Accept-Ranges: bytes");
//header("Accept-Length:".$file_size);
//header("Content-Disposition: attachment; filename=".$path);

/* Keep reading the file and sending the data out. */
$file = fopen($path, "rb"); // binary mode, so Windows does not translate line endings
fseek($file, $begin, 0);
while (!feof($file)) {
    echo fread($file, 1024);
}
fclose($file);
exit();
?>

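When the server returns a file to the client, all that happens is that the script keeps reading the file and handing the data to the server. The server necessarily has a buffer that holds the script's output. If the data sent is smaller than that buffer, the server computes its size and adds Content-Length to the response header automatically. If the buffer fills up while the script keeps producing output, and the script has not told the server the total size, the server switches to chunked encoding. In other words, for a text of a few KB the server works out Content-Length by itself and adds it to the response header, but for a file of several GB, with no Content-Length supplied by hand, it uses chunked encoding.
Note that if you uncomment the line header("content-length: ".$file_size); in the script above, the server will not use chunked encoding, and the response header will contain the content-length you just set.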
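On the client side, the first thing to do with a response head is to work out which of these cases applies. Here is a minimal sketch of that decision. The class `BodyFraming` and its method names are made up for illustration and are not part of my download code; it assumes the head has already been read up to the blank line.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (illustration only): decides how a response body is delimited. */
final class BodyFraming {
    enum Kind { CHUNKED, CONTENT_LENGTH, UNTIL_CLOSE }

    /** Parses the "Name: value" lines of a response head into a map keyed by lower-cased name. */
    static Map<String, String> parseHeaders(String head) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = head.split("\r\n");
        for (int i = 1; i < lines.length; i++) {           // line 0 is the status line
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, lines[i].substring(colon + 1).trim());
            }
        }
        return headers;
    }

    /** Transfer-Encoding: chunked takes precedence over Content-Length when both appear. */
    static Kind kindOf(Map<String, String> headers) {
        String te = headers.get("transfer-encoding");
        if (te != null && te.toLowerCase(Locale.ROOT).contains("chunked")) {
            return Kind.CHUNKED;
        }
        if (headers.containsKey("content-length")) {
            return Kind.CONTENT_LENGTH;
        }
        return Kind.UNTIL_CLOSE;                            // body ends when the server closes
    }

    public static void main(String[] args) {
        String head = "HTTP/1.1 200 OK\r\n"
                + "Transfer-Encoding: chunked\r\n"
                + "Content-Type: text/html; charset=UTF-8";
        System.out.println(kindOf(parseHeaders(head)));     // prints CHUNKED
    }
}
```

Matching on the header name, rather than searching every header line for the word "chunked", avoids a false positive when some unrelated header value happens to contain that word.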

Decoding chunked data over a socket is simple in principle; it is just that a real implementation has many cases to consider. First, the chunked encoding rules:

HTTP/1.1 200 OK\r\n
Date: Wed, 01 Mar 2017 02:31:20 GMT\r\n
Server: Apache/2.4.23 (Win64) PHP/5.6.25\r\n
X-Powered-By: PHP/5.6.25\r\n
Transfer-Encoding: chunked\r\n
Content-Type: text/html; charset=UTF-8\r\n
\r\n
chunked-length\r\n
chunked-body\r\n
...
chunked-length\r\n
chunked-body\r\n
0\r\n
\r\n

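This is the complete chunked response as received over the socket, with nothing stripped (the line breaks above are only for readability; on the wire it is one continuous byte stream). Every header line still ends with \r\n, an empty line (\r\n) closes the header, and then the data follows. A single chunk has this structure: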

chunked-length\r\nchunked-body\r\n

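chunked-length is a hexadecimal number giving the size of chunked-body in bytes. Be very careful: it is hexadecimal. The length covers only chunked-body, not the \r\n before or after it. Once all chunks have been sent, the server sends one more empty chunk (length 0) to tell the client that the data is complete.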
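To see the rule from the sending side, here is a minimal sketch of an encoder that produces this format. `ChunkedEncoder` is a name I made up for the example; real servers choose chunk sizes however they like, and the fixed `maxChunk` is only an assumption that keeps the output predictable.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (illustration only): encodes a byte array as a chunked body. */
final class ChunkedEncoder {
    static byte[] encode(byte[] data, int maxChunk) {
        if (maxChunk <= 0) {
            throw new IllegalArgumentException("maxChunk must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += maxChunk) {
            int len = Math.min(maxChunk, data.length - off);
            // The size line is the body length in hexadecimal, followed by \r\n.
            byte[] sizeLine = (Integer.toHexString(len) + "\r\n").getBytes(StandardCharsets.US_ASCII);
            out.write(sizeLine, 0, sizeLine.length);
            out.write(data, off, len);
            out.write('\r');                                // the \r\n after the body is
            out.write('\n');                                // not counted in the length
        }
        // A zero-length chunk marks the end of the data.
        byte[] last = "0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        out.write(last, 0, last.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = "abcdefghijklmnopqrstuvwxyz".getBytes(StandardCharsets.US_ASCII);
        String wire = new String(encode(body, 16), StandardCharsets.US_ASCII);
        // Show the control characters so the structure is visible.
        System.out.println(wire.replace("\r", "\\r").replace("\n", "\\n"));
        // prints 10\r\nabcdefghijklmnop\r\na\r\nqrstuvwxyz\r\n0\r\n\r\n
    }
}
```

The 26 bytes are split into a 16-byte chunk and a 10-byte chunk, so the size lines read 10 and a: both are hexadecimal, which is exactly the trap mentioned above.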

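Note that when decoding you cannot use \r\n as the delimiter; you must go by chunked-length, because the body itself may contain \r\n.
My goal is a program that decodes while it downloads. So the focus is on everything that can happen while reading: a read may cut chunked-length in half, or cut a \r\n in half. Either one makes the decoder read wrong data, and once one step is wrong every later step is wrong too. Here is the code.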
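Before the full download method, the truncation cases can be looked at in isolation. The sketch below is a small decoder that keeps its state between calls, so it does not matter where a read cuts the stream. `IncrementalChunkDecoder` and its methods are hypothetical names for this example; it collects the body in memory instead of writing a file, and it does not handle trailer fields after the last chunk.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: a chunked decoder that accepts input split at arbitrary positions. */
final class IncrementalChunkDecoder {
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private final StringBuilder sizeLine = new StringBuilder(); // partial size line kept across calls
    private int remaining = 0;          // bytes of the current chunk still to be copied
    private boolean finished = false;   // true once the zero-length chunk has been seen

    /** Consumes buf[0, len); the fragment may start or end anywhere in the stream. */
    void feed(byte[] buf, int len) {
        int pos = 0;
        while (pos < len && !finished) {
            if (remaining > 0) {
                // Inside a chunk body: copy by length, never by looking for \r\n.
                int n = Math.min(remaining, len - pos);
                body.write(buf, pos, n);
                pos += n;
                remaining -= n;
                continue;
            }
            byte b = buf[pos++];
            if (b != '\n') {
                if (sizeLine.length() >= 32) {
                    throw new IllegalStateException("chunk size line too long");
                }
                sizeLine.append((char) b);
                continue;
            }
            String line = sizeLine.toString().trim();       // trim() also drops the \r
            sizeLine.setLength(0);
            if (line.isEmpty()) {
                continue;                                   // the \r\n that closes the previous body
            }
            int semi = line.indexOf(';');                   // ignore chunk extensions, if present
            if (semi >= 0) {
                line = line.substring(0, semi).trim();
            }
            remaining = Integer.parseInt(line, 16);         // hexadecimal
            if (remaining == 0) {
                finished = true;
            }
        }
    }

    boolean isFinished() { return finished; }

    byte[] body() { return body.toByteArray(); }

    public static void main(String[] args) {
        byte[] wire = "5\r\nab\r\nc\r\n3\r\nxyz\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        IncrementalChunkDecoder d = new IncrementalChunkDecoder();
        for (byte b : wire) {
            d.feed(new byte[]{b}, 1);                       // worst case: one byte per read
        }
        System.out.println(d.isFinished() + " " + d.body().length);  // prints true 8
    }
}
```

The first chunk deliberately contains \r\n inside its body. Because the decoder copies body bytes by count, those two bytes end up in the output instead of being mistaken for a delimiter.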

This is the method that downloads a file over a socket; it also decodes chunked encoding.

    // Imports needed: java.io.*, java.net.Socket, java.net.URL, java.net.URLEncoder,
    // java.nio.charset.StandardCharsets

    private void downloadFileBySocketWithChunke(String urlString, String fileName)
    {
        try
        {
            /** Build the POST request. The POST data must be URL-encoded, otherwise the server cannot recognise it; the headers need no encoding. */
            String data = URLEncoder.encode("file_name", "utf-8") + "=" + URLEncoder.encode(fileName, "utf-8");
            byte[] body = data.getBytes(StandardCharsets.UTF_8);
            URL url = new URL(urlString);
            /** getPort() returns -1 when the URL has no explicit port */
            int port = url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
            StringBuilder sb = new StringBuilder();
            sb.append("POST " + (url.getFile().isEmpty() ? "/" : url.getFile()) + " HTTP/1.1\r\n");
            sb.append("Host: " + url.getAuthority() + "\r\n");
            sb.append("Content-Type: application/x-www-form-urlencoded\r\n");
            /** Content-Length counts bytes, and the body is sent without a trailing \r\n */
            sb.append("Content-Length: " + body.length + "\r\n");
            /** Ask the server to close the connection, so a non-chunked body ends with read() == -1 */
            sb.append("Connection: close\r\n");
            sb.append("\r\n");
            System.out.println(sb);

            File file = new File("./" + fileName);
            try (Socket socket = new Socket(url.getHost(), port);
                 OutputStream request = socket.getOutputStream();
                 InputStream in = socket.getInputStream();
                 OutputStream out = new BufferedOutputStream(new FileOutputStream(file)))
            {
                /** Send the POST request to the server through the socket */
                request.write(sb.toString().getBytes(StandardCharsets.ISO_8859_1));
                request.write(body);
                request.flush();

                /** Buffer for the data read from the socket */
                byte[] buffer = new byte[1024];
                /** Number of bytes read from the socket this round */
                int readBytes = 0;
                /** Whether the header has been received completely */
                boolean getHead = false;
                /** Whether the body is chunked */
                boolean chunked = false;
                /** Whether the final zero-length chunk has been seen */
                boolean finished = false;
                /** Accumulates header bytes; the header may span several reads */
                ByteArrayOutputStream headTemp = new ByteArrayOutputStream();
                /** The complete header */
                String head = null;
                /** Size of the current chunk */
                int chunkedSize = 0;
                /** Bytes of the current chunk read so far */
                int readSize = 0;
                /**
                 * Holds the chunk size line while it is being collected. If a read cuts the size line
                 * (or a \r\n) in half, the part seen so far stays here and the next read continues it.
                 * */
                byte[] chunkedSizeBuffer = new byte[32];
                /** Number of valid bytes in chunkedSizeBuffer */
                int chunkedSizeBit = 0;

                /** Keep reading from the socket */
                while (!finished && (readBytes = in.read(buffer)) != -1)
                {
                    /** Array being consumed this round, the current position in it, and the end of its valid data */
                    byte[] src = buffer;
                    int readPosition = 0;
                    int end = readBytes;
                    /** Get the header first. It is separated from the body by \r\n\r\n; \r is 0x0d and \n is 0x0a */
                    if (!getHead)
                    {
                        headTemp.write(buffer, 0, readBytes);
                        byte[] all = headTemp.toByteArray();
                        /** Find the index of one byte[] inside another; -1 means not found */
                        int position = findByte(all, 0, all.length, new byte[]{0x0d, 0x0a, 0x0d, 0x0a});
                        if (position == -1)
                        {
                            /** Not found: everything read so far is header, keep collecting */
                            continue;
                        }
                        /** Found: analyse the header */
                        head = new String(all, 0, position, StandardCharsets.ISO_8859_1);
                        getHead = true;
                        String[] infoList = head.split("\r\n");
                        String status = infoList[0].split(" ")[1];
                        if (!status.equals("200"))
                        {
                            throw new RuntimeException("Request failed, status code " + status);
                        }
                        /** Check for chunked encoding, looking only at the Transfer-Encoding header */
                        for (String s : infoList)
                        {
                            String lower = s.toLowerCase();
                            if (lower.startsWith("transfer-encoding:") && lower.contains("chunked"))
                            {
                                chunked = true;
                                break;
                            }
                        }
                        /** Skip the 4 separator bytes; whatever follows in the accumulated bytes is body */
                        src = all;
                        readPosition = position + 4;
                        end = all.length;
                    }
                    if (!chunked)
                    {
                        /** Not chunked: write straight to the file */
                        out.write(src, readPosition, end - readPosition);
                        continue;
                    }
                    while (readPosition < end && !finished)
                    {
                        /** The current chunk is complete (or none has started): collect the next size line byte by byte */
                        if (chunkedSize == readSize)
                        {
                            byte b = src[readPosition++];
                            if (b != 0x0a)
                            {
                                /** More than 32 bytes without reaching \n means the stream is corrupt */
                                if (chunkedSizeBit >= chunkedSizeBuffer.length)
                                {
                                    throw new IOException("chunk size line longer than 32 bytes");
                                }
                                chunkedSizeBuffer[chunkedSizeBit++] = b;
                                continue;
                            }
                            String line = new String(chunkedSizeBuffer, 0, chunkedSizeBit, StandardCharsets.US_ASCII).trim();
                            chunkedSizeBit = 0;
                            /** An empty line is the \r\n that closes the previous chunk body: skip it */
                            if (line.isEmpty())
                            {
                                continue;
                            }
                            /** Ignore chunk extensions after ';' if the server sends any */
                            int semicolon = line.indexOf(';');
                            if (semicolon != -1)
                            {
                                line = line.substring(0, semicolon).trim();
                            }
                            /** Be very careful: the size is hexadecimal */
                            chunkedSize = Integer.parseInt(line, 16);
                            readSize = 0;
                            /** A zero-length chunk means the data is complete */
                            finished = (chunkedSize == 0);
                            continue;
                        }
                        /** Write as much of the current chunk as this buffer holds, keeping readPosition and readSize in step */
                        int n = Math.min(end - readPosition, chunkedSize - readSize);
                        out.write(src, readPosition, n);
                        readPosition += n;
                        readSize += n;
                    }
                }
                out.flush();
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

This is the method that finds the index of one byte[] inside another; it is short enough to read on its own.

    private int findByte(byte[] src, byte[] mark)
    {
        return findByte(src, 0, src.length, mark);
    }

    /** Returns the index of the first occurrence of mark within src[start, end), or -1 if there is none. */
    private int findByte(byte[] src, int start, int end, byte[] mark)
    {
        if (mark.length == 0 || start < 0 || end > src.length)
        {
            return -1;
        }
        /** <= so that a match ending exactly at end is still found */
        for (int i = start; i <= end - mark.length; i++)
        {
            int j = 0;
            /** Compare against mark only, never past its length */
            while (j < mark.length && src[i + j] == mark[j])
            {
                j++;
            }
            if (j == mark.length)
            {
                return i;
            }
        }
        return -1;
    }

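That is an example of downloading over a socket with chunked decoding. It is certainly not optimal. In practice, a file of a dozen or so MB downloads about as fast as with a plain socket and no decoding, while a 1.74 GB movie takes a few seconds longer. I tested this on my own machine, where transfer is very fast; in real use the network, not the decoding, should be the bottleneck. HttpURLConnection is still the fastest, by a wide margin. I first suspected an inherent weakness of sockets, but HttpURLConnection itself runs on top of a socket, so the gap more likely comes from my own socket code (for example the 1 KB buffer), especially since the socket download is slow even without decoding.
There is another approach: size the buffer to the current chunk, then simply read from the socket and fill it. Once the buffer is full the chunk is complete, so a chunk can never be cut in half. If time allows I will implement this in my next post.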
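A minimal sketch of that second idea, so the comparison is concrete. This is not the implementation promised above, only an illustration under some assumptions: `BlockingChunkReader` is a made-up name, the input stream is positioned just after the response header, and allocating one array per chunk is acceptable (a real version would want to cap the chunk size or reuse a buffer).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: read the size line, then block until the whole chunk has arrived. */
final class BlockingChunkReader {
    /** Reads up to and including \n and returns the line without the trailing \r\n. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            sb.append((char) b);
        }
        if (b == -1 && sb.length() == 0) {
            throw new EOFException("stream ended inside the chunked body");
        }
        return sb.toString().trim();
    }

    /** Copies a chunked body from in to out; in must be positioned right after the header. */
    static void copyChunkedBody(InputStream in, OutputStream out) throws IOException {
        DataInputStream data = new DataInputStream(in);
        while (true) {
            String line = readLine(data);
            int semi = line.indexOf(';');                   // ignore chunk extensions, if present
            if (semi >= 0) {
                line = line.substring(0, semi).trim();
            }
            int size = Integer.parseInt(line, 16);          // hexadecimal
            if (size == 0) {
                break;
            }
            byte[] chunk = new byte[size];                  // buffer sized to exactly one chunk
            data.readFully(chunk);                          // blocks until the chunk is complete
            out.write(chunk);
            readLine(data);                                 // the \r\n after the chunk body
        }
        while (!readLine(data).isEmpty()) {
            // skip optional trailer lines up to the final empty line
        }
    }

    /** Convenience wrapper for in-memory data. */
    static byte[] decode(byte[] encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copyChunkedBody(new ByteArrayInputStream(encoded), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wire = "10\r\nabcdefghijklmnop\r\na\r\nqrstuvwxyz\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decode(wire), StandardCharsets.US_ASCII));
        // prints abcdefghijklmnopqrstuvwxyz
    }
}
```

The trade-off is that `readFully` hides the truncation problem entirely, at the price of reading the size line one byte at a time and holding a whole chunk in memory. With a real socket it would make sense to pass in a buffered stream so the single-byte reads stay cheap.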

0 0