Decompressing GZIP in J2ME

Source: Internet   Editor: 程序博客网   Time: 2024/04/30 05:54

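GZIP was originally created by Jean-loup Gailly and Mark Adler for compressing files on UNIX systems. The .gz files commonly seen on Linux are in GZIP format. It has since become one of the most widely used data compression formats (or, more precisely, file formats) on the Internet. GZIP encoding over HTTP is a technique for improving web application performance: high-traffic web sites often compress their responses with GZIP so that pages reach users faster.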

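GZIP itself is only a file format. Internally it normally carries data in the DEFLATE format, and DEFLATE compresses data with the LZ77 algorithm combined with Huffman coding.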
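To make the LZ77 part concrete, here is a minimal sketch of how a decoder resolves one back-reference (a distance and a length) against the bytes it has already produced. The class and method names are hypothetical and are not part of the GZIP class listed later in this article.

```java
// Hypothetical helper for illustration only.
final class Lz77Copy {
  /**
   * Writes `length` bytes at position pos by copying from `distance`
   * bytes back in the same buffer. Returns the new write position.
   */
  static int copyMatch(byte[] out, int pos, int distance, int length) {
    // Copy byte by byte so that an overlapping match (distance < length)
    // keeps repeating the most recent bytes.
    for (int i = 0; i < length; i++) {
      out[pos] = out[pos - distance];
      pos++;
    }
    return pos;
  }

  public static void main(String[] args) {
    byte[] out = new byte[8];
    out[0] = 'a';
    out[1] = 'b';
    // Two literals "ab", then the match <distance 2, length 6>.
    int pos = copyMatch(out, 2, 2, 6);
    System.out.println(new String(out, 0, pos)); // prints "abababab"
  }
}
```

The byte-by-byte loop is the simplest correct form. A decoder may instead copy in larger chunks, as long as overlapping matches still repeat the preceding bytes.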

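A GZIP file consists of one or more "members", though in practice there is usually only one. Each member has three parts: a header, the data, and a trailer. The overall layout is: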

+---+---+---+---+---+---+---+---+---+---+========//========+===========//==========+---+---+---+---+---+---+---+---+

|ID1|ID2| CM|FLG|     MTIME     |XFL| OS| optional headers |    compressed data    |     CRC32     |     ISIZE     |

+---+---+---+---+---+---+---+---+---+---+========//========+===========//==========+---+---+---+---+---+---+---+---+

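1. Header

ID1 and ID2: 1 byte each. Fixed values, ID1 = 31 (0x1F) and ID2 = 139 (0x8B), identifying the GZIP format.

CM: 1 byte. Compression method. Only one is currently in use: CM = 8, meaning DEFLATE.

FLG: 1 byte. Flags.

bit 0 FTEXT - the data is probably text

bit 1 FHCRC - a CRC16 header checksum field is present

bit 2 FEXTRA - an extra field is present

bit 3 FNAME - the original file name field is present

bit 4 FCOMMENT - a comment field is present

bit 5-7 reserved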
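As a small illustration of the first four header bytes, the following sketch checks the magic numbers and compression method and then tests individual FLG bits. The class name, method names, and the choice of returning -1 on failure are assumptions made for this example.

```java
// Hypothetical helper for illustration only.
final class GzipHeaderCheck {
  static final int FHCRC = 1 << 1;
  static final int FEXTRA = 1 << 2;
  static final int FNAME = 1 << 3;
  static final int FCOMMENT = 1 << 4;

  /** Returns the FLG byte, or -1 if the buffer does not start with a GZIP/DEFLATE header. */
  static int readFlags(byte[] gz) {
    // The fixed part of the header is 10 bytes long.
    if (gz.length < 10) {
      return -1;
    }
    // ID1 = 0x1F, ID2 = 0x8B, CM = 8.
    if ((gz[0] & 0xFF) != 0x1F || (gz[1] & 0xFF) != 0x8B || gz[2] != 8) {
      return -1;
    }
    return gz[3] & 0xFF;
  }

  static boolean hasFlag(int flg, int mask) {
    return (flg & mask) != 0;
  }
}
```

Masking each byte with 0xFF matters because Java bytes are signed, so 0x8B would otherwise compare as a negative number.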

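MTIME: 4 bytes. Modification time, in Unix format (seconds since 1970-01-01 00:00:00 GMT).

XFL: 1 byte. Extra flags. When CM = 8, XFL = 2 means the compressor used maximum compression (slowest algorithm) and XFL = 4 means it used the fastest algorithm (least compression).

OS: 1 byte. The operating system, or more precisely the file system, on which the file was compressed. The defined values are: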

0 - FAT file system (MS-DOS, OS/2, NT/Win32)

1 - Amiga

2 - VMS/OpenVMS

3 - Unix

4 - VM/CMS

5 - Atari TOS

6 - HPFS file system (OS/2, NT)

7 - Macintosh

8 - Z-System

9 - CP/M

10 - TOPS-20

11 - NTFS file system (NT)

12 - QDOS

13 - Acorn RISCOS

255 - unknown

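Optional header fields:

(if FLG.FEXTRA = 1)

+---+---+===============//================+

| XLEN  |   XLEN bytes of extra field     |

+---+---+===============//================+

The extra field is a sequence of subfields, each laid out as:

+---+---+---+---+===============//================+

|SI1|SI2|  LEN  |    LEN bytes of subfield data   |

+---+---+---+---+===============//================+

(if FLG.FNAME = 1)

+=======================//========================+

|       original file name (NUL-terminated)       |

+=======================//========================+

(if FLG.FCOMMENT = 1)

+=======================//========================+

|    comment (ISO-8859-1 only, NUL-terminated)    |

+=======================//========================+

(if FLG.FHCRC = 1)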

+---+---+

| CRC16 |

+---+---+

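When the extra field is present, it starts with a 2-byte XLEN giving the total number of bytes in the extra field. The field holds one or more subfields, each identified by the two ID bytes SI1 and SI2 and followed by a 2-byte length and the subfield data. For example, SI1 = 0x41 ('A') and SI2 = 0x70 ('P') mark the subfield as extra data in the Apollo file format.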
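The optional fields have to be skipped in a fixed order before the compressed data can be read. The sketch below shows one way to compute the offset of the first DEFLATE byte. The class and method names are hypothetical, and the code assumes the first four bytes have already been validated and that the buffer is not truncated.

```java
// Hypothetical helper for illustration only.
final class GzipHeaderSkip {
  /** Returns the offset of the first byte of compressed data. */
  static int dataOffset(byte[] gz) {
    int flg = gz[3] & 0xFF;
    // ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS.
    int pos = 10;
    if ((flg & 4) != 0) { // FEXTRA: 2-byte little-endian XLEN, then XLEN bytes
      int xlen = (gz[pos] & 0xFF) | ((gz[pos + 1] & 0xFF) << 8);
      pos += 2 + xlen;
    }
    if ((flg & 8) != 0) { // FNAME: NUL-terminated string
      while (gz[pos++] != 0) {
        // skip
      }
    }
    if ((flg & 16) != 0) { // FCOMMENT: NUL-terminated string
      while (gz[pos++] != 0) {
        // skip
      }
    }
    if ((flg & 2) != 0) { // FHCRC: 2-byte header checksum
      pos += 2;
    }
    return pos;
  }
}
```

Note that the two XLEN bytes themselves must be skipped in addition to the XLEN bytes they announce.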

2. Data

The data is in DEFLATE format, which consists of a series of blocks. Each block is laid out as follows:

+......+......+......+=============//============+

|BFINAL|    BTYPE    |            data           |

+......+......+......+=============//============+

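BFINAL: 1 bit. 0 - more blocks follow; 1 - this is the last block.

BTYPE: 2 bits. 00 - stored (no compression); 01 - compressed with fixed Huffman codes; 10 - compressed with dynamic Huffman codes; 11 - reserved.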
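DEFLATE packs these header bits starting from the least significant bit of each byte. The following sketch extracts BFINAL and BTYPE from the first byte of a block, under the assumption that the block starts on a byte boundary (always true for the first block, not necessarily for later ones). The names are hypothetical.

```java
// Hypothetical helper for illustration only.
final class DeflateBlockHeader {
  /** BFINAL is the lowest bit of the byte. */
  static int bfinal(int firstByte) {
    return firstByte & 1;
  }

  /** BTYPE is stored in the next two bits. */
  static int btype(int firstByte) {
    return (firstByte >> 1) & 3;
  }

  public static void main(String[] args) {
    // 0x03 = binary 011: last block, fixed Huffman codes.
    System.out.println(bfinal(0x03) + " " + btype(0x03)); // prints "1 1"
  }
}
```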

For how each case is processed, refer to the RFC documents (RFC 1951 for DEFLATE, RFC 1952 for GZIP).
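Both Huffman block types rely on canonical codes, which are fully determined by the bit length assigned to each symbol. The sketch below assigns codes from a list of lengths following the procedure in RFC 1951. The class and method names are hypothetical.

```java
// Hypothetical helper for illustration only.
final class CanonicalHuffman {
  /** Returns the code of each symbol; symbols with length 0 are unused and keep 0. */
  static int[] assignCodes(int[] lengths, int maxBits) {
    // Count how many symbols use each code length.
    int[] blCount = new int[maxBits + 1];
    for (int i = 0; i < lengths.length; i++) {
      blCount[lengths[i]]++;
    }
    blCount[0] = 0;
    // Compute the smallest code value for each length.
    int[] nextCode = new int[maxBits + 1];
    int code = 0;
    for (int bits = 1; bits <= maxBits; bits++) {
      code = (code + blCount[bits - 1]) << 1;
      nextCode[bits] = code;
    }
    // Hand out consecutive codes in symbol order.
    int[] codes = new int[lengths.length];
    for (int i = 0; i < lengths.length; i++) {
      if (lengths[i] != 0) {
        codes[i] = nextCode[lengths[i]]++;
      }
    }
    return codes;
  }
}
```

For the lengths (3, 3, 3, 3, 3, 2, 4, 4), the example used in RFC 1951, this yields the codes 010, 011, 100, 101, 110, 00, 1110 and 1111.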

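3. Trailer

CRC32: 4 bytes. The CRC-32 checksum of the original (uncompressed) data.

ISIZE: 4 bytes. The low 32 bits of the length of the original (uncompressed) data, that is, the length modulo 2^32.

Multi-byte values in GZIP are stored least significant byte first (little-endian), the opposite of ZLIB.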
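The trailer can be used to verify a decompressed result. Since J2ME lacks java.util.zip.CRC32, the sketch below reads a little-endian 32-bit field and computes the CRC-32 bit by bit. The class and method names are hypothetical, and the bitwise loop favors brevity over speed (a table-driven version would be faster).

```java
// Hypothetical helper for illustration only.
final class GzipTrailer {
  /** Reads a 32-bit little-endian value as a non-negative long. */
  static long readLE32(byte[] b, int off) {
    return (b[off] & 0xFFL)
        | ((b[off + 1] & 0xFFL) << 8)
        | ((b[off + 2] & 0xFFL) << 16)
        | ((b[off + 3] & 0xFFL) << 24);
  }

  /** Bitwise CRC-32 with the reflected polynomial 0xEDB88320. */
  static long crc32(byte[] data, int off, int len) {
    int crc = 0xFFFFFFFF;
    for (int i = off; i < off + len; i++) {
      crc ^= (data[i] & 0xFF);
      for (int k = 0; k < 8; k++) {
        // -(crc & 1) is all ones when the low bit is set, else zero.
        crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
      }
    }
    return (~crc) & 0xFFFFFFFFL;
  }
}
```

With a GZIP buffer gz holding a single member, readLE32(gz, gz.length - 8) would give the stored CRC32 and readLE32(gz, gz.length - 4) the stored ISIZE, which can then be compared with crc32 and the length of the decompressed bytes.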

 

J2SE provides the java.util.zip.* package for decompressing GZIP, but J2ME does not. A self-contained implementation of GZIP decompression follows:

package com.DriverBook.mtraffic;

 

/*
 * GZIP.java
 *
 * Created on October 17, 2007, 3:37 PM
 *
 * To change this template, choose Tools | Template Manager
 * and open the template in the editor.
 */
/**
 *
 * @author hedonist
 */
/*
 * GZIP library for j2me applications.
 *
 * Copyright (c) 2004-2006 Carlos Araiz (caraiz@java4ever.com)
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */
import java.io.*;

public class GZIP {
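  // Flag masks.
  /** FLG: 1 byte of flags.
   *  bit 0 FTEXT    - the data is probably text
   *  bit 1 FHCRC    - a CRC16 header checksum field is present
   *  bit 2 FEXTRA   - an extra field is present
   *  bit 3 FNAME    - the original file name field is present
   *  bit 4 FCOMMENT - a comment field is present */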
  private static final int FTEXT_MASK = 1;
  private static final int FHCRC_MASK = 2;
  private static final int FEXTRA_MASK = 4;
  private static final int FNAME_MASK = 8;
  private static final int FCOMMENT_MASK = 16;
  // Block types.
  // BTYPE: 2 bits. 00 - stored; 01 - fixed Huffman codes; 10 - dynamic Huffman codes; 11 - reserved.
  private static final int BTYPE_NONE = 0;
  private static final int BTYPE_FIXED = 1;
  private static final int BTYPE_DYNAMIC = 2;
  private static final int BTYPE_RESERVED = 3;
  // Limits.
  private static final int MAX_BITS = 16;
  private static final int MAX_CODE_LITERALS = 287;
  private static final int MAX_CODE_DISTANCES = 31;
  private static final int MAX_CODE_LENGTHS = 18;
  private static final int EOB_CODE = 256;
  // Predefined tables (LENGTH: 257..287 / DISTANCE: 0..29 / DYNAMIC_LENGTH_ORDER: 0..18).
  private static final int LENGTH_EXTRA_BITS[] = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0, 99, 99};
  private static final int LENGTH_VALUES[] = {3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31, 35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258, 0, 0};
  private static final int DISTANCE_EXTRA_BITS[] = {0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13};
  private static final int DISTANCE_VALUES[] = {1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577};
  private static final int DYNAMIC_LENGTH_ORDER[] = {16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15};
  /*************************************************************************/

  // State used while reading the compressed data.
  private static int gzipIndex,  gzipByte,  gzipBit;

  /*************************************************************************/
  /*************************************************************************/
  /**
   * Decompresses a GZIP file.
   *
   * Layout of a GZIP member:
   * +---+---+---+---+---+---+---+---+---+---+========//========+===========//==========+---+---+---+---+---+---+---+---+
   * |ID1|ID2| CM|FLG|     MTIME     |XFL| OS| optional headers |    compressed data    |     CRC32     |     ISIZE     |
   * +---+---+---+---+---+---+---+---+---+---+========//========+===========//==========+---+---+---+---+---+---+---+---+
   *
   * @param gzip array holding the compressed file data
   *
   * @return array holding the uncompressed data
   */
  public static byte[] inflate(byte gzip[]) throws IOException {
    // Reset the reader state.
    gzipIndex = gzipByte = gzipBit = 0;
    // Header: ID1 = 31 (0x1F) and ID2 = 139 (0x8B) identify GZIP; CM must be 8 (DEFLATE).
    if (readBits(gzip, 16) != 0x8B1F || readBits(gzip, 8) != 8) {
      throw new IOException("Invalid GZIP format");
    }
    // FLG.
    int flg = readBits(gzip, 8);
    // Skip MTIME (4) / XFL (1) / OS (1).
    gzipIndex += 6;
    // Skip the optional header fields announced by the flags.
    if ((flg & FEXTRA_MASK) != 0) {
      // Read XLEN into a local first: "gzipIndex += readBits(...)" would use the
      // value of gzipIndex saved before the call and fail to skip the XLEN bytes.
      int xlen = readBits(gzip, 16);
      gzipIndex += xlen;
    }
    if ((flg & FNAME_MASK) != 0) {
      while (gzip[gzipIndex++] != 0);
    }
    if ((flg & FCOMMENT_MASK) != 0) {
      while (gzip[gzipIndex++] != 0);
    }
    if ((flg & FHCRC_MASK) != 0) {
      gzipIndex += 2;
    }
    // Size of the uncompressed data (ISIZE, the last 4 bytes of the file).
    int index = gzipIndex;
    gzipIndex = gzip.length - 4;
    byte uncompressed[] = new byte[readBits(gzip, 16) | (readBits(gzip, 16) << 16)];
    int uncompressedIndex = 0;
    gzipIndex = index;
    // Process the DEFLATE blocks.
    int bfinal = 0, btype = 0;
    do {
      // Read the block header.
      bfinal = readBits(gzip, 1);
      btype = readBits(gzip, 2);
      // Check the compression type.
      if (btype == BTYPE_NONE) {
        // Discard the remaining bits of the current byte.
        gzipBit = 0;
        // LEN.
        int len = readBits(gzip, 16);
        // NLEN (one's complement of LEN; not verified here).
        int nlen = readBits(gzip, 16);
        // Copy the stored bytes.
        System.arraycopy(gzip, gzipIndex, uncompressed, uncompressedIndex, len);
        gzipIndex += len;
        // Advance the index into the uncompressed data.
        uncompressedIndex += len;
      } else {
        int literalTree[], distanceTree[];
        if (btype == BTYPE_DYNAMIC) {
          // Number of codes of each kind.
          int hlit = readBits(gzip, 5) + 257;
          int hdist = readBits(gzip, 5) + 1;
          int hclen = readBits(gzip, 4) + 4;
          // Read the bit length of each code-length code.
          byte lengthBits[] = new byte[MAX_CODE_LENGTHS + 1];
          for (int i = 0; i < hclen; i++) {
            lengthBits[DYNAMIC_LENGTH_ORDER[i]] = (byte) readBits(gzip, 3);
          }
          // Build the code-length tree.
          int lengthTree[] = createHuffmanTree(lengthBits, MAX_CODE_LENGTHS);
          // The literal/length and distance code lengths form a single sequence
          // (a repeat code may cross the boundary), so decode them together.
          byte allLengths[] = decodeCodeLengths(gzip, lengthTree, hlit + hdist);
          byte litLengths[] = new byte[hlit];
          byte distLengths[] = new byte[hdist];
          System.arraycopy(allLengths, 0, litLengths, 0, hlit);
          System.arraycopy(allLengths, hlit, distLengths, 0, hdist);
          // Build the trees.
          literalTree = createHuffmanTree(litLengths, hlit - 1);
          distanceTree = createHuffmanTree(distLengths, hdist - 1);
        } else {
          byte literalBits[] = new byte[MAX_CODE_LITERALS + 1];
          for (int i = 0; i < 144; i++) {
            literalBits[i] = 8;
          }
          for (int i = 144; i < 256; i++) {
            literalBits[i] = 9;
          }
          for (int i = 256; i < 280; i++) {
            literalBits[i] = 7;
          }
          for (int i = 280; i < 288; i++) {
            literalBits[i] = 8;
          }
          literalTree = createHuffmanTree(literalBits, MAX_CODE_LITERALS);
          // All fixed distance codes are 5 bits long.
          byte distanceBits[] = new byte[MAX_CODE_DISTANCES + 1];
          for (int i = 0; i < distanceBits.length; i++) {
            distanceBits[i] = 5;
          }
          distanceTree = createHuffmanTree(distanceBits, MAX_CODE_DISTANCES);
        }
        // Decompress the block.
        int code = 0, leb = 0, deb = 0;
        while ((code = readCode(gzip, literalTree)) != EOB_CODE) {
          if (code > EOB_CODE) {
            // Length code: base value plus extra bits.
            code -= 257;
            int length = LENGTH_VALUES[code];
            if ((leb = LENGTH_EXTRA_BITS[code]) > 0) {
              length += readBits(gzip, leb);
            }
            // Distance code: base value plus extra bits.
            code = readCode(gzip, distanceTree);
            int distance = DISTANCE_VALUES[code];
            if ((deb = DISTANCE_EXTRA_BITS[code]) > 0) {
              distance += readBits(gzip, deb);
            }
            // Repeat the earlier data. When the match overlaps itself
            // (distance < length), copy in chunks that double in size.
            int offset = uncompressedIndex - distance;
            while (distance < length) {
              System.arraycopy(uncompressed, offset, uncompressed, uncompressedIndex, distance);
              uncompressedIndex += distance;
              length -= distance;
              distance <<= 1;
            }
            System.arraycopy(uncompressed, offset, uncompressed, uncompressedIndex, length);
            uncompressedIndex += length;
          } else {
            // Literal byte.
            uncompressed[uncompressedIndex++] = (byte) code;
          }
        }
      }
    } while (bfinal == 0);
    //
    return uncompressed;
  }

  /**
   * Reads a number of bits, least significant bit first.
   *
   * @param n number of bits [1..16]
   */
  private static int readBits(byte gzip[], int n) {
    // Make sure we have a current byte.
    int data = (gzipBit == 0 ? (gzipByte = (gzip[gzipIndex++] & 0xFF)) : (gzipByte >> gzipBit));
    // Keep reading bytes until enough bits are available.
    for (int i = (8 - gzipBit); i < n; i += 8) {
      gzipByte = (gzip[gzipIndex++] & 0xFF);
      data |= (gzipByte << i);
    }
    // Update the current bit position.
    gzipBit = (gzipBit + n) & 7;
    // Return only the requested bits.
    return (data & ((1 << n) - 1));
  }

  /**
   * Reads one Huffman-coded symbol.
   */
  private static int readCode(byte gzip[], int tree[]) {
    int node = tree[0];
    while (node >= 0) {
      // Fetch a new byte if needed.
      if (gzipBit == 0) {
        gzipByte = (gzip[gzipIndex++] & 0xFF);
      }
      // Follow the left child on a 0 bit, the right child on a 1 bit.
      node = (((gzipByte & (1 << gzipBit)) == 0) ? tree[node >> 16] : tree[node & 0xFFFF]);
      // Update the current bit position.
      gzipBit = (gzipBit + 1) & 7;
    }
    return (node & 0xFFFF);
  }

  /**
   * Decodes the code lengths (used in blocks compressed with dynamic Huffman codes).
   */
  private static byte[] decodeCodeLengths(byte gzip[], int lengthTree[], int count) {
    byte bits[] = new byte[count];
    for (int i = 0, code = 0, last = 0; i < count;) {
      code = readCode(gzip, lengthTree);
      if (code >= 16) {
        int repeat = 0;
        if (code == 16) {
          repeat = 3 + readBits(gzip, 2);
          code = last;
        } else {
          if (code == 17) {
            repeat = 3 + readBits(gzip, 3);
          } else {
            repeat = 11 + readBits(gzip, 7);
          }
          code = 0;
        }
        while (repeat-- > 0) {
          bits[i++] = (byte) code;
        }
      } else {
        bits[i++] = (byte) code;
      //
      }
      last = code;
    }
    return bits;
  }

  /**
   * Builds the tree for the Huffman codes.
   */
  private static int[] createHuffmanTree(byte bits[], int maxCode) {
    // Number of codes for each code length.
    int bl_count[] = new int[MAX_BITS + 1];
    for (int i = 0; i < bits.length; i++) {
      bl_count[bits[i]]++;
    }
    // Smallest numeric code value for each code length.
    int code = 0;
    bl_count[0] = 0;
    int next_code[] = new int[MAX_BITS + 1];
    for (int i = 1; i <= MAX_BITS; i++) {
      next_code[i] = code = (code + bl_count[i - 1]) << 1;
    }
    // Build the tree.
    // Bit 31 => node (0) or code (1).
    // (Node) bits 16..30 => index of the left child (0 if none).
    // (Node) bits 0..15 => index of the right child (0 if none).
    // (Code) bits 0..15 => symbol value.
    int tree[] = new int[(maxCode << 1) + MAX_BITS];
    int treeInsert = 1;
    for (int i = 0; i <= maxCode; i++) {
      int len = bits[i];
      if (len != 0) {
        code = next_code[len]++;
        // Insert the code into the tree, most significant bit first.
        int node = 0;
        for (int bit = len - 1; bit >= 0; bit--) {
          int value = code & (1 << bit);
          if (value == 0) {
            // Go (or insert) to the left.
            int left = tree[node] >> 16;
            if (left == 0) {
              tree[node] |= (treeInsert << 16);
              node = treeInsert++;
            } else {
              node = left;
            }
          } else {
            // Go (or insert) to the right.
            int right = tree[node] & 0xFFFF;
            if (right == 0) {
              tree[node] |= treeInsert;
              node = treeInsert++;
            } else {
              node = right;
            }
          }
        }
        // Store the symbol in the leaf.
        tree[node] = 0x80000000 | i;
      }
    }
    return tree;
  }
}

 
