ZIP Compression and Decompression: Compressing and Decompressing Data Using Java APIs
from(http://java.sun.com/developer/technicalArticles/Programming/compression/)
Many sources of information contain redundant data or data that adds little to the stored information. This results in tremendous amounts of data being transferred between client and server applications or computers in general. The obvious solution to the problems of data storage and information transfer is to install additional storage devices and expand existing communication facilities. To do so, however, requires an increase in an organization's operating costs. One method to alleviate a portion of data storage and information transfer is through the representation of data by more efficient code. This article presents a brief introduction to data compression and decompression, and shows how to compress and decompress data, efficiently and conveniently, from within your Java applications using the java.util.zip package.
While it is possible to compress and decompress data using tools such as WinZip, gzip, and Java ARchive (or jar), these tools are used as standalone applications. It is possible to invoke these tools from your Java applications, but this is not a straightforward approach and not an efficient solution. This is especially true if you wish to compress and decompress data on the fly (before transferring it to a remote machine, for example). This article:
- Gives you a brief overview of data compression
- Describes the java.util.zip package
- Shows how to use this package to compress and decompress data
- Shows how to compress and decompress serialized objects to save disk space
- Shows how to compress and decompress data on the fly to improve the performance of client/server applications
Overview of Data Compression
The simplest type of redundancy in a file is the repetition of characters. For example, consider the following string:
BBBBHHDDXXXXKKKKWWZZZZ
This string can be encoded more compactly by replacing each repeated string of characters by a single instance of the repeated character and a number that represents the number of times it is repeated. The earlier string can be encoded as follows:
4B2H2D4X4K2W4Z
Here "4B" means four B's, and 2H means two H's, and so on. Compressing a string in this way is called run-length encoding.
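The encoding described above can be sketched in a few lines of Java. This is only an illustration of the idea, not part of java.util.zip; the class name RunLength and its two methods are made up for this example, and the decoder assumes that the original text contains no digits.

```java
public class RunLength {
    /** Encodes each run of identical characters as count + character, e.g. "BBBB" becomes "4B". */
    public static String encode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int run = 1;
            // Extend the run while the following characters are the same.
            while (i + run < input.length() && input.charAt(i + run) == c) {
                run++;
            }
            out.append(run).append(c);
            i += run;
        }
        return out.toString();
    }

    /** Reverses encode(); assumes the original text contained no digit characters. */
    public static String decode(String encoded) {
        StringBuilder out = new StringBuilder();
        int count = 0;
        for (char c : encoded.toCharArray()) {
            if (c >= '0' && c <= '9') {
                // Accumulate multi-digit run lengths such as "12".
                count = count * 10 + (c - '0');
            } else {
                for (int k = 0; k < count; k++) {
                    out.append(c);
                }
                count = 0;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("BBBBHHDDXXXXKKKKWWZZZZ")); // prints 4B2H2D4X4K2W4Z
    }
}
```

Note that a string with no repeated characters grows under this scheme ("ABC" becomes "1A1B1C"), which is one reason run-length encoding does not suit every kind of data.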
As another example, consider the storage of a rectangular image. As a single color bitmapped image, it can be stored as shown in Figure 1.
Figure 1: A bitmap with information for run-length encoding
Another approach might be to store the image as a graphics metafile:
Rectangle 11, 3, 20, 5
This says that the rectangle starts at coordinate (11, 3) and is 20 pixels wide and 5 pixels long.
The rectangular image can be compressed with run-length encoding by counting identical bits as follows:
    0,40
    0,40
    0,10 1,20 0,10
    0,10 1,1 0,18 1,1 0,10
    0,10 1,1 0,18 1,1 0,10
    0,10 1,1 0,18 1,1 0,10
    0,10 1,20 0,10
    0,40
The first line above says that the first line of the bitmap consists of 40 0's. The third line says that the third line of the bitmap consists of 10 0's followed by 20 1's followed by 10 more 0's, and so on for the other lines.
Note that run-length encoding requires separate representations for the file and its encoded version. Therefore, this method cannot work for all files. Other compression techniques include variable-length encoding (also known as Huffman Coding), and many others. For more information, there are many books available on data and image compression techniques.
There are many benefits to data compression. The main advantage of it, however, is to reduce storage requirements. Also, for data communications, the transfer of compressed data over a medium results in an increase in the rate of information transfer. Note that data compression can be implemented on existing hardware by software or through the use of special hardware devices that incorporate compression techniques. Figure 2 shows a basic data-compression block diagram.
Figure 2: Data-compression block diagram
ZIP vs. GZIP
If you are working on Windows, you might be familiar with the WinZip tool, which is used to create a compressed archive and to extract files from a compressed archive. On UNIX, however, things are done a bit differently. The tar command is used to create an archive (not compressed) and another program (gzip or compress) is used to compress the archive.
Tools such as WinZip and PKZIP act as both an archiver and a compressor. They compress files and store them in an archive. On the other hand, gzip does not archive files. Therefore, on UNIX, the tar command is usually used to create an archive, and then the gzip command is used to compress the archived file.
The java.util.zip Package
Java provides the java.util.zip package for zip-compatible data compression. It provides classes that allow you to read, create, and modify ZIP and GZIP file formats. It also provides utility classes for computing checksums of arbitrary input streams that can be used to validate input data. This package provides one interface, fourteen classes, and two exception classes as shown in Table 1.
Table 1: Contents of the java.util.zip package
Item | Type | Description
-----|------|------------
Checksum | Interface | Represents a data checksum. Implemented by the classes Adler32 and CRC32
Adler32 | Class | Used to compute the Adler32 checksum of a data stream
CheckedInputStream | Class | An input stream that maintains the checksum of the data being read
CheckedOutputStream | Class | An output stream that maintains the checksum of the data being written
CRC32 | Class | Used to compute the CRC32 checksum of a data stream
Deflater | Class | Supports general compression using the ZLIB compression library
DeflaterOutputStream | Class | An output stream filter for compressing data in the deflate compression format
GZIPInputStream | Class | An input stream filter for reading compressed data in the GZIP file format
GZIPOutputStream | Class | An output stream filter for writing compressed data in the GZIP file format
Inflater | Class | Supports general decompression using the ZLIB compression library
InflaterInputStream | Class | An input stream filter for decompressing data in the deflate compression format
ZipEntry | Class | Represents a ZIP file entry
ZipFile | Class | Used to read entries from a ZIP file
ZipInputStream | Class | An input stream filter for reading files in the ZIP file format
ZipOutputStream | Class | An output stream filter for writing files in the ZIP file format
DataFormatException | Exception Class | Thrown to signal a data format error
ZipException | Exception Class | Thrown to signal a zip error
Note: The ZLIB compression library was initially developed as part of the Portable Network Graphics (PNG) standard that is not protected by patents.
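The Deflater and Inflater classes listed in Table 1 are not used in the samples later in this article, so here is a minimal sketch of how they can compress and decompress a byte array in memory. The class name DeflateRoundTrip, its method names, and the 1024-byte working buffer are choices made for this illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateRoundTrip {
    /** Compresses the input with the default ZLIB settings. */
    public static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish(); // no more input will follow
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native resources
        return out.toByteArray();
    }

    /** Decompresses data produced by compress(). */
    public static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read means the data is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("Compressed data is truncated or needs a dictionary");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "BBBBHHDDXXXXKKKKWWZZZZ".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] unpacked = decompress(packed);
        System.out.println(original.length + " bytes -> " + packed.length
                + " bytes -> " + unpacked.length + " bytes");
    }
}
```

For most file-oriented work the stream classes described in the following sections are more convenient; working with Deflater and Inflater directly is mainly useful when the data is already held in memory.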
Decompressing and Extracting Data from a ZIP file
The java.util.zip package provides classes for data compression and decompression. Decompressing a ZIP file is a matter of reading data from an input stream. The java.util.zip package provides a ZipInputStream class for reading ZIP files. A ZipInputStream can be created just like any other input stream. For example, the following segment of code can be used to create an input stream for reading data from a ZIP file format:
    FileInputStream fis = new FileInputStream("figs.zip");
    ZipInputStream zin = new ZipInputStream(new BufferedInputStream(fis));
Once a ZIP input stream is opened, you can read the zip entries using the getNextEntry method, which returns a ZipEntry object. If the end-of-file is reached, getNextEntry returns null:
    ZipEntry entry;
    while((entry = zin.getNextEntry()) != null) {
        // extract data
        // open output streams
    }
Now, it is time to set up a decompressed output stream, which can be done as follows:
    int BUFFER = 2048;
    FileOutputStream fos = new FileOutputStream(entry.getName());
    BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER);
Note: In this segment of code we have used the BufferedOutputStream instead of the ZipOutputStream. The ZipOutputStream and the GZIPOutputStream use internal buffer sizes of 512. The use of the BufferedOutputStream is only justified when the size of the buffer is much more than 512 (in this example it is set to 2048). The ZipOutputStream doesn't allow you to set the buffer size; with the GZIPOutputStream, however, you can specify the internal buffer size as a constructor argument.
In this segment of code, a file output stream is created using the entry's name, which can be retrieved using the entry.getName method. Source zipped data is then read and written to the decompressed stream:
    int count;
    byte data[] = new byte[BUFFER];
    while ((count = zin.read(data, 0, BUFFER)) != -1) {
        dest.write(data, 0, count);
    }
And finally, close the input and output streams:
    dest.flush();
    dest.close();
    zin.close();
The source program in Code Sample 1 shows how to decompress and extract files from a ZIP archive. To test this sample, compile the class and run it by passing a compressed file in ZIP format:
    prompt> java UnZip somefile.zip
Note that somefile.zip could be a ZIP archive created using any ZIP-compatible tool, such as WinZip.
Code Sample 1: UnZip.java
    import java.io.*;
    import java.util.zip.*;

    public class UnZip {
        // static, so that it can be referenced from the static main method
        static final int BUFFER = 2048;

        public static void main (String argv[]) {
            try {
                BufferedOutputStream dest = null;
                FileInputStream fis = new FileInputStream(argv[0]);
                ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
                ZipEntry entry;
                // getNextEntry returns null once the last entry has been read
                while((entry = zis.getNextEntry()) != null) {
                    System.out.println("Extracting: " + entry);
                    int count;
                    byte data[] = new byte[BUFFER];
                    // write the files to the disk
                    FileOutputStream fos = new FileOutputStream(entry.getName());
                    dest = new BufferedOutputStream(fos, BUFFER);
                    while ((count = zis.read(data, 0, BUFFER)) != -1) {
                        dest.write(data, 0, count);
                    }
                    dest.flush();
                    dest.close();
                }
                zis.close();
            } catch(Exception e) {
                e.printStackTrace();
            }
        }
    }
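Code Sample 1 writes each entry to whatever path its name spells out and assumes every entry is a plain file. As a hedged variation, the sketch below extracts into a chosen directory, creates directories for directory entries, and rejects entry names that would resolve outside that directory. It uses java.nio.file, which is newer than the APIs this article was written against; the class name SafeUnZip and the extract method are hypothetical names for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class SafeUnZip {
    /** Extracts every entry of the ZIP stream into targetDir. */
    public static void extract(InputStream in, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Resolve the entry name against the target directory and make
                // sure names such as "../x" cannot leave it.
                Path dest = root.resolve(entry.getName()).normalize();
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry is outside the target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                } else {
                    if (dest.getParent() != null) {
                        Files.createDirectories(dest.getParent());
                    }
                    // Copies the bytes of the current entry only; zis stays open.
                    Files.copy(zis, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SafeUnZip somefile.zip outputDirectory
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            extract(in, Paths.get(args[1]));
        }
    }
}
```

The try-with-resources statement closes the streams even when an exception is thrown, which the explicit close calls in Code Sample 1 do not guarantee.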
It is important to note that the ZipInputStream class reads ZIP files sequentially. The class ZipFile, however, reads the contents of a ZIP file using a random access file internally so that the entries of the ZIP file do not have to be read sequentially.
Note: Another fundamental difference between ZipInputStream and ZipFile is in terms of caching. Zip entries are not cached when the file is read using a combination of ZipInputStream and FileInputStream. However, if the file is opened using ZipFile(fileName), then it is cached internally, so if ZipFile(fileName) is called again the file is opened only once. The cached value is used on the second open. If you work on UNIX, it is worth noting that all zip files opened using ZipFile are memory mapped, and therefore the performance of ZipFile is superior to ZipInputStream. If the contents of the same zip file, however, are to be frequently changed and reloaded during program execution, then using ZipInputStream is preferred.
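Because ZipFile does not have to read entries in order, a single entry can be fetched by name without touching the others. The following is a minimal sketch of that; the class name ReadOneEntry and its read method are hypothetical, while getEntry and getInputStream are methods of ZipFile.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ReadOneEntry {
    /** Returns the bytes of the named entry, or null if the archive has no such entry. */
    public static byte[] read(String zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName); // looked up by name, not by position
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[2048];
                int count;
                while ((count = in.read(buf)) != -1) {
                    out.write(buf, 0, count);
                }
                return out.toByteArray();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ReadOneEntry somefile.zip entryName
        byte[] content = read(args[0], args[1]);
        System.out.println(content == null ? "No such entry" : content.length + " bytes read");
    }
}
```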
This is how a ZIP file can be decompressed using the ZipFile class:
- Create a ZipFile object by specifying the ZIP file to be read either as a String filename or as a File object:
    ZipFile zipfile = new ZipFile("figs.zip");
- Use the entries method, which returns an Enumeration object, to loop through all the ZipEntry objects of the file:
    Enumeration e = zipfile.entries();
    while(e.hasMoreElements()) {
        entry = (ZipEntry) e.nextElement();
        // read contents and save them
    }
- Read the contents of a specific ZipEntry within the ZIP file by passing the ZipEntry to getInputStream, which will return an InputStream object from which you can read the entry's contents:
    is = new BufferedInputStream(zipfile.getInputStream(entry));
- Retrieve the entry's filename and create an output stream to save it:
    byte data[] = new byte[BUFFER];
    FileOutputStream fos = new FileOutputStream(entry.getName());
    dest = new BufferedOutputStream(fos, BUFFER);
    while ((count = is.read(data, 0, BUFFER)) != -1) {
        dest.write(data, 0, count);
    }
- Finally, close all input and output streams:
    dest.flush();
    dest.close();
    is.close();
The complete source program is shown in Code Sample 2. Again, to test this class, compile it and run it by passing a file in a ZIP format as an argument:
prompt> java UnZip2 somefile.zip
Code Sample 2: UnZip2.java
    import java.io.*;
    import java.util.*;
    import java.util.zip.*;

    public class UnZip2 {
        static final int BUFFER = 2048;

        public static void main (String argv[]) {
            try {
                BufferedOutputStream dest = null;
                BufferedInputStream is = null;
                ZipEntry entry;
                ZipFile zipfile = new ZipFile(argv[0]);
                Enumeration e = zipfile.entries();
                while(e.hasMoreElements()) {
                    entry = (ZipEntry) e.nextElement();
                    System.out.println("Extracting: " + entry);
                    // each entry has its own input stream
                    is = new BufferedInputStream(zipfile.getInputStream(entry));
                    int count;
                    byte data[] = new byte[BUFFER];
                    FileOutputStream fos = new FileOutputStream(entry.getName());
                    dest = new BufferedOutputStream(fos, BUFFER);
                    while ((count = is.read(data, 0, BUFFER)) != -1) {
                        dest.write(data, 0, count);
                    }
                    dest.flush();
                    dest.close();
                    is.close();
                }
                // release the underlying file once all entries are extracted
                zipfile.close();
            } catch(Exception e) {
                e.printStackTrace();
            }
        }
    }
Compressing and Archiving Data in a ZIP File
The ZipOutputStream can be used to compress data to a ZIP file. The ZipOutputStream writes data to an output stream in a ZIP format. There are a number of steps involved in creating a ZIP file.
- The first step is to create a ZipOutputStream object, to which we pass the output stream of the file we wish to write to. Here is how you create a ZIP file entitled "myfigs.zip":
    FileOutputStream dest = new FileOutputStream("myfigs.zip");
    ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));
- Once the target zip output stream is created, the next step is to open the source data file. In this example, source data files are those files in the current directory. The list method is used to get a list of files in the current directory:
    File f = new File(".");
    String files[] = f.list();
    for (int i=0; i<files.length; i++) {
        System.out.println("Adding: "+files[i]);
        FileInputStream fi = new FileInputStream(files[i]);
        BufferedInputStream origin = new BufferedInputStream(fi, BUFFER);
        // create zip entry
        // add entries to ZIP file
    }
  Note: This code sample is capable of compressing all files in the current directory. It doesn't handle subdirectories. As an exercise, you may want to modify Code Sample 3 to handle subdirectories.
- Create a zip entry for each file that is read:
    ZipEntry entry = new ZipEntry(files[i]);
- Before you can write data to the ZIP output stream, you must first put the zip entry object using the putNextEntry method:
    out.putNextEntry(entry);
- Write the data to the ZIP file:
    int count;
    while((count = origin.read(data, 0, BUFFER)) != -1) {
        out.write(data, 0, count);
    }
- Finally, you close the input and output streams:
    origin.close();
    out.close();
The complete source program is shown in Code Sample 3.
Code Sample 3: Zip.java
    import java.io.*;
    import java.util.zip.*;

    public class Zip {
        static final int BUFFER = 2048;

        public static void main (String argv[]) {
            try {
                BufferedInputStream origin = null;
                FileOutputStream dest = new FileOutputStream("c://zip//myfigs.zip");
                ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));
                //out.setMethod(ZipOutputStream.DEFLATED);
                byte data[] = new byte[BUFFER];
                // get a list of files from current directory
                File f = new File(".");
                String files[] = f.list();

                for (int i=0; i<files.length; i++) {
                    System.out.println("Adding: "+files[i]);
                    FileInputStream fi = new FileInputStream(files[i]);
                    origin = new BufferedInputStream(fi, BUFFER);
                    ZipEntry entry = new ZipEntry(files[i]);
                    out.putNextEntry(entry);
                    int count;
                    while((count = origin.read(data, 0, BUFFER)) != -1) {
                        out.write(data, 0, count);
                    }
                    origin.close();
                }
                out.close();
            } catch(Exception e) {
                e.printStackTrace();
            }
        }
    }
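One possible approach to the subdirectory exercise mentioned earlier is sketched below. It is not the article's solution: it relies on java.nio.file.Files.walk, which is much newer than the APIs used in Code Sample 3, and the class name ZipTree and its zip method are hypothetical. Entry names are written relative to the source directory with forward slashes, the separator the ZIP format uses.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipTree {
    /** Adds every regular file under sourceDir, including subdirectories, to zipFile. */
    public static void zip(Path sourceDir, Path zipFile) throws IOException {
        Path target = zipFile.toAbsolutePath().normalize();
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile)
                        // Skip the archive itself if it lives inside sourceDir.
                        .filter(p -> !p.toAbsolutePath().normalize().equals(target))
                        .collect(Collectors.toList());
        }
        try (OutputStream os = Files.newOutputStream(zipFile);
             ZipOutputStream out = new ZipOutputStream(os)) {
            for (Path file : files) {
                String name = sourceDir.relativize(file).toString().replace('\\', '/');
                out.putNextEntry(new ZipEntry(name));
                Files.copy(file, out); // stream the file's bytes into the current entry
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipTree sourceDirectory archive.zip
        zip(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```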
Note: Entries can be added to a ZIP file either in a compressed (DEFLATED) or uncompressed (STORED) form. The setMethod method can be used to set the method of storage. For example, to set the method to DEFLATED (compressed) use: out.setMethod(ZipOutputStream.DEFLATED) and to set it to STORED (not compressed) use: out.setMethod(ZipOutputStream.STORED).
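One detail worth knowing when using STORED: as far as the ZipOutputStream documentation goes, an entry written with the STORED method must have its size and CRC-32 set before putNextEntry is called, otherwise a ZipException is thrown. The sketch below shows one way to do that for data already held in memory; the class name StoredEntry and the zipStored method are hypothetical names for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class StoredEntry {
    /** Builds a one-entry ZIP archive in memory whose entry is stored without compression. */
    public static byte[] zipStored(String name, byte[] content) throws IOException {
        // A STORED entry needs its checksum and sizes up front.
        CRC32 crc = new CRC32();
        crc.update(content, 0, content.length);

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(content.length);
        entry.setCompressedSize(content.length); // same as the size, since nothing is compressed
        entry.setCrc(crc.getValue());

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(entry);
            out.write(content);
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipStored("note.txt", "stored as-is".getBytes(StandardCharsets.UTF_8));
        System.out.println("Archive size: " + zip.length + " bytes");
    }
}
```

Setting the method on the individual ZipEntry, as here, affects only that entry; calling setMethod on the ZipOutputStream sets the default for subsequent entries.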
ZIP File Properties
The ZipEntry class describes a compressed file stored in a ZIP file. The various methods contained in this class can be used to set and get pieces of information about the entry. The ZipEntry class is used by the ZipFile and ZipInputStream to read ZIP files, and the ZipOutputStream to write ZIP files. Some of the most useful methods available in the ZipEntry class are shown, along with a description, in Table 2.
Table 2: Some useful methods of the ZipEntry class
Method Signature | Description
-----------------|------------
public String getComment() | Returns the comment string for the entry, null if none
public long getCompressedSize() | Returns the compressed size of the entry, -1 if not known
public int getMethod() | Returns the compression method of the entry, -1 if not specified
public String getName() | Returns the name of the entry
public long getSize() | Returns the uncompressed size of the entry, -1 if unknown
public long getTime() | Returns the modification time of the entry, -1 if not specified
public void setComment(String c) | Sets the optional comment string for the entry
public void setMethod(int method) | Sets the compression method for the entry
public void setSize(long size) | Sets the uncompressed size of the entry
public void setTime(long time) | Sets the modification time of the entry
Checksums
Some of the other important classes in the java.util.zip package are the Adler32 and CRC32 classes, which implement the java.util.zip.Checksum interface and compute the checksums required for data compression. An Adler32 checksum can be computed much faster than a CRC32 checksum and is almost as reliable. The getValue method can be used to obtain the current value of the checksum. The reset method can be used to reset the checksum to its initial value.
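As a small illustration of the Checksum interface described above, the sketch below computes both kinds of checksum over the same bytes using update, getValue, and reset. The class name ChecksumDemo and its checksum helper are hypothetical; the sample text is arbitrary.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumDemo {
    /** Computes a checksum of data with the given algorithm (Adler32 or CRC32). */
    public static long checksum(Checksum algorithm, byte[] data) {
        algorithm.reset();                      // start from the initial value
        algorithm.update(data, 0, data.length); // feed the bytes
        return algorithm.getValue();            // current checksum value
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println("CRC32:   " + Long.toHexString(checksum(new CRC32(), data)));
        System.out.println("Adler32: " + Long.toHexString(checksum(new Adler32(), data)));
    }
}
```

Because both classes implement the same interface, code that needs a checksum can accept a Checksum parameter and let the caller choose the algorithm.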
Checksums can be used to detect corrupted files or messages. For example, suppose you want to create a ZIP file and then transfer it to a remote machine. Once it is at the remote machine, you can use the checksum to check whether the file was corrupted during transmission. To demonstrate how to create checksums, we modify Code Sample 1 and Code Sample 3 to use CheckedInputStream and CheckedOutputStream as shown in Code Sample 4 and Code Sample 5.
Code Sample 4: Zip.java
    import java.io.*;
    import java.util.zip.*;

    public class Zip {
        static final int BUFFER = 2048;

        public static void main (String argv[]) {
            try {
                BufferedInputStream origin = null;
                FileOutputStream dest = new FileOutputStream("c://zip//myfigs.zip");
                // every byte written to the file also updates the Adler32 checksum
                CheckedOutputStream checksum = new CheckedOutputStream(dest, new Adler32());
                ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(checksum));
                //out.setMethod(ZipOutputStream.DEFLATED);
                byte data[] = new byte[BUFFER];
                // get a list of files from current directory
                File f = new File(".");
                String files[] = f.list();

                for (int i=0; i<files.length; i++) {
                    System.out.println("Adding: "+files[i]);
                    FileInputStream fi = new FileInputStream(files[i]);
                    origin = new BufferedInputStream(fi, BUFFER);
                    ZipEntry entry = new ZipEntry(files[i]);
                    out.putNextEntry(entry);
                    int count;
                    while((count = origin.read(data, 0, BUFFER)) != -1) {
                        out.write(data, 0, count);
                    }
                    origin.close();
                }
                out.close();
                System.out.println("checksum: "+checksum.getChecksum().getValue());
            } catch(Exception e) {
                e.printStackTrace();
            }
        }
    }
Code Sample 5: UnZip.java
    import java.io.*;
    import java.util.zip.*;

    public class UnZip {
        public static void main (String argv[]) {
            try {
                final int BUFFER = 2048;
                BufferedOutputStream dest = null;
                FileInputStream fis = new FileInputStream(argv[0]);
                // every byte read from the file also updates the Adler32 checksum
                CheckedInputStream checksum = new CheckedInputStream(fis, new Adler32());
                ZipInputStream zis = new ZipInputStream(new BufferedInputStream(checksum));
                ZipEntry entry;
                while((entry = zis.getNextEntry()) != null) {
                    System.out.println("Extracting: " + entry);
                    int count;
                    byte data[] = new byte[BUFFER];
                    // write the files to the disk
                    FileOutputStream fos = new FileOutputStream(entry.getName());
                    dest = new BufferedOutputStream(fos, BUFFER);
                    while ((count = zis.read(data, 0, BUFFER)) != -1) {
                        dest.write(data, 0, count);
                    }
                    dest.flush();
                    dest.close();
                }
                zis.close();
                System.out.println("Checksum: "+checksum.getChecksum().getValue());
            } catch(Exception e) {
                e.printStackTrace();
            }
        }
    }
To test Code Sample 4 and 5, compile the classes and then run the Zip class to create a ZIP archive (a checksum value will be calculated and printed on the screen for your information) and then run the UnZip class to decompress the archive (a checksum value will be printed on the console). The two values must be exactly the same, otherwise the file is corrupted. Checksums are very useful in validating data. For example, you can create a ZIP file and send it to your friend along with a checksum. Your friend unzips the file and compares the checksum with the one you provided; if they are the same, your friend knows that the file was not corrupted in transit. Note that a checksum such as Adler32 guards against accidental damage only; it does not prove who created the file.
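A checksum can also be taken over a file as a whole, independently of zipping or unzipping it. Whether the checked streams in Code Samples 4 and 5 see exactly the same bytes depends on how much of the file the ZipInputStream ends up reading, so the sketch below simply reads every byte of the file through a CheckedInputStream. The class name FileChecksum and the adler32 method are hypothetical names for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.Adler32;
import java.util.zip.CheckedInputStream;

public class FileChecksum {
    /** Returns the Adler32 checksum of all bytes in the given file. */
    public static long adler32(Path file) throws IOException {
        try (CheckedInputStream in =
                 new CheckedInputStream(Files.newInputStream(file), new Adler32())) {
            byte[] buf = new byte[8192];
            // Reading is all that is needed; the stream updates the checksum as bytes pass through.
            while (in.read(buf) != -1) {
                // nothing to do with the data itself
            }
            return in.getChecksum().getValue();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FileChecksum somefile.zip
        System.out.println("Checksum: " + adler32(Paths.get(args[0])));
    }
}
```

Running this on the sender's copy and on the receiver's copy of the same archive gives two values that can be compared directly.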
Compressing Objects
We have seen how to compress data available in file form and add it to an archive. But what if the data you wish to compress is not available in a file? Assume, for example, that you are transferring large objects over sockets. To improve the performance of your application, you may want to compress the objects before sending them across the network and uncompress them at the destination. As another example, let's say you want to save objects on the disk in compressed format. The ZIP format, which is record-based, is not really suitable for this job. The GZIP format is more appropriate as it operates on a single stream of data.
Now, let's see an example of how to compress objects before writing them on disk and how to decompress them after reading them from the disk. Code Sample 6 is a simple class that implements the Serializable interface to signal the JVM[1] that we wish to serialize instances of this class.
Code Sample 6: Employee.java
    import java.io.*;

    public class Employee implements Serializable {
        String name;
        int age;
        int salary;

        public Employee(String name, int age, int salary) {
            this.name = name;
            this.age = age;
            this.salary = salary;
        }

        public void print() {
            System.out.println("Record for: "+name);
            System.out.println("Name: "+name);
            System.out.println("Age: "+age);
            System.out.println("Salary: "+salary);
        }
    }
Now, write another class that creates a couple of objects from the Employee class. Code Sample 7 creates two objects (sarah and sam) of the Employee class, then saves their state in a file in a compressed format.
Code Sample 7: SaveEmployee.java
    import java.io.*;
    import java.util.zip.*;

    public class SaveEmployee {
        public static void main(String argv[]) throws Exception {
            // create some objects
            Employee sarah = new Employee("S. Jordan", 28, 56000);
            Employee sam = new Employee("S. McDonald", 29, 58000);
            // serialize the objects sarah and sam
            FileOutputStream fos = new FileOutputStream("db");
            GZIPOutputStream gz = new GZIPOutputStream(fos);
            ObjectOutputStream oos = new ObjectOutputStream(gz);
            oos.writeObject(sarah);
            oos.writeObject(sam);
            oos.flush();
            oos.close();
            fos.close();
        }
    }
Now, the ReadEmployee class shown in Code Sample 8 is used to reconstruct the state of the two objects. Once the state has been reconstructed, the print method is invoked on them.
Code Sample 8: ReadEmployee.java
    import java.io.*;
    import java.util.zip.*;

    public class ReadEmployee {
        public static void main(String argv[]) throws Exception {
            //deserialize objects sarah and sam
            FileInputStream fis = new FileInputStream("db");
            GZIPInputStream gs = new GZIPInputStream(fis);
            ObjectInputStream ois = new ObjectInputStream(gs);
            Employee sarah = (Employee) ois.readObject();
            Employee sam = (Employee) ois.readObject();
            //print the records after reconstruction of state
            sarah.print();
            sam.print();
            ois.close();
            fis.close();
        }
    }
The same idea can be used to compress large objects that are sent over sockets. The following segment of code shows how to write objects in a compressed format, from the server to the client:
    // write to client
    GZIPOutputStream gzipout = new GZIPOutputStream(socket.getOutputStream());
    ObjectOutputStream oos = new ObjectOutputStream(gzipout);
    oos.writeObject(obj);
    oos.flush();       // push buffered object data into the GZIP stream
    gzipout.finish();  // write the remaining compressed data and the GZIP trailer
And, the following segment of code shows how to decompress the objects at the client side once received from the server:
    // read from server
    Socket socket = new Socket(remoteServerIP, PORT);
    GZIPInputStream gzipin = new GZIPInputStream(socket.getInputStream());
    ObjectInputStream ois = new ObjectInputStream(gzipin);
    Object o = ois.readObject();
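The two socket fragments above need a running server and client to try out. As a self-contained stand-in, the sketch below performs the same compress-then-serialize round trip through byte arrays; the byte array plays the role of the network connection. The class name GzipObjects and its two methods are hypothetical names for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipObjects {
    /** Serializes obj and compresses the serialized form with GZIP. */
    public static byte[] toGzipBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream also finishes and closes the GZIP stream.
        try (ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            oos.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Decompresses data produced by toGzipBytes and rebuilds the object. */
    public static Object fromGzipBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> names = new ArrayList<>(Arrays.asList("S. Jordan", "S. McDonald"));
        byte[] packed = toGzipBytes(names);
        System.out.println("Compressed size: " + packed.length + " bytes");
        System.out.println("Restored: " + fromGzipBytes(packed));
    }
}
```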
What about JAR Files?
The Java ARchive (JAR) format is based on the standard ZIP file format with an optional manifest file. If you wish to create JAR files or extract files from a JAR file from within your Java applications, use the java.util.jar package, which provides classes for reading and writing JAR files. Using the classes provided by the java.util.jar package is very similar to using the classes provided by the java.util.zip package as described in this article. Therefore, you should be able to adapt much of the code in this article if you wish to use the java.util.jar package.
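To give a feel for how close the two packages are, here is a minimal sketch that writes a JAR with a manifest into memory and reads it back, using JarOutputStream and JarInputStream in place of their ZIP counterparts. The class name JarDemo and its methods are hypothetical; the manifest carries only a version attribute, which is an arbitrary choice for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class JarDemo {
    /** Creates a JAR in memory with a minimal manifest and a single entry. */
    public static byte[] createJar(String entryName, byte[] content) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            jar.putNextEntry(new JarEntry(entryName)); // same pattern as ZipOutputStream
            jar.write(content);
            jar.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Lists the names of the entries that JarInputStream returns for the given JAR. */
    public static List<String> listEntries(byte[] jarBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Returns the Manifest-Version attribute, or null if the JAR has no manifest. */
    public static String manifestVersion(byte[] jarBytes) throws IOException {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest manifest = jar.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MANIFEST_VERSION);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] jar = createJar("hello.txt", "hello".getBytes("UTF-8"));
        System.out.println("Entries: " + listEntries(jar));
        System.out.println("Manifest-Version: " + manifestVersion(jar));
    }
}
```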
Conclusion
This article discussed the APIs that you can use to compress and decompress data from within your applications, with code samples throughout the article to show how to use the java.util.zip package to compress and decompress data. Now you have the tools to utilize data compression and decompression in your applications.
The article also shows how to compress and decompress data on the fly in order to reduce network traffic and improve the performance of your client/server applications. Compressing data on the fly, however, improves the performance of client/server applications only when the objects being compressed are more than a couple of hundred bytes. You would not be able to observe improvement in performance if the objects being compressed and transferred are simple String objects, for example.
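The point about small payloads is easy to check. The GZIP format adds a fixed header and trailer to every stream, so very short inputs come out larger than they went in. The sketch below measures the compressed size of a short string and of a long, repetitive one; the class name GzipOverhead and the gzipSize method are hypothetical, and the sample inputs are arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class GzipOverhead {
    /** Returns the number of bytes the data occupies after GZIP compression. */
    public static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] small = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] large = new byte[10000];
        Arrays.fill(large, (byte) 'a'); // highly repetitive, so it compresses well

        System.out.println("small: " + small.length + " -> " + gzipSize(small) + " bytes");
        System.out.println("large: " + large.length + " -> " + gzipSize(large) + " bytes");
    }
}
```

Measuring representative payloads in this way is a quick test of whether on-the-fly compression is likely to pay off for a particular application.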
For more information
- The java.util.zip Package
- The java.util.jar Package
- Object Serialization
- Transporting Objects over Sockets
About the Author
Qusay H. Mahmoud provides Java consulting and training services. He has published dozens of articles on Java, and is the author of Distributed Programming with Java (Manning Publications, 1999) and Learning Wireless Java (O'Reilly, 2002).
1 As used on this web site, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.