The ZIP File Format in Detail

Source: Internet  Editor: 程序博客网  Time: 2024/03/28 19:12


General format of a ZIP file ----------------------

 

A ZIP file consists of three parts:

 

local file data area + central directory + end of central directory record

 

1. Local file data area

 

In this area, every compressed source file or directory is stored as one record, laid out as follows:

 

[local file header + file data + data descriptor]

 

a. Local file header structure

 

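Field                               Length

Local file header signature         4 bytes (0x04034b50)
PKWARE version needed to extract    2 bytes
General purpose bit flag            2 bytes
Compression method                  2 bytes
Last modification time              2 bytes
Last modification date              2 bytes
CRC-32                              4 bytes
Compressed size                     4 bytes
Uncompressed size                   4 bytes
File name length                    2 bytes
Extra field length                  2 bytes
File name                           (variable length)
Extra field                         (variable length)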
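The header layout above maps directly onto a fixed 30-byte structure followed by two variable-length fields. The following is a minimal sketch of reading it, assuming all multi-byte fields are little-endian; the function name `parse_local_header` and the dictionary keys are illustrative, not part of the format.

```python
import struct

# Fixed 30-byte part of the local file header, little-endian,
# in the same field order as the table above.
LOCAL_HEADER = struct.Struct("<I5H3I2H")
LOCAL_SIGNATURE = 0x04034B50  # appears on disk as b"PK\x03\x04"


def parse_local_header(buf, offset=0):
    """Return the local-header fields found at `offset` in `buf` as a dict."""
    (signature, version_needed, flags, method, mod_time, mod_date,
     crc32, compressed_size, uncompressed_size,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if signature != LOCAL_SIGNATURE:
        raise ValueError("not a local file header")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + name_len
    return {
        "version_needed": version_needed,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc32,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "name": bytes(buf[name_start:extra_start]),
        "extra": bytes(buf[extra_start:extra_start + extra_len]),
        # The file data begins right after the variable-length fields.
        "data_offset": extra_start + extra_len,
    }
```

The returned `data_offset` is where the file data of part b begins.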

 

b. File data

 

c. Data descriptor

 

Field                Length

CRC-32               4 bytes
Compressed size      4 bytes
Uncompressed size    4 bytes

 

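The data descriptor exists only when bit 3 of the general purpose bit flag is set (see the field explanations below), and it immediately follows the last byte of the compressed data. It is used only when the program writing the ZIP file cannot seek in its output, for example when the ZIP file is written to a non-seekable device such as a tape drive. ZIP files on disk usually do not contain a data descriptor.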
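The bit-3 rule can be sketched as a flag test plus a 12-byte unpack. This is only an illustration: the helper names are made up, and the optional 4-byte signature `PK\x07\x08` that many writers place in front of the descriptor is an addition not listed in the table above.

```python
import struct

FLAG_DATA_DESCRIPTOR = 1 << 3        # bit 3 of the general purpose bit flag
DESCRIPTOR = struct.Struct("<3I")    # CRC-32, compressed size, uncompressed size
OPTIONAL_SIGNATURE = b"PK\x07\x08"   # assumption: not part of the table above


def has_data_descriptor(flags):
    """True when a data descriptor follows the compressed data."""
    return bool(flags & FLAG_DATA_DESCRIPTOR)


def parse_data_descriptor(buf, offset):
    """Parse the descriptor starting at `offset`, right after the file data."""
    # Skip the optional signature if present. A CRC-32 that happens to equal
    # the signature would be misread here; this sketch ignores that case.
    if buf[offset:offset + 4] == OPTIONAL_SIGNATURE:
        offset += 4
    crc32, compressed_size, uncompressed_size = DESCRIPTOR.unpack_from(buf, offset)
    return crc32, compressed_size, uncompressed_size
```

Because the sizes in the local header are zero when bit 3 is set, a reader normally learns where the descriptor starts from the central directory rather than from the local header.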

 

2. Central directory

 

Each record in this area corresponds to one record in the local file data area.

 

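Field                               Length

Central file header signature       4 bytes (0x02014b50)
PKWARE version made by              2 bytes
PKWARE version needed to extract    2 bytes
General purpose bit flag            2 bytes
Compression method                  2 bytes
Last modification time              2 bytes
Last modification date              2 bytes
CRC-32                              4 bytes
Compressed size                     4 bytes
Uncompressed size                   4 bytes
File name length                    2 bytes
Extra field length                  2 bytes
File comment length                 2 bytes
Disk number start                   2 bytes
Internal file attributes            2 bytes
External file attributes            4 bytes
Relative offset of local header     4 bytes
File name                           (variable length)
Extra field                         (variable length)
File comment                        (variable length)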
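A central directory record has a 46-byte fixed part followed by three variable-length fields. A minimal sketch of decoding one record, assuming little-endian fields; `parse_central_header` and its dictionary keys are illustrative names only.

```python
import struct

# Fixed 46-byte part of a central directory record, little-endian.
CENTRAL_HEADER = struct.Struct("<I6H3I5H2I")
CENTRAL_SIGNATURE = 0x02014B50  # appears on disk as b"PK\x01\x02"


def parse_central_header(buf, offset=0):
    """Return (fields, offset_of_next_record) for the record at `offset`."""
    (signature, version_made_by, version_needed, flags, method,
     mod_time, mod_date, crc32, compressed_size, uncompressed_size,
     name_len, extra_len, comment_len, disk_start,
     internal_attrs, external_attrs,
     local_header_offset) = CENTRAL_HEADER.unpack_from(buf, offset)
    if signature != CENTRAL_SIGNATURE:
        raise ValueError("not a central directory record")
    name_start = offset + CENTRAL_HEADER.size
    extra_start = name_start + name_len
    comment_start = extra_start + extra_len
    end = comment_start + comment_len
    fields = {
        "version_made_by": version_made_by,
        "version_needed": version_needed,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc32,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "disk_start": disk_start,
        "internal_attrs": internal_attrs,
        "external_attrs": external_attrs,
        "local_header_offset": local_header_offset,
        "name": bytes(buf[name_start:extra_start]),
        "extra": bytes(buf[extra_start:comment_start]),
        "comment": bytes(buf[comment_start:end]),
    }
    return fields, end
```

Calling the function repeatedly with the returned offset walks the whole directory, one record per archived file.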

 

3. End of central directory record

 

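Field                                                        Length

End of central directory signature                           4 bytes (0x06054b50)
Number of this disk                                          2 bytes
Number of the disk on which the central directory starts     2 bytes
Number of central directory records on this disk             2 bytes
Total number of central directory records                    2 bytes
Size of the central directory                                4 bytes
Offset of the central directory from the start of the first disk    4 bytes
ZIP file comment length                                      2 bytes
ZIP file comment                                             (variable length)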
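Readers usually locate this record first by scanning backwards from the end of the file for its signature, 0x06054b50 (the bytes `PK\x05\x06`). The sketch below assumes the whole archive is in memory and that nothing follows the record's comment; the function name and keys are hypothetical.

```python
import struct

EOCD = struct.Struct("<I4H2IH")  # 22-byte fixed part, little-endian
EOCD_MAGIC = b"PK\x05\x06"       # 0x06054b50 as stored on disk


def find_end_of_central_directory(data):
    """Scan backwards for the end-of-central-directory record and decode it."""
    pos = len(data)
    while True:
        pos = data.rfind(EOCD_MAGIC, 0, pos)
        if pos < 0:
            raise ValueError("end of central directory record not found")
        if pos + EOCD.size > len(data):
            continue  # too close to the end to hold a full record
        (_, disk_no, cd_disk, entries_here, entries_total,
         cd_size, cd_offset, comment_len) = EOCD.unpack_from(data, pos)
        # Accept the candidate only if its comment ends exactly at end of file;
        # otherwise the magic bytes were probably inside a comment or file data.
        if pos + EOCD.size + comment_len == len(data):
            return {
                "position": pos,
                "disk_no": disk_no,
                "cd_disk": cd_disk,
                "entries_here": entries_here,
                "entries_total": entries_total,
                "cd_size": cd_size,
                "cd_offset": cd_offset,
                "comment": bytes(data[pos + EOCD.size:]),
            }
```

`cd_offset` and `entries_total` then tell the reader where the central directory starts and how many records to decode.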

##################################################

Explanation of fields:

Version made by (2 bytes)

The upper byte indicates the compatibility of the file attribute information. If the external file attributes are compatible with MS-DOS and can be read by PKZIP for DOS version 2.04g, then this value will be zero. If these attributes are not compatible, then this value will identify the host system on which the attributes are compatible. Software can use this information to determine the line record format for text files etc. The current mappings are:

0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
1 - Amiga
2 - VAX/VMS
3 - Unix
4 - VM/CMS
5 - Atari ST
6 - OS/2 H.P.F.S.
7 - Macintosh
8 - Z-System
9 - CP/M
10 - Windows NTFS
11 thru 255 - unused

The lower byte indicates the version number of the software used to encode the file. The value/10 indicates the major version number, and the value mod 10 is the minor version number.
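As a small illustration of the split described above, the following sketch separates the two bytes and applies the /10 and mod 10 rule. The function name is made up for the example.

```python
HOST_SYSTEMS = {
    0: "MS-DOS and OS/2 (FAT / VFAT / FAT32)",
    1: "Amiga",
    2: "VAX/VMS",
    3: "Unix",
    4: "VM/CMS",
    5: "Atari ST",
    6: "OS/2 H.P.F.S.",
    7: "Macintosh",
    8: "Z-System",
    9: "CP/M",
    10: "Windows NTFS",
}


def decode_version_made_by(value):
    """Split the 2-byte field into (host system, major version, minor version)."""
    host = value >> 8          # upper byte: host system
    version = value & 0xFF     # lower byte: encoded software version
    return HOST_SYSTEMS.get(host, "unused"), version // 10, version % 10
```

For example, the value 0x0314 decodes to Unix, version 2.0.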

Version needed to extract (2 bytes)

The minimum software version needed to extract the file, mapped as above.

General purpose bit flag: (2 bytes)

Bit 0: If set, indicates that the file is encrypted.

(For method 6 - imploding)

Bit 1: If the compression method used was type 6, imploding, then this bit, if set, indicates an 8K sliding dictionary was used. If clear, then a 4K sliding dictionary was used.

Bit 2: If the compression method used was type 6, imploding, then this bit, if set, indicates 3 Shannon-Fano trees were used to encode the sliding dictionary output. If clear, then 2 Shannon-Fano trees were used.

(For method 8 - deflating)

Bit 2   Bit 1
  0       0     Normal (-en) compression option was used.
  0       1     Maximum (-ex) compression option was used.
  1       0     Fast (-ef) compression option was used.
  1       1     Super fast (-es) compression option was used.

Note: Bits 1 and 2 are undefined if the compression method is any other.

Bit 3: If this bit is set, the fields CRC-32, compressed size and uncompressed size are set to zero in the local header. The correct values are put in the data descriptor immediately following the compressed data. (Note: PKZIP version 2.04g for DOS only recognizes this bit for method 8 compression; newer versions of PKZIP recognize this bit for any compression method.)

The upper three bits are reserved and used internally by the software when processing the zipfile. The remaining bits are unused.
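The meaning of bits 1 and 2 depends on the compression method, which is easy to get wrong. A minimal sketch of decoding bits 0 to 3 according to the description above; the function and key names are illustrative.

```python
# (bit 2, bit 1) -> deflate option, following the table above.
DEFLATE_OPTIONS = {
    (0, 0): "normal",
    (0, 1): "maximum",
    (1, 0): "fast",
    (1, 1): "super fast",
}


def decode_flags(flags, method):
    """Interpret bits 0-3 of the general purpose bit flag for a given method."""
    bit1 = (flags >> 1) & 1
    bit2 = (flags >> 2) & 1
    info = {
        "encrypted": bool(flags & 0x0001),        # bit 0
        "data_descriptor": bool(flags & 0x0008),  # bit 3
    }
    if method == 6:    # imploding
        info["dictionary_size"] = 8192 if bit1 else 4096
        info["shannon_fano_trees"] = 3 if bit2 else 2
    elif method == 8:  # deflating
        info["deflate_option"] = DEFLATE_OPTIONS[(bit2, bit1)]
    # For any other method, bits 1 and 2 are undefined and are ignored here.
    return info
```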

Compression method: (2 bytes)

(See accompanying documentation for algorithm descriptions)

0 - The file is stored (no compression)
1 - The file is Shrunk
2 - The file is Reduced with compression factor 1
3 - The file is Reduced with compression factor 2
4 - The file is Reduced with compression factor 3
5 - The file is Reduced with compression factor 4
6 - The file is Imploded
7 - Reserved for Tokenizing compression algorithm
8 - The file is Deflated
9 - Reserved for enhanced Deflating
10 - PKWARE Data Compression Library Imploding

Date and time fields: (2 bytes each)

The date and time are encoded in standard MS-DOS format. If input came from standard input, the date and time are those at which compression was started for this data.
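The text does not spell out the "standard MS-DOS format". The sketch below assumes the conventional packing: the date holds the year offset from 1980 in bits 15-9, the month in bits 8-5 and the day in bits 4-0, while the time holds the hour in bits 15-11, the minute in bits 10-5 and the seconds divided by two in bits 4-0. The function name is illustrative.

```python
from datetime import datetime


def dos_to_datetime(dos_date, dos_time):
    """Convert the 2-byte MS-DOS date and time fields to a datetime."""
    year = 1980 + (dos_date >> 9)
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = dos_time >> 11
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2   # stored with 2-second resolution
    # datetime() raises ValueError for out-of-range values such as month 0.
    return datetime(year, month, day, hour, minute, second)
```

Note that the 2-second resolution means odd seconds cannot be represented.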

CRC-32: (4 bytes)

The CRC-32 algorithm was generously contributed by David Schwaderer and can be found in his excellent book "C Programmers Guide to NetBIOS" published by Howard W. Sams & Co. Inc. The 'magic number' for the CRC is 0xdebb20e3. The proper CRC pre and post conditioning is used, meaning that the CRC register is pre-conditioned with all ones (a starting value of 0xffffffff) and the value is post-conditioned by taking the one's complement of the CRC residual.

If bit 3 of the general purpose flag is set, this field is set to zero in the local header and the correct value is put in the data descriptor and in the central directory.
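The pre- and post-conditioning described above can be shown with a slow, bit-at-a-time sketch. It assumes the reflected CRC-32 polynomial 0xEDB88320, which the text does not state explicitly, and the function name is made up for the example. In practice `zlib.crc32` computes the same value much faster.

```python
import zlib

CRC32_POLY = 0xEDB88320  # assumption: reflected CRC-32 polynomial, not given in the text


def crc32_bitwise(data):
    """Bit-at-a-time CRC-32 with the conditioning described above."""
    crc = 0xFFFFFFFF                      # pre-condition with all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; fold in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (CRC32_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF               # post-condition: one's complement


# The result should agree with the standard library implementation.
assert crc32_bitwise(b"hello") == zlib.crc32(b"hello")
```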

Compressed size: (4 bytes)
Uncompressed size: (4 bytes)

The size of the file compressed and uncompressed, respectively. If bit 3 of the general purpose bit flag is set, these fields are set to zero in the local header and the correct values are put in the data descriptor and in the central directory.

Filename length: (2 bytes)
Extra field length: (2 bytes)
File comment length: (2 bytes)

The length of the filename, extra field, and comment fields respectively. The combined length of any directory record and these three fields should not generally exceed 65,535 bytes. If input came from standard input, the filename length is set to zero.

Disk number start: (2 bytes)

The number of the disk on which this file begins.

Internal file attributes: (2 bytes)

The lowest bit of this field indicates, if set, that the file is apparently an ASCII or text file. If not set, the file apparently contains binary data. The remaining bits are unused in version 1.0.

External file attributes: (4 bytes)

The mapping of the external attributes is host-system dependent (see 'version made by'). For MS-DOS, the low order byte is the MS-DOS directory attribute byte. If input came from standard input, this field is set to zero.
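For the MS-DOS case, the low-order byte can be decoded as follows. The bit assignments used here are the usual MS-DOS directory attribute bits, which the text above does not list, so treat them as an assumption; the function name is illustrative.

```python
# Assumption: conventional MS-DOS directory attribute bits (not listed in the text).
DOS_ATTRIBUTES = {
    0x01: "read-only",
    0x02: "hidden",
    0x04: "system",
    0x08: "volume label",
    0x10: "directory",
    0x20: "archive",
}


def dos_attributes(external_attrs, version_made_by):
    """Return the MS-DOS attribute names, or None for non-MS-DOS hosts."""
    if version_made_by >> 8 != 0:
        return None  # host system is not MS-DOS; mapping is host dependent
    low_byte = external_attrs & 0xFF
    return [name for bit, name in DOS_ATTRIBUTES.items() if low_byte & bit]
```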

Relative offset of local header: (4 bytes)

This is the offset from the start of the first disk on which this file appears, to where the local header should be found.

Filename: (variable)

The name of the file, with optional relative path. The path stored should not contain a drive or device letter, or a leading slash. All slashes should be forward slashes '/' as opposed to backwards slashes '\' for compatibility with Amiga and Unix file systems etc. If input came from standard input, there is no filename field.
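A writer can apply these naming rules before storing a path. The following is a minimal sketch only: the function name is hypothetical, and it treats a single letter followed by a colon as the drive letter.

```python
import re


def normalize_member_name(path):
    """Apply the naming rules above to a path before storing it in the archive."""
    name = path.replace("\\", "/")           # forward slashes only
    name = re.sub(r"^[A-Za-z]:", "", name)   # drop a drive letter such as "C:"
    return name.lstrip("/")                  # drop any leading slash
```

For example, `C:\docs\a.txt` would be stored as `docs/a.txt`.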

Extra field: (variable)

This is for future expansion. If additional information needs to be stored in the future, it should be stored here. Earlier versions of the software can then safely skip this file, and find the next file or header. This field will be 0 length in version 1.0.
