Experiment 5: JPEG Encoding and Decoding

Source: Internet  Editor: 程序博客网  Date: 2024/06/01 10:49

I. Experimental Principles

(1) JPEG file format

SOI, Start of Image: marks the start of the image; fixed value 0xFFD8.
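APP0, Application marker 0: reserved for application data; fixed value 0xFFE0.
9 fields:
  ① Data length            2 bytes   total length of fields ①~⑨ (the marker itself is not counted)
  ② Identifier             5 bytes   fixed value 0x4A46494600, i.e. the string "JFIF" followed by a zero byte
  ③ Version                2 bytes   typically 0x0102, meaning JFIF version 1.02
  ④ X/Y density unit       1 byte    only three values are allowed
       0: no unit; 1: dots per inch; 2: dots per centimeter
  ⑤ X density              2 bytes   horizontal pixel density
  ⑥ Y density              2 bytes   vertical pixel density
  ⑦ Thumbnail width        1 byte    horizontal pixel count of the thumbnail (0 if there is none)
  ⑧ Thumbnail height       1 byte    vertical pixel count of the thumbnail (0 if there is none)
  ⑨ Thumbnail RGB bitmap   3 × width × height bytes   thumbnail pixel data, 3 bytes (R, G, B) per pixel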
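
To make the byte layout of the nine APP0 fields concrete, here is a minimal sketch that reads them from a buffer starting at the length field (that is, just after the 0xFFE0 marker). The names `jfif_header`, `read_be16` and `parse_jfif_app0` are hypothetical helpers written for this illustration; they do not come from tinyjpeg or any other library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the fixed part of a JFIF APP0 segment. */
typedef struct {
    unsigned length;        /* field 1: segment length, marker excluded */
    unsigned version_major; /* field 3, high byte */
    unsigned version_minor; /* field 3, low byte */
    unsigned density_unit;  /* field 4: 0, 1 or 2 */
    unsigned x_density;     /* field 5 */
    unsigned y_density;     /* field 6 */
    unsigned thumb_w;       /* field 7 */
    unsigned thumb_h;       /* field 8 */
} jfif_header;

/* Read a big-endian 16-bit value. */
static unsigned read_be16(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* p points at the length field, n is the number of bytes available.
 * Returns 0 on success, -1 if the data is not a valid JFIF APP0 segment. */
static int parse_jfif_app0(const unsigned char *p, size_t n, jfif_header *h)
{
    assert(p && h);
    if (n < 16)
        return -1;                        /* fields 1..8 occupy 16 bytes */
    h->length = read_be16(p);
    if (memcmp(p + 2, "JFIF\0", 5) != 0)
        return -1;                        /* field 2: identifier */
    h->version_major = p[7];
    h->version_minor = p[8];
    h->density_unit  = p[9];
    h->x_density     = read_be16(p + 10);
    h->y_density     = read_be16(p + 12);
    h->thumb_w       = p[14];
    h->thumb_h       = p[15];
    /* field 9: the declared length must account for the thumbnail pixels */
    if (h->length != 16 + 3 * h->thumb_w * h->thumb_h)
        return -1;
    return 0;
}
```

A header without a thumbnail therefore has a length field of exactly 16 (0x0010), which is the value seen in most JFIF files.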
 
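DQT, Define Quantization Table: fixed value 0xFFDB.
2 fields:
  ① Data length          2 bytes   total length of field ① plus all occurrences of field ②
  ② Quantization table   (data length - 2) bytes
       a) Precision and table ID   1 byte
            high 4 bits: precision, two possible values   0: 8-bit; 1: 16-bit
            low 4 bits: table ID, range 0~3
       b) Table entries   64 × (precision + 1) bytes
            e.g. an 8-bit table has 64 × (0 + 1) = 64 bytes of entries
  Field ② may repeat within one segment to define several quantization tables, but at most 4 times.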
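
The repeated field ② can be walked with a simple loop. The sketch below is only an illustration under stated assumptions: `dqt_tables` and `parse_dqt` are hypothetical names, the buffer starts at the length field, and the entries are kept in the order in which they are stored in the file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical storage for the four possible quantization tables. */
typedef struct {
    int defined[4];
    unsigned short q[4][64];   /* entries in stored order */
} dqt_tables;

/* p points at the length field of a DQT segment; n bytes are available.
 * Returns the number of tables read, or -1 on malformed input. */
static int parse_dqt(const unsigned char *p, size_t n, dqt_tables *t)
{
    size_t length, pos = 2;
    int tables = 0;

    assert(p && t);
    if (n < 2)
        return -1;
    length = ((size_t)p[0] << 8) | p[1];
    if (length < 2 || length > n)
        return -1;

    while (pos < length) {
        unsigned precision = p[pos] >> 4;     /* 0: 8-bit, 1: 16-bit */
        unsigned id = p[pos] & 0x0F;          /* 0..3 */
        size_t entry_bytes = 64 * (size_t)(precision + 1);
        size_t i;

        pos++;
        if (precision > 1 || id > 3 || pos + entry_bytes > length)
            return -1;
        for (i = 0; i < 64; i++) {
            if (precision == 0)
                t->q[id][i] = p[pos + i];
            else
                t->q[id][i] = (unsigned short)((p[pos + 2 * i] << 8) | p[pos + 2 * i + 1]);
        }
        t->defined[id] = 1;
        pos += entry_bytes;
        tables++;
    }
    return tables;
}
```

For a segment holding a single 8-bit table the length field is 2 + 1 + 64 = 67 (0x0043).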

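SOF0, Start of Frame (baseline DCT): fixed value 0xFFC0.
6 fields:
  ① Data length            2 bytes   total length of fields ①~⑥
  ② Precision              1 byte    bits per sample
       usually 8; most software does not support 12-bit or 16-bit samples
  ③ Image height           2 bytes   image height in pixels
  ④ Image width            2 bytes   image width in pixels
  ⑤ Number of components   1 byte    only three values are used
       1: grayscale; 3: YCbCr or YIQ; 4: CMYK
       JFIF uses YCbCr, so for a color JFIF image this value is 3
  ⑥ Component information  (number of components × 3) bytes, usually 9 bytes
       a) Component ID   1 byte
       b) Horizontal/vertical sampling factors   1 byte
            high 4 bits: horizontal sampling factor
            low 4 bits: vertical sampling factor
       c) Quantization table   1 byte   ID of the quantization table used by this component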
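
As an illustration of the SOF0 layout, the following sketch decodes the frame header and the per-component sampling factors. `sof0_info`, `sof_component` and `parse_sof0` are hypothetical names used only here, and the buffer is assumed to start at the length field.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned id, h_samp, v_samp, qtable;
} sof_component;

typedef struct {
    unsigned precision, height, width, ncomp;
    sof_component comp[4];
} sof0_info;

/* p points at the length field of a SOF0 segment; n bytes are available.
 * Returns 0 on success, -1 on malformed input. */
static int parse_sof0(const unsigned char *p, size_t n, sof0_info *f)
{
    unsigned length, i;

    assert(p && f);
    if (n < 8)
        return -1;
    length       = ((unsigned)p[0] << 8) | p[1];
    f->precision = p[2];
    f->height    = ((unsigned)p[3] << 8) | p[4];
    f->width     = ((unsigned)p[5] << 8) | p[6];
    f->ncomp     = p[7];
    if (f->ncomp < 1 || f->ncomp > 4)
        return -1;
    /* 8 fixed bytes plus 3 bytes per component */
    if (length != 8 + 3 * f->ncomp || length > n)
        return -1;
    for (i = 0; i < f->ncomp; i++) {
        const unsigned char *c = p + 8 + 3 * i;
        f->comp[i].id     = c[0];
        f->comp[i].h_samp = c[1] >> 4;     /* high 4 bits */
        f->comp[i].v_samp = c[1] & 0x0F;   /* low 4 bits */
        f->comp[i].qtable = c[2];
    }
    return 0;
}
```

With sampling factors 2×2 for Y and 1×1 for the two chroma components, this header describes the common 4:2:0 layout.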
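DHT, Define Huffman Table: fixed value 0xFFC4.
2 fields:
  ① Data length     2 bytes
  ② Huffman table   (data length - 2) bytes
       a) Table class and ID   1 byte
            high 4 bits: class, two possible values   0: DC; 1: AC
            low 4 bits: Huffman table ID
            Note that DC tables and AC tables are coded separately.
       b) Number of codes of each length   16 bytes
       c) Code values   as many bytes as the sum of the 16 counts
  Field ② may repeat within one segment (typically 4 times) or appear only once.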
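SOS, Start of Scan: fixed value 0xFFDA. The header is 12 bytes long for a three-component scan.
4 fields:
  ① Data length            2 bytes   total length of fields ①~④
  ② Number of components   1 byte    should equal field ⑤ of SOF0, i.e.
       1: grayscale; 3: YCbCr or YIQ; 4: CMYK
  ③ Component information  (number of components × 2) bytes
       a) Component ID   1 byte
       b) DC/AC table selector   1 byte
            high 4 bits: ID of the Huffman table used for the DC coefficient
            low 4 bits: ID of the Huffman table used for the AC coefficients
  ④ Scan parameters        3 bytes
       a) Start of spectral selection   1 byte   fixed value 0x00
       b) End of spectral selection     1 byte   fixed value 0x3F
       c) Successive approximation      1 byte   always 0x00 in baseline JPEG
  The compressed image data follows immediately after field ④.

EOI, End of Image: 2 bytes, fixed value 0xFFD9.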
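
Because every segment between SOI and SOS carries its own length, a decoder can locate them with a short loop. The sketch below is a simplified illustration: `list_markers` is a hypothetical helper, and it assumes that no fill bytes or length-less markers occur before SOS.

```c
#include <assert.h>
#include <stddef.h>

/* Walk the marker segments of a JPEG buffer from SOI up to SOS and store
 * the second byte of each marker in out[]. Returns the number of markers
 * recorded (SOI and SOS included), or -1 on malformed input. */
static int list_markers(const unsigned char *buf, size_t n,
                        unsigned char *out, int max_out)
{
    size_t pos = 2;
    int count = 0;

    assert(buf && out);
    if (n < 2 || buf[0] != 0xFF || buf[1] != 0xD8 || max_out < 1)
        return -1;                    /* the file must start with SOI */
    out[count++] = 0xD8;

    while (pos + 4 <= n) {
        unsigned char marker;
        size_t length;

        if (buf[pos] != 0xFF)
            return -1;                /* every marker starts with 0xFF */
        marker = buf[pos + 1];
        length = ((size_t)buf[pos + 2] << 8) | buf[pos + 3];
        if (length < 2 || pos + 2 + length > n)
            return -1;
        if (count == max_out)
            return -1;
        out[count++] = marker;
        if (marker == 0xDA)
            return count;             /* SOS: compressed data follows */
        pos += 2 + length;            /* skip the marker and its segment */
    }
    return -1;                        /* no SOS found */
}
```

The length field counts itself but not the two marker bytes, which is why the loop advances by `2 + length`.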
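(2) How Huffman tables are stored
A DHT segment contains one or more Huffman tables. Each table consists of three parts:
1) Table class and ID
This byte normally takes one of four values: 0x00, 0x01, 0x10, 0x11.
        0x00: DC table 0;
        0x01: DC table 1;
        0x10: AC table 0;
        0x11: AC table 1.
2) Number of codes of each length
JPEG Huffman codes are 1 to 16 bits long. The 16 bytes of this field give, in order, the number of codes of length 1, 2, ..., 16 bits in the Huffman tree.
3) Code values
This field stores the weight of each leaf of the Huffman tree. The sum of the 16 counts in the previous field is therefore the length of this field in bytes, which is also the number of leaves in the tree.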
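
The three parts of one stored table can be read as follows. This is a sketch with hypothetical names (`dht_table`, `read_dht_table`); the input is assumed to start at the class/ID byte of one table inside a DHT segment.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned table_class;          /* 0: DC, 1: AC */
    unsigned id;                   /* table ID */
    unsigned counts[17];           /* counts[k] = number of k-bit codes, k = 1..16 */
    unsigned nsymbols;             /* sum of the 16 counts */
    const unsigned char *symbols;  /* leaf weights, pointing into the input */
} dht_table;

/* Returns the number of bytes consumed, or 0 on malformed input. */
static size_t read_dht_table(const unsigned char *p, size_t n, dht_table *t)
{
    unsigned k;

    assert(p && t);
    if (n < 17)
        return 0;
    t->table_class = p[0] >> 4;    /* high 4 bits */
    t->id          = p[0] & 0x0F;  /* low 4 bits */
    t->counts[0]   = 0;
    t->nsymbols    = 0;
    for (k = 1; k <= 16; k++) {
        t->counts[k] = p[k];
        t->nsymbols += p[k];
    }
    if (t->table_class > 1 || t->nsymbols > 256 || 17 + (size_t)t->nsymbols > n)
        return 0;
    t->symbols = p + 17;
    return 17 + (size_t)t->nsymbols;
}
```

The return value is exactly the amount that `parse_DHT` subtracts from the remaining segment length for each table: 1 + 16 + count.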
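(3) Building the Huffman table
Once the Huffman table data has been read, the codes are assigned as follows:

1) The first code is always all zeros:
if the first code is 1 bit long, it is 0;
if the first code is 2 bits long, it is 00;
and so on.

2) From the second code onward:
if it has the same length as the previous code, it is the previous code plus 1;

if it is longer than the previous code, it is the previous code plus 1, with zeros appended on the right until the required length is reached.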
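
The two rules above translate almost directly into code. The sketch below is one possible formulation (the function name `assign_codes` is hypothetical); `build_huffman_table` in the listing further down performs the same assignment with a slightly different loop structure.

```c
#include <assert.h>

/* counts[k] is the number of codes of length k (k = 1..16; counts[0] unused).
 * Fills codes[] and lengths[] in table order and returns the number of
 * codes, or -1 if there are more than 256. */
static int assign_codes(const unsigned char counts[17],
                        unsigned short codes[256],
                        unsigned char lengths[256])
{
    unsigned code = 0;   /* rule 1: the first code is all zeros */
    int n = 0, len, k;

    assert(counts && codes && lengths);
    for (len = 1; len <= 16; len++) {
        for (k = 0; k < counts[len]; k++) {
            if (n == 256)
                return -1;
            codes[n] = (unsigned short)code;
            lengths[n] = (unsigned char)len;
            n++;
            code++;      /* rule 2: previous code plus 1 */
        }
        code <<= 1;      /* moving to a longer length appends a 0 */
    }
    return n;
}
```

For the counts 0, 1, 5, 1, 1, 1, 1, 1, 1, 0, ... this yields 00 for the single 2-bit code, 010 to 110 for the five 3-bit codes, then 1110, 11110, and so on.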

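Decoding the DC coefficient:

The weight attached to a DC code is the number of additional bits that must be read next. Those additional bits are then mapped through a table to the actual value.

   Example: 0110101011

Using the Huffman table just built, the stream splits as 01 | 10101011.

The code 01 has weight 0x05, so 5 more bits are read: 01 | 10101 | 011.

These 5 bits, 10101, decode to 21, so the decoded DC value is 21.

Note: DC coefficients are coded differentially, so the decoded value is the difference from the DC coefficient of the previous block.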
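
The mapping from the additional bits to a signed value, and the differential step, can be sketched as below. The function names `extend_value` and `next_dc` are hypothetical; the rule itself is the standard JPEG one, in which a leading 1 bit means a positive value and a leading 0 bit means a negative value.

```c
#include <assert.h>

/* Convert `size` additional bits (right-aligned in `bits`) to a signed
 * value. A leading 1 gives a positive value equal to the bits themselves;
 * a leading 0 gives the negative value bits - (2^size - 1). */
static int extend_value(unsigned bits, unsigned size)
{
    assert(size <= 16);
    if (size == 0)
        return 0;
    if (bits & (1u << (size - 1)))
        return (int)bits;
    return (int)bits - (int)((1u << size) - 1);
}

/* Undo the differential (DPCM) coding: add the decoded difference to the
 * DC value of the previous block and return the new DC value. */
static int next_dc(int *prev_dc, unsigned bits, unsigned size)
{
    assert(prev_dc);
    *prev_dc += extend_value(bits, size);
    return *prev_dc;
}
```

With the example above, `extend_value(0x15, 5)` (binary 10101) gives 21, while the complementary pattern 01010 would give -21.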

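Decoding the AC coefficients:

For an AC coefficient, the AC Huffman table gives the weight of the code. The high 4 bits of the weight are the number of consecutive zeros that precede the current value; the low 4 bits are the bit length of the AC value, i.e. the number of bits to read next.

Example: a weight of 0x31 can be written as (3, 1). It means that this AC coefficient is preceded by 3 zeros and that 1 more bit must be read to obtain its value.

In summary, the DC component is DPCM-coded and then Huffman-coded, whereas the AC components are run-length-coded and then Huffman-coded.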
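
The run/size split and its effect on a block can be sketched as follows. `ac_symbol`, `split_ac_symbol` and `place_ac` are hypothetical names; the sketch also handles the two special weights defined by baseline JPEG, 0x00 (end of block) and 0xF0 (a run of 16 zeros), which the text above does not cover.

```c
#include <assert.h>

typedef struct {
    unsigned run;    /* high 4 bits: zeros before the coefficient */
    unsigned size;   /* low 4 bits: number of value bits to read */
} ac_symbol;

static ac_symbol split_ac_symbol(unsigned char weight)
{
    ac_symbol s;
    s.run  = weight >> 4;
    s.size = weight & 0x0F;
    return s;
}

/* coef[] holds the 64 coefficients of a zero-filled block in zigzag order;
 * k (1..63) is the index of the next coefficient to fill and `value` is the
 * already decoded AC value. Returns the next index, 64 when the block is
 * complete, or -1 if the symbol does not fit in the block. */
static int place_ac(int coef[64], int k, unsigned char weight, int value)
{
    ac_symbol s = split_ac_symbol(weight);

    assert(coef && k >= 1 && k <= 63);
    if (weight == 0x00)
        return 64;                        /* end of block: the rest stay 0 */
    if (weight == 0xF0)
        return k + 16 > 63 ? -1 : k + 16; /* 16 zeros, no value */
    if (s.size == 0)
        return -1;                        /* not a valid baseline symbol */
    k += (int)s.run;                      /* skip the run of zeros */
    if (k > 63)
        return -1;
    coef[k] = value;
    return k + 1;
}
```

For the weight 0x31 at index 1, the coefficient lands at index 4 after three zeros, and decoding continues at index 5.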
II. Block Diagram



III. Code Analysis

Key code from the main program:

/* Zigzag scan order */
static const unsigned char zigzag[64] =
{
   0,  1,  5,  6, 14, 15, 27, 28,
   2,  4,  7, 13, 16, 26, 29, 42,
   3,  8, 12, 17, 25, 30, 41, 43,
   9, 11, 18, 24, 31, 40, 44, 53,
  10, 19, 23, 32, 39, 45, 52, 54,
  20, 22, 33, 38, 46, 51, 55, 60,
  21, 34, 37, 47, 50, 56, 59, 61,
  35, 36, 48, 49, 57, 58, 62, 63
};

/* Build the scaled quantization table and dump the quantization matrix */
static void build_quantization_table(float *qtable, const unsigned char *ref_table)
{
  int i, j;
  static const double aanscalefactor[8] = {
     1.0, 1.387039845, 1.306562965, 1.175875602,
     1.0, 0.785694958, 0.541196100, 0.275899379
  };
  const unsigned char *zz = zigzag;

  for (i = 0; i < 8; i++) {
    for (j = 0; j < 8; j++) {
#if YTXT
      /* print the quantization matrix in the order given by the zigzag array */
      fprintf(p_ytxt, "%d ", ref_table[*zz]);
      fflush(p_ytxt);
#endif
      *qtable++ = ref_table[*zz++] * aanscalefactor[i] * aanscalefactor[j];
    }
#if YTXT
    fprintf(p_ytxt, "\n");
    fflush(p_ytxt);
#endif
  }
}

/* Build one Huffman table and dump its codes */
static void build_huffman_table(const unsigned char *bits, const unsigned char *vals, struct huffman_table *table)
{
  unsigned int i, j, code, code_size, val, nbits;
  unsigned char huffsize[HUFFMAN_BITS_SIZE+1], *hz;
  unsigned int huffcode[HUFFMAN_BITS_SIZE+1], *hc;
  int next_free_entry;

  /*
   * Build a temp array
   *   huffsize[X] => numbers of bits to write vals[X]
   */
  hz = huffsize;
  for (i=1; i<=16; i++)
   {
     for (j=1; j<=bits[i]; j++)
       *hz++ = i;
   }
  *hz = 0;

  memset(table->lookup, 0xff, sizeof(table->lookup));
  for (i=0; i<(16-HUFFMAN_HASH_NBITS); i++)
    table->slowtable[i][0] = 0;

  /* Build a temp array
   *   huffcode[X] => code used to write vals[X]
   */
  code = 0;
  hc = huffcode;
  hz = huffsize;
  nbits = *hz;
  while (*hz)
   {
     while (*hz == nbits)
      {
        *hc++ = code++;
        hz++;
      }
     code <<= 1;
     nbits++;
   }

  /*
   * Build the lookup table, and the slowtable if needed.
   */
  next_free_entry = -1;
  for (i=0; huffsize[i]; i++)
   {
     val = vals[i];
     code = huffcode[i];
     code_size = huffsize[i];
#if TRACE
     fprintf(p_trace, "val=%2.2x code=%8.8x codesize=%2.2d\n", val, code, code_size);
     fflush(p_trace);
#endif
#if YTXT
     fprintf(p_ytxt, "val=%2.2x code=%8.8x codesize=%2.2d\n", val, code, code_size);
     fflush(p_ytxt);
#endif
     table->code_size[val] = code_size;
     if (code_size <= HUFFMAN_HASH_NBITS)
      {
        /*
         * Good: val can be put in the lookup table, so fill all value of this
         * column with value val
         */
        int repeat = 1UL<<(HUFFMAN_HASH_NBITS - code_size);
        code <<= HUFFMAN_HASH_NBITS - code_size;
        while ( repeat-- )
          table->lookup[code++] = val;
      }
     else
      {
        /* Perhaps sorting the array will be an optimization */
        uint16_t *slowtable = table->slowtable[code_size-HUFFMAN_HASH_NBITS-1];
        while (slowtable[0])
          slowtable += 2;
        slowtable[0] = code;
        slowtable[1] = val;
        slowtable[2] = 0;
        /* TODO: NEED TO CHECK FOR AN OVERFLOW OF THE TABLE */
      }
   }
}

/* Parse a DHT segment and dump the DC and AC Huffman code tables */
static int parse_DHT(struct jdec_private *priv, const unsigned char *stream)
{
  unsigned int count, i;
  unsigned char huff_bits[17];
  int length, index;

  length = be16_to_cpu(stream) - 2;
  stream += 2;  /* Skip length */
#if TRACE
  fprintf(p_trace, "> DHT marker (length=%d)\n", length);
  fflush(p_trace);
#endif
#if YTXT
  fprintf(p_ytxt, "> DHT marker (length=%d)\n", length);
  fflush(p_ytxt);
#endif

  while (length > 0) {
    index = *stream++;

    /* We need to calculate the number of bytes 'vals' will take */
    huff_bits[0] = 0;
    count = 0;
    for (i=1; i<17; i++) {
      huff_bits[i] = *stream++;
      count += huff_bits[i];
    }
#if SANITY_CHECK
    if (count >= HUFFMAN_BITS_SIZE) {
      snprintf(error_string, sizeof(error_string), "No more than %d bytes is allowed to describe a huffman table", HUFFMAN_BITS_SIZE);
      return -1;  /* stop here instead of building a table from bad data */
    }
    if ((index & 0xf) >= HUFFMAN_TABLES) {
      snprintf(error_string, sizeof(error_string), "No more than %d Huffman tables is supported (got %d)\n", HUFFMAN_TABLES, index & 0xf);
      return -1;  /* the table index would be out of range */
    }
#if TRACE
    fprintf(p_trace, "Huffman table %s[%d] length=%d\n", (index & 0xf0) ? "AC" : "DC", index & 0xf, count);
    fflush(p_trace);
#endif
#if YTXT
    fprintf(p_ytxt, "Huffman table %s[%d] length=%d\n", (index & 0xf0) ? "AC" : "DC", index & 0xf, count);
    fflush(p_ytxt);
#endif
#endif

    /* a non-zero high nibble selects an AC table, otherwise a DC table */
    if (index & 0xf0)
      build_huffman_table(huff_bits, stream, &priv->HTAC[index & 0xf]);
    else
      build_huffman_table(huff_bits, stream, &priv->HTDC[index & 0xf]);

    length -= 1;      /* class/ID byte */
    length -= 16;     /* the 16 code counts */
    length -= count;  /* the code values */
    stream += count;
  }
  return 0;
}
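
The fast path of `build_huffman_table` can be seen in isolation in the following sketch. It is not tinyjpeg code: `LOOKUP_BITS`, `lookup_init`, `lookup_add` and `lookup_decode` are hypothetical names, and the table width of 9 bits is an assumption chosen for the illustration. The idea is that a code of length len, left-aligned in an index of LOOKUP_BITS bits, owns all 2^(LOOKUP_BITS - len) entries that share its prefix, so decoding a short code needs a single array access.

```c
#include <assert.h>

#define LOOKUP_BITS 9                    /* assumed table width */
#define LOOKUP_SIZE (1 << LOOKUP_BITS)

/* Mark every entry as "no short code here". */
static void lookup_init(short table[LOOKUP_SIZE])
{
    int i;
    for (i = 0; i < LOOKUP_SIZE; i++)
        table[i] = -1;
}

/* Register a code of `len` bits with value `val`. Returns the number of
 * entries filled, or 0 if the code is too long for the fast table. */
static int lookup_add(short table[LOOKUP_SIZE], unsigned code, int len,
                      unsigned char val)
{
    int repeat, i;
    unsigned first;

    assert(table && len >= 1 && len <= 16);
    if (len > LOOKUP_BITS)
        return 0;                        /* would go to a slow table instead */
    repeat = 1 << (LOOKUP_BITS - len);   /* entries sharing this prefix */
    first = code << (LOOKUP_BITS - len); /* left-align the code */
    for (i = 0; i < repeat; i++)
        table[first + i] = val;
    return repeat;
}

/* peek holds the next LOOKUP_BITS bits of the stream, first bit on top.
 * Returns the decoded value, or -1 if the code is longer than the table. */
static int lookup_decode(const short table[LOOKUP_SIZE], unsigned peek)
{
    return table[peek & (LOOKUP_SIZE - 1)];
}
```

A 3-bit code such as 010 fills 64 consecutive entries (indices 128 to 191), so any 9-bit window that begins with 010 decodes to the same value regardless of the six bits that follow.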

/* Headers required by the code below (not shown in the original listing) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#ifdef WIN32
#include <winsock2.h>   /* htonl, ntohl */
#include <malloc.h>     /* _alloca */
#define alloca _alloca
#else
#include <alloca.h>
#include <arpa/inet.h>  /* htonl, ntohl */
#endif

typedef struct huffman_node_tag
{
    unsigned char isLeaf;             /* whether this node is a leaf */
    unsigned long count;              /* number of occurrences of the symbol */
    struct huffman_node_tag *parent;  /* pointer to the parent node */

    /* An internal node stores pointers to its two children; a leaf stores
     * its source symbol. A union resembles a struct but reserves space for
     * only one of its members at a time. */
    union
    {
        struct
        {
            struct huffman_node_tag *zero, *one;  /* zero: left child, one: right child */
        };
        unsigned char symbol;         /* source symbol */
    };
} huffman_node;

/* A Huffman codeword */
typedef struct huffman_code_tag
{
    /* Code length in bits: the number of steps from the leaf up to the
     * root, i.e. the length of the code for the leaf's symbol. */
    unsigned long numbits;

    /* The code itself, packed into bytes. Only the first numbits bits are
     * meaningful; the array occupies numbits / 8 bytes, rounded up. */
    unsigned char *bits;
} huffman_code;

/* Statistics gathered during Huffman encoding */
typedef struct huffman_statistics_result
{
    float freq[256];
    unsigned long numbits[256];
    unsigned char bits[256][100];
} huffman_stat;

/* Number of bytes needed for numbits bits; a partial byte counts as one. */
static unsigned long numbytes_from_numbits(unsigned long numbits)
{
    return numbits / 8 + (numbits % 8 ? 1 : 0);
}

/* get_bit returns the ith bit in the bits array in the 0th position of
 * the return value. Bits are counted from the low bit of each byte. */
static unsigned char get_bit(unsigned char *bits, unsigned long i)
{
    return (bits[i / 8] >> i % 8) & 1;
}

/* Reverse the order of the first numbits bits. */
static void reverse_bits(unsigned char *bits, unsigned long numbits)
{
    unsigned long numbytes = numbytes_from_numbits(numbits);
    unsigned char *tmp = (unsigned char*)alloca(numbytes);  /* stack space, released on return */
    unsigned long curbit;   /* index of the bit being written */
    long curbyte = 0;       /* index of the byte that holds that bit */

    memset(tmp, 0, numbytes);

    for(curbit = 0; curbit < numbits; ++curbit)
    {
        unsigned int bitpos = curbit % 8;  /* position inside the current byte */

        /* After every 8 bits, move on to the next byte. */
        if(curbit > 0 && curbit % 8 == 0)
            ++curbyte;

        tmp[curbyte] |= (get_bit(bits, numbits - curbit - 1) << bitpos);
    }

    memcpy(bits, tmp, numbytes);  /* copy the reversed bits back */
}

/*
 * new_code builds a huffman_code from a leaf in
 * a Huffman tree, i.e. it encodes the given leaf.
 */
static huffman_code* new_code(const huffman_node* leaf)
{
    /* Build the huffman code by walking up to
     * the root node and then reversing the bits,
     * since the Huffman code is calculated by
     * walking down the tree. */
    unsigned long numbits = 0;   /* number of steps taken so far */
    unsigned char* bits = NULL;  /* the code */
    huffman_code *p;

    while(leaf && leaf->parent)  /* until the root (which has no parent) is reached */
    {
        huffman_node *parent = leaf->parent;
        unsigned char cur_bit = (unsigned char)(numbits % 8);  /* bit inside bits[cur_byte] */
        unsigned long cur_byte = numbits / 8;                  /* index of the byte */

        /* If we need another byte to hold the code,
           then allocate it. */
        if(cur_bit == 0)
        {
            size_t newSize = cur_byte + 1;
            /* realloc returns void*, hence the cast */
            bits = (unsigned char*)realloc(bits, newSize);
            bits[newSize - 1] = 0;  /* initialize the new byte to 0 */
        }

        /* If a one must be added then or it in. If a zero
         * must be added then do nothing, since the byte
         * was initialized to zero. */
        if(leaf == parent->one)
            bits[cur_byte] |= 1 << cur_bit;

        ++numbits;      /* one more step up */
        leaf = parent;  /* continue from the parent */
    }

    /* The bits were collected from leaf to root, so reverse them. */
    if(bits)
        reverse_bits(bits, numbits);

    p = (huffman_code*)malloc(sizeof(huffman_code));
    p->numbits = numbits;  /* number of bits needed to encode the symbol */
    p->bits = bits;
    return p;
}

#define MAX_SYMBOLS 256
typedef huffman_node* SymbolFrequencies[MAX_SYMBOLS];  /* array of node pointers */
typedef huffman_code* SymbolEncoder[MAX_SYMBOLS];      /* array of codeword pointers */

/* Create an isolated leaf node. */
static huffman_node* new_leaf_node(unsigned char symbol)
{
    huffman_node *p = (huffman_node*)malloc(sizeof(huffman_node));
    p->isLeaf = 1;
    p->symbol = symbol;  /* the symbol this leaf represents */
    p->count = 0;
    p->parent = 0;
    return p;
}

/* Create an internal (non-leaf) node. */
static huffman_node* new_nonleaf_node(unsigned long count, huffman_node *zero, huffman_node *one)
{
    huffman_node *p = (huffman_node*)malloc(sizeof(huffman_node));
    p->isLeaf = 0;
    p->count = count;  /* combined count of the subtree */
    p->zero = zero;    /* left child */
    p->one = one;      /* right child */
    p->parent = 0;
    return p;
}

/* Free the memory used by a tree. */
static void free_huffman_tree(huffman_node *subtree)
{
    if(subtree == NULL)
        return;

    /* free the children recursively, then the node itself */
    if(!subtree->isLeaf)
    {
        free_huffman_tree(subtree->zero);
        free_huffman_tree(subtree->one);
    }

    free(subtree);
}

/* Free one codeword. */
static void free_code(huffman_code* p)
{
    free(p->bits);
    free(p);
}

static void free_encoder(SymbolEncoder *pSE)
{
    unsigned long i;
    for(i = 0; i < MAX_SYMBOLS; ++i)
    {
        huffman_code *p = (*pSE)[i];
        if(p)
            free_code(p);
    }

    free(pSE);
}

/* Initialize the symbol frequencies. */
static void init_frequencies(SymbolFrequencies *pSF)
{
    memset(*pSF, 0, sizeof(SymbolFrequencies));  /* clear all entries */
#if 0
    unsigned int i;
    for(i = 0; i < MAX_SYMBOLS; ++i)
    {
        unsigned char uc = (unsigned char)i;
        (*pSF)[i] = new_leaf_node(uc);
    }
#endif
}

typedef struct buf_cache_tag
{
    /* cache is a temporary buffer whose contents are written to *pbufout */
    unsigned char *cache;     /* the buffer that actually holds the data */
    unsigned int cache_len;   /* size of the cache, set at initialization */
    unsigned int cache_cur;   /* number of bytes currently held in the cache */
    /* *pbufout acts as a store that keeps everything the cache has written.
     * It is a dynamic buffer: each time the cache is written out,
     * *pbufout is grown with realloc. */
    unsigned char **pbufout;  /* points to the pointer to the output area */
    unsigned int *pbufoutlen; /* size of the output area */
} buf_cache;

static int init_cache(buf_cache* pc,
                      unsigned int cache_size,
                      unsigned char **pbufout,
                      unsigned int *pbufoutlen)
{
    assert(pc && pbufout && pbufoutlen);
    if(!pbufout || !pbufoutlen)  /* the output pointers must not be NULL */
        return 1;

    pc->cache = (unsigned char*)malloc(cache_size);
    pc->cache_len = cache_size;
    pc->cache_cur = 0;           /* the cache starts empty */
    pc->pbufout = pbufout;
    *pbufout = NULL;
    pc->pbufoutlen = pbufoutlen;
    *pbufoutlen = 0;

    return pc->cache ? 0 : 1;    /* 0 if the allocation succeeded, 1 otherwise */
}

/* Free the buffer cache. */
static void free_cache(buf_cache* pc)
{
    assert(pc);
    if(pc->cache)
    {
        free(pc->cache);
        pc->cache = NULL;
    }
}

/* Write the contents of the cache to *pbufout and empty the cache.
 * Returns 0 on success, 1 on failure. */
static int flush_cache(buf_cache* pc)
{
    assert(pc);

    if(pc->cache_cur > 0)  /* only flush when the cache holds data */
    {
        unsigned int newlen = pc->cache_cur + *pc->pbufoutlen;
        /* grow *pbufout; tmp is the start of the enlarged buffer */
        unsigned char* tmp = (unsigned char*)realloc(*pc->pbufout, newlen);
        if(!tmp)
            return 1;

        /* append cache_cur bytes to the output */
        memcpy(tmp + *pc->pbufoutlen, pc->cache, pc->cache_cur);

        *pc->pbufout = tmp;        /* point at the enlarged buffer */
        *pc->pbufoutlen = newlen;  /* new size of the output */
        pc->cache_cur = 0;         /* the cache is now logically empty */
    }

    return 0;
}

/* Write data through the cache. */
static int write_cache(buf_cache* pc,
                       const void *to_write,
                       unsigned int to_write_len)
{
    unsigned char* tmp;

    assert(pc && to_write);
    assert(pc->cache_len >= pc->cache_cur);

    /* If trying to write more than the cache will hold
     * flush the cache and allocate enough space immediately,
     * that is, don't use the cache. */
    /* to_write_len: number of bytes to write;
     * pc->cache_len - pc->cache_cur: free space left in the cache. */
    if(to_write_len > pc->cache_len - pc->cache_cur)
    {
        unsigned int newlen;
        flush_cache(pc);
        /* the data goes straight into *pbufout, so grow it accordingly */
        newlen = *pc->pbufoutlen + to_write_len;
        tmp = (unsigned char*)realloc(*pc->pbufout, newlen);
        if(!tmp)
            return 1;
        memcpy(tmp + *pc->pbufoutlen, to_write, to_write_len);
        *pc->pbufout = tmp;        /* point at the new buffer */
        *pc->pbufoutlen = newlen;  /* size of the new buffer */
    }
    else
    {
        /* The cache has enough room: append the data to it. */
        memcpy(pc->cache + pc->cache_cur, to_write, to_write_len);
        pc->cache_cur += to_write_len;
    }

    return 0;
}

/* Scan a FILE and count how often each symbol (one unsigned char,
 * 1 byte) occurs in it. */
static unsigned int get_symbol_frequencies(SymbolFrequencies *pSF, FILE *in)
{
    int c;
    unsigned int total_count = 0;  /* total number of symbols in the file */

    /* Set all frequencies to 0. */
    init_frequencies(pSF);

    /* Count the frequency of each symbol in the input file. */
    while((c = fgetc(in)) != EOF)  /* EOF (-1) marks the end of the file */
    {
        unsigned char uc = c;
        /* create a leaf the first time a symbol is seen */
        if(!(*pSF)[uc])
            (*pSF)[uc] = new_leaf_node(uc);
        ++(*pSF)[uc]->count;
        ++total_count;
    }

    return total_count;
}

/* Same as get_symbol_frequencies, but for a memory buffer. */
static unsigned int get_symbol_frequencies_from_memory(SymbolFrequencies *pSF,
                                                       const unsigned char *bufin,
                                                       unsigned int bufinlen)
{
    unsigned int i;
    unsigned int total_count = 0;

    /* Set all frequencies to 0. */
    init_frequencies(pSF);

    /* Count the frequency of each symbol in the input buffer. */
    for(i = 0; i < bufinlen; ++i)
    {
        unsigned char uc = bufin[i];
        if(!(*pSF)[uc])
            (*pSF)[uc] = new_leaf_node(uc);
        ++(*pSF)[uc]->count;
        ++total_count;
    }

    return total_count;
}

/*
 * When used by qsort, SFComp sorts the array so that
 * the symbol with the lowest frequency is first. Any
 * NULL entries will be sorted to the end of the list.
 * Two huffman_node entries are compared by their count.
 */
static int SFComp(const void *p1, const void *p2)
{
    const huffman_node *hn1 = *(const huffman_node**)p1;
    const huffman_node *hn2 = *(const huffman_node**)p2;

    /* Sort all NULLs to the end. */
    if(hn1 == NULL && hn2 == NULL)
        return 0;
    if(hn1 == NULL)
        return 1;
    if(hn2 == NULL)
        return -1;

    if(hn1->count > hn2->count)
        return 1;   /* positive: hn2 comes first */
    else if(hn1->count < hn2->count)
        return -1;  /* negative: hn1 comes first */

    return 0;
}

#if 1
static void print_freqs(SymbolFrequencies * pSF)
{
    size_t i;
    for(i = 0; i < MAX_SYMBOLS; ++i)
    {
        if((*pSF)[i])
            /* symbol and the number of times it occurs */
            printf("%d, %lu\n", (*pSF)[i]->symbol, (*pSF)[i]->count);
        else
            printf("NULL\n");
    }
}
#endif

/*
 * build_symbol_encoder builds a SymbolEncoder by walking
 * down to the leaves of the Huffman tree and then,
 * for each leaf, determines its code.
 */
static void build_symbol_encoder(huffman_node *subtree, SymbolEncoder *pSF)
{
    if(subtree == NULL)
        return;

    if(subtree->isLeaf)
        (*pSF)[subtree->symbol] = new_code(subtree);  /* a leaf: generate its code */
    else
    {
        /* visit both subtrees recursively */
        build_symbol_encoder(subtree->zero, pSF);
        build_symbol_encoder(subtree->one, pSF);
    }
}

/*
 * calculate_huffman_codes turns pSF into an array
 * with a single entry that is the root of the
 * huffman tree. The return value is a SymbolEncoder,
 * which is an array of huffman codes index by symbol value.
 */
static SymbolEncoder* calculate_huffman_codes(SymbolFrequencies * pSF)
{
    unsigned int i = 0;
    unsigned int n = 0;
    huffman_node *m1 = NULL, *m2 = NULL;  /* m1: left child, m2: right child */
    SymbolEncoder *pSE = NULL;

#if 1
    printf("BEFORE SORT\n");
    print_freqs(pSF);  /* show the frequencies before sorting */
#endif

    /* Sort the symbol frequency array by ascending frequency. */
    qsort((*pSF), MAX_SYMBOLS, sizeof((*pSF)[0]), SFComp);

#if 1
    printf("AFTER SORT\n");
    print_freqs(pSF);
#endif

    /* Get the number of distinct symbols in the input. */
    for(n = 0; n < MAX_SYMBOLS && (*pSF)[n]; ++n)
        ;

    /*
     * Construct a Huffman tree. This implementation uses
     * a simple count instead of probability.
     */
    for(i = 0; i < n - 1; ++i)  /* n - 1 merges */
    {
        /* Set m1 and m2 to the two subsets of least probability. */
        m1 = (*pSF)[0];
        m2 = (*pSF)[1];

        /* Replace m1 and m2 with an internal node that has m1 and m2
         * as children and the sum of their counts as its count. */
        (*pSF)[0] = m1->parent = m2->parent =
            new_nonleaf_node(m1->count + m2->count, m1, m2);
        (*pSF)[1] = NULL;

        /* Put the new node in the correct spot by sorting again. */
        qsort((*pSF), n, sizeof((*pSF)[0]), SFComp);
    }

    /* Build the SymbolEncoder array from the tree rooted at (*pSF)[0]. */
    pSE = (SymbolEncoder*)malloc(sizeof(SymbolEncoder));
    memset(pSE, 0, sizeof(SymbolEncoder));
    build_symbol_encoder((*pSF)[0], pSE);
    return pSE;
}

/*
 * Format of the encoded output:
 *   bytes 0-3: number of distinct symbols in the input
 *   bytes 4-7: total number of symbols in the input
 *   bytes 8-X: the code table, followed by the encoded data
 */
static int write_code_table(FILE* out, SymbolEncoder *se, unsigned int symbol_count)
{
    unsigned long i, count = 0;

    /* Determine the number of entries in se. */
    for(i = 0; i < MAX_SYMBOLS; ++i)
    {
        if((*se)[i])
            ++count;
    }

    /* Write the number of entries in network byte order (bytes 0-3).
     * Network byte order is big-endian: 0x0A0B0C0D is sent as
     * 0A 0B 0C 0D. Host byte order may be little-endian. */
    i = htonl(count);
    if(fwrite(&i, sizeof(i), 1, out) != 1)
        return 1;

    /* Write the total number of symbols in the input. */
    symbol_count = htonl(symbol_count);
    if(fwrite(&symbol_count, sizeof(symbol_count), 1, out) != 1)
        return 1;

    /* Write the entries. */
    for(i = 0; i < MAX_SYMBOLS; ++i)
    {
        huffman_code *p = (*se)[i];
        if(p)
        {
            unsigned int numbytes;
            /* Write the 1 byte symbol. */
            fputc((unsigned char)i, out);
            /* Write the 1 byte code bit length. */
            fputc(p->numbits, out);
            /* Write the code bytes. A 9-bit code needs 2 bytes,
             * a 4-bit code needs 1; unused bits are 0. */
            numbytes = numbytes_from_numbits(p->numbits);
            if(fwrite(p->bits, 1, numbytes, out) != numbytes)
                return 1;
        }
    }

    return 0;
}

/*
 * Same as write_code_table, but the code table is written to memory
 * through the cache instead of to a file.
 */
static int write_code_table_to_memory(buf_cache *pc,
                                      SymbolEncoder *se,
                                      unsigned int symbol_count)
{
    unsigned long i, count = 0;

    /* Determine the number of entries in se. */
    for(i = 0; i < MAX_SYMBOLS; ++i)
    {
        if((*se)[i])
            ++count;
    }

    /* Write the number of entries in network byte order. */
    i = htonl(count);
    if(write_cache(pc, &i, sizeof(i)))
        return 1;

    /* Write the total number of symbols in the input. */
    symbol_count = htonl(symbol_count);
    if(write_cache(pc, &symbol_count, sizeof(symbol_count)))
        return 1;

    /* Write the entries. */
    for(i = 0; i < MAX_SYMBOLS; ++i)
    {
        huffman_code *p = (*se)[i];
        if(p)
        {
            unsigned int numbytes;
            /* The value of i is < MAX_SYMBOLS (256), so it can
               be stored in an unsigned char. */
            unsigned char uc = (unsigned char)i;
            /* Write the 1 byte symbol. */
            if(write_cache(pc, &uc, sizeof(uc)))
                return 1;
            /* Write the 1 byte code bit length (needed when decoding). */
            uc = (unsigned char)p->numbits;
            if(write_cache(pc, &uc, sizeof(uc)))
                return 1;
            /* Write the code bytes. */
            numbytes = numbytes_from_numbits(p->numbits);
            if(write_cache(pc, p->bits, numbytes))
                return 1;
        }
    }

    return 0;
}

/*
 * read_code_table builds a Huffman tree from the code
 * in the in file. This function returns NULL on error.
 * The returned value should be freed with free_huffman_tree.
 */
static huffman_node* read_code_table(FILE* in, unsigned int *pDataBytes)
{
    huffman_node *root = new_nonleaf_node(0, NULL, NULL);
    unsigned int count;

    /* Read the number of entries.
       (it is stored in network byte order). */
    if(fread(&count, sizeof(count), 1, in) != 1)
    {
        free_huffman_tree(root);
        return NULL;
    }

    count = ntohl(count);  /* network order to host order, 32 bits */

    /* Read the number of data bytes this encoding represents. */
    if(fread(pDataBytes, sizeof(*pDataBytes), 1, in) != 1)
    {
        free_huffman_tree(root);
        return NULL;
    }

    *pDataBytes = ntohl(*pDataBytes);

    /* Read the entries. */
    while(count-- > 0)
    {
        int c;
        unsigned int curbit;
        unsigned char symbol;
        unsigned char numbits;
        unsigned char numbytes;
        unsigned char *bytes;
        huffman_node *p = root;

        /* read the symbol */
        if((c = fgetc(in)) == EOF)
        {
            free_huffman_tree(root);
            return NULL;
        }
        symbol = (unsigned char)c;

        /* read the code length in bits */
        if((c = fgetc(in)) == EOF)
        {
            free_huffman_tree(root);
            return NULL;
        }

        numbits = (unsigned char)c;
        numbytes = (unsigned char)numbytes_from_numbits(numbits);
        /* read the code itself (numbytes bytes) */
        bytes = (unsigned char*)malloc(numbytes);
        if(fread(bytes, 1, numbytes, in) != numbytes)
        {
            free(bytes);
            free_huffman_tree(root);
            return NULL;
        }

        /*
         * Add the entry to the Huffman tree. The value
         * of the current bit is used switch between
         * zero and one child nodes in the tree. New nodes
         * are added as needed in the tree.
         */
        for(curbit = 0; curbit < numbits; ++curbit)
        {
            if(get_bit(bytes, curbit))  /* current bit is 1 */
            {
                if(p->one == NULL)
                {
                    /* the last bit of the code creates the leaf */
                    p->one = curbit == (unsigned char)(numbits - 1)
                        ? new_leaf_node(symbol)
                        : new_nonleaf_node(0, NULL, NULL);
                    p->one->parent = p;
                }
                p = p->one;  /* go down one level along the 1 branch */
            }
            else  /* current bit is 0 */
            {
                if(p->zero == NULL)
                {
                    p->zero = curbit == (unsigned char)(numbits - 1)
                        ? new_leaf_node(symbol)
                        : new_nonleaf_node(0, NULL, NULL);
                    p->zero->parent = p;
                }
                p = p->zero;
            }
        }

        free(bytes);  /* the path for this code is in the tree; free the code */
    }

    return root;  /* root of the Huffman tree */
}

static int memread(const unsigned char* buf,
                   unsigned int buflen,
                   unsigned int *pindex,  /* current read position */
                   void* bufout,
                   unsigned int readlen)
{
    /* abort if buf, pindex or bufout is NULL */
    assert(buf && pindex && bufout);
    /* abort if the read position is past the end of the buffer */
    assert(buflen >= *pindex);
    if(buflen < *pindex)
        return 1;  /* read position out of range */
    /* fail if the read would reach the end of the buffer */
    if(readlen + *pindex >= buflen)
        return 1;
    /* copy readlen bytes starting at buf + *pindex to the output */
    memcpy(bufout, buf + *pindex, readlen);
    *pindex += readlen;  /* advance the read position */
    return 0;
}

/* Read the code table at the start of the input buffer. Works like
 * read_code_table, but reads through memread. */
static huffman_node* read_code_table_from_memory(const unsigned char* bufin,  /* input buffer */
                                                 unsigned int bufinlen,       /* size of the input buffer */
                                                 unsigned int *pindex,        /* current read position */
                                                 unsigned int *pDataBytes)    /* total number of symbols */
{
    huffman_node *root = new_nonleaf_node(0, NULL, NULL);
    unsigned int count;

    /* Read the number of entries.
       (it is stored in network byte order). */
    if(memread(bufin, bufinlen, pindex, &count, sizeof(count)))
    {
        free_huffman_tree(root);
        return NULL;
    }

    count = ntohl(count);

    /* Read the number of data bytes this encoding represents. */
    if(memread(bufin, bufinlen, pindex, pDataBytes, sizeof(*pDataBytes)))
    {
        free_huffman_tree(root);
        return NULL;
    }

    *pDataBytes = ntohl(*pDataBytes);

    /* Read the entries. Each iteration builds the path from the root
     * to the leaf of one symbol. */
    while(count-- > 0)
    {
        unsigned int curbit;
        unsigned char symbol;
        unsigned char numbits;
        unsigned char numbytes;
        unsigned char *bytes;
        huffman_node *p = root;

        /* the symbol */
        if(memread(bufin, bufinlen, pindex, &symbol, sizeof(symbol)))
        {
            free_huffman_tree(root);
            return NULL;
        }

        /* the code length */
        if(memread(bufin, bufinlen, pindex, &numbits, sizeof(numbits)))
        {
            free_huffman_tree(root);
            return NULL;
        }

        numbytes = (unsigned char)numbytes_from_numbits(numbits);
        /* the code itself */
        bytes = (unsigned char*)malloc(numbytes);
        if(memread(bufin, bufinlen, pindex, bytes, numbytes))
        {
            free(bytes);
            free_huffman_tree(root);
            return NULL;
        }

        /*
         * Add the entry to the Huffman tree. The value
         * of the current bit is used switch between
         * zero and one child nodes in the tree. New nodes
         * are added as needed in the tree.
         */
        for(curbit = 0; curbit < numbits; ++curbit)
        {
            if(get_bit(bytes, curbit))  /* current bit is 1 */
            {
                if(p->one == NULL)
                {
                    p->one = curbit == (unsigned char)(numbits - 1)
                        ? new_leaf_node(symbol)
                        : new_nonleaf_node(0, NULL, NULL);
                    p->one->parent = p;
                }
                p = p->one;
            }
            else  /* current bit is 0 */
            {
                if(p->zero == NULL)
                {
                    p->zero = curbit == (unsigned char)(numbits - 1)
                        ? new_leaf_node(symbol)
                        : new_nonleaf_node(0, NULL, NULL);
                    p->zero->parent = p;
                }
                p = p->zero;
            }
        }

        free(bytes);
    }

    return root;
}

/* Second pass over the file: encode it. */
static int do_file_encode(FILE* in, FILE* out, SymbolEncoder *se)
{
    unsigned char curbyte = 0;
    unsigned char curbit = 0;
    int c;

    while((c = fgetc(in)) != EOF)  /* for every symbol in the file */
    {
        unsigned char uc = (unsigned char)c;
        huffman_code *code = (*se)[uc];  /* table lookup */
        unsigned long i;

        for(i = 0; i < code->numbits; ++i)  /* one codeword per pass of this loop */
        {
            /* Add the current bit to curbyte. */
            curbyte |= get_bit(code->bits, i) << curbit;

            /* If this byte is filled up then write it
             * out and reset the curbit and curbyte. */
            if(++curbit == 8)
            {
                /* codes are packed back to back; they are not
                 * aligned to byte boundaries */
                fputc(curbyte, out);
                curbyte = 0;
                curbit = 0;
            }
        }
    }

    /*
     * If there is data in curbyte that has not been
     * output yet, which means that the last encoded
     * character did not fall on a byte boundary,
     * then output it.
     */
    if(curbit > 0)
        fputc(curbyte, out);

    return 0;
}

/* Same as do_file_encode, but reads from and writes to memory. */
static int do_memory_encode(buf_cache *pc,
                            const unsigned char* bufin,
                            unsigned int bufinlen,
                            SymbolEncoder *se)
{
    unsigned char curbyte = 0;
    unsigned char curbit = 0;
    unsigned int i;

    for(i = 0; i < bufinlen; ++i)
    {
        unsigned char uc = bufin[i];
        huffman_code *code = (*se)[uc];  /* table lookup */
        unsigned long j;  /* renamed from i so that it no longer shadows the outer counter */

        for(j = 0; j < code->numbits; ++j)
        {
            /* Add the current bit to curbyte. */
            curbyte |= get_bit(code->bits, j) << curbit;

            /* If this byte is filled up then write it
             * out and reset the curbit and curbyte. */
            if(++curbit == 8)
            {
                if(write_cache(pc, &curbyte, sizeof(curbyte)))
                    return 1;
                curbyte = 0;
                curbit = 0;
            }
        }
    }

    /*
     * If there is data in curbyte that has not been
     * output yet, which means that the last encoded
     * character did not fall on a byte boundary,
     * then output it.
     */
    return curbit > 0 ? write_cache(pc, &curbyte, sizeof(curbyte)) : 0;
}

/* Convert the symbol counts into relative frequencies. */
int huffST_getSymFrequencies(SymbolFrequencies *SF, huffman_stat *st, int total_count)
{
    int i, count = 0;
    for(i = 0; i < MAX_SYMBOLS; ++i)
    {
        if((*SF)[i])  /* the symbol occurs in the input */
        {
            /* store the relative frequency in huffman_stat */
            st->freq[i] = (float)(*SF)[i]->count / total_count;
            count += (*SF)[i]->count;  /* running sum, used for the check below */
        }
        else
        {
            st->freq[i] = 0;  /* the symbol does not occur */
        }
    }
    if(count == total_count)  /* all symbols of the file were accounted for */
        return 1;
    else
        return 0;
}

/* Copy the code length and codeword of every symbol into huffman_stat. */
int huffST_getcodeword(SymbolEncoder *se, huffman_stat *st)
{
    unsigned long i, j;

    for(i = 0; i < MAX_SYMBOLS; ++i)
    {
        huffman_code *p = (*se)[i];  /* code of the ith symbol */
        if(p)
        {
            unsigned int numbytes;
            st->numbits[i] = p->numbits;
            numbytes = numbytes_from_numbits(p->numbits);
            for(j = 0; j < numbytes; j++)
                st->bits[i][j] = p->bits[j];
        }
        else
            st->numbits[i] = 0;
    }

    return 0;
}

/* Output the statistics: symbol, frequency, code length, code. */
void output_huffman_statistics(huffman_stat *st, FILE *out_Table)
{
    int i;
    unsigned long j;
    unsigned char c;
    fprintf(out_Table, "symbol\t   freq\t   codelength\t   code\n");
    for(i = 0; i < MAX_SYMBOLS; ++i)
    {
        fprintf(out_Table, "%d\t   ", i);              /* symbol */
        fprintf(out_Table, "%f\t   ", st->freq[i]);    /* frequency */
        fprintf(out_Table, "%lu\t    ", st->numbits[i]);  /* code length */
        if(st->numbits[i])
        {
            /* print the code in binary */
            for(j = 0; j < st->numbits[i]; ++j)
            {
                c = get_bit(st->bits[i], j);
                fprintf(out_Table, "%d", c);
            }
        }
        fprintf(out_Table, "\n");
    }
}

/*
 * huffman_encode_file huffman encodes in to out.
 */
int huffman_encode_file(FILE *in, FILE *out, FILE *out_Table)
{
    SymbolFrequencies sf;       /* symbol frequencies */
    SymbolEncoder *se;          /* codewords */
    huffman_node *root = NULL;  /* root of the tree */
    int rc;
    unsigned int symbol_count;  /* total number of symbols */
    huffman_stat hs;            /* statistics for the output table */

    /* Get the frequency of each symbol in the input file. */
    symbol_count = get_symbol_frequencies(&sf, in);

    /* Convert the counts into relative frequencies. */
    huffST_getSymFrequencies(&sf, &hs, symbol_count);

    /* Build an optimal table from the symbolCount. */
    se = calculate_huffman_codes(&sf);
    root = sf[0];

    /* Collect the codewords and write the statistics table. */
    huffST_getcodeword(se, &hs);
    output_huffman_statistics(&hs, out_Table);

    /* Scan the file again and, using the table
       previously built, encode it into the output file. */
    rewind(in);  /* move the file position back to the start */
    rc = write_code_table(out, se, symbol_count);  /* 0 on success */
    if(rc == 0)
        rc = do_file_encode(in, out, se);

    /* Free the Huffman tree and the code. */
    free_huffman_tree(root);
    free_encoder(se);
    return rc;
}

int huffman_decode_file(FILE *in, FILE *out)
{
    huffman_node *root, *p;   /* root and current node */
    int c;
    unsigned int data_count;  /* total number of symbols */

    /* Read the code table at the start of the input file and rebuild
     * the tree from the symbols, code lengths and codes it contains. */
    root = read_code_table(in, &data_count);
    if(!root)
        return 1;

    /* Walk from the root down to a leaf to decode each symbol. */
    p = root;
    while(data_count > 0 && (c = fgetc(in)) != EOF)
    {
        unsigned char byte = (unsigned char)c;
        unsigned char mask = 1;
        /* mask selects one bit at a time, from 00000001 upward. After
         * 8 shifts it becomes 0 and the byte is finished. Padding bits
         * in the last byte are never decoded because data_count reaches
         * 0 first. */
        while(data_count > 0 && mask)
        {
            p = byte & mask ? p->one : p->zero;
            mask <<= 1;

            if(p->isLeaf)  /* a leaf: one codeword has been decoded */
            {
                fputc(p->symbol, out);  /* output the decoded symbol */
                p = root;               /* restart from the root */
                --data_count;           /* one symbol fewer to decode */
            }
        }
    }

    free_huffman_tree(root);
    return 0;
}

#define CACHE_SIZE 1024

int huffman_encode_memory(const unsigned char *bufin,
                          unsigned int bufinlen,
                          unsigned char **pbufout,
                          unsigned int *pbufoutlen)
{
    SymbolFrequencies sf;
    SymbolEncoder *se;
    huffman_node *root = NULL;
    int rc;
    unsigned int symbol_count;
    buf_cache cache;

    /* Ensure the arguments are valid. */
    if(!pbufout || !pbufoutlen)
        return 1;

    if(init_cache(&cache, CACHE_SIZE, pbufout, pbufoutlen))
        return 1;

    /* Get the frequency of each symbol in the input memory.
     * The returned total equals the size of the input. */
    symbol_count = get_symbol_frequencies_from_memory(&sf, bufin, bufinlen);

    /* Build an optimal table from the symbolCount. */
    se = calculate_huffman_codes(&sf);
    root = sf[0];

    /* Scan the memory again and, using the table
       previously built, encode it into the output memory. */
    rc = write_code_table_to_memory(&cache, se, symbol_count);
    if(rc == 0)
        rc = do_memory_encode(&cache, bufin, bufinlen, se);

    /* Flush the cache. */
    flush_cache(&cache);

    /* Free the Huffman tree, the codes and the cache. */
    free_huffman_tree(root);
    free_encoder(se);
    free_cache(&cache);
    return rc;
}

/* Decode the input buffer and store the result in *pbufout. */
int huffman_decode_memory(const unsigned char *bufin,
                          unsigned int bufinlen,
                          unsigned char **pbufout,
                          unsigned int *pbufoutlen)
{
    huffman_node *root, *p;
    unsigned int data_count;
    unsigned int i = 0;
    unsigned char *buf;       /* temporary output buffer */
    unsigned int bufcur = 0;

    /* Ensure the arguments are valid. */
    if(!pbufout || !pbufoutlen)
        return 1;

    /* Read the Huffman code table from the start of the input; the
     * total number of symbols is stored in data_count. */
    root = read_code_table_from_memory(bufin, bufinlen, &i, &data_count);
    if(!root)
        return 1;

    buf = (unsigned char*)malloc(data_count);  /* one byte per decoded symbol */

    /* Decode the memory. */
    p = root;
    for(; i < bufinlen && data_count > 0; ++i)
    {
        unsigned char byte = bufin[i];
        unsigned char mask = 1;
        while(data_count > 0 && mask)  /* same logic as huffman_decode_file */
        {
            p = byte & mask ? p->one : p->zero;
            mask <<= 1;

            if(p->isLeaf)
            {
                buf[bufcur++] = p->symbol;  /* store the decoded symbol */
                p = root;
                --data_count;
            }
        }
    }

    free_huffman_tree(root);
    *pbufout = buf;        /* hand the buffer to the caller */
    *pbufoutlen = bufcur;  /* number of decoded symbols */
    return 0;
}

IV. Experimental Results
Quantization table:

AC coefficient code table:

DC coefficient code table:





















