03 Arithmetic Coding
Source: Internet  Editor: 程序博客网  Date: 2024/05/16 08:15
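Arithmetic coding is a very cleverly designed method. For a detailed introduction, see the course slides from 中科院 (Chinese Academy of Sciences) and video lectures 1 and 2 from 国防科大 (National University of Defense Technology).
Its core principle: the entire input sequence is encoded as a single fixed-point fraction. Once this sentence is understood, every step of the procedure follows naturally.
This post focuses on how to implement arithmetic coding with fixed-point numbers. The code comes from http://michael.dipperstein.com/arithmetic/index.html
Before looking at the code, let us work through an example.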
Sequence: 1 3 2 1
Count {1, 2, 3} = {40, 1, 9}
Total count TC = 50
Cumulative count CC {0, 1, 2, 3} = {0, 40, 41, 50}
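The cumulative counts above are just running sums of the symbol counts. A minimal sketch in C, assuming a hypothetical helper name `build_cumulative` (it is not part of the referenced code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill cc[0..n] with the running sums of count[0..n-1],
 * so that symbol i (1-based) owns the half-open range [cc[i-1], cc[i]).
 * Returns the total count, which equals cc[n]. */
unsigned build_cumulative(const unsigned *count, size_t n, unsigned *cc)
{
    assert(count != NULL && cc != NULL);
    cc[0] = 0;
    for (size_t i = 0; i < n; i++) {
        cc[i + 1] = cc[i] + count[i];
    }
    return cc[n];
}
```

For count = {40, 1, 9} this fills cc with {0, 40, 41, 50} and returns 50, matching TC and CC in the example.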
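First, how many binary bits should be used for lower and upper?
The rescaling steps guarantee that upper - lower stays above 0.25 of the full interval, i.e. above 2^(n-2) for n-bit registers.
For the least probable symbol to still receive a non-empty subinterval inside that range, its probability must satisfy 1/50 > 2^(2-n).
The smallest n that satisfies this is n = 8.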
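The same bound can be checked mechanically. The sketch below assumes a hypothetical function `min_register_bits` and uses the strict inequality min_count/total > 2^(2-n) from the text, rewritten in integers as min_count * 2^(n-2) > total:

```c
#include <assert.h>

/* Hypothetical helper: smallest register width n (in bits) such that
 * min_count / total > 2^(2-n), i.e. min_count * 2^(n-2) > total. */
unsigned min_register_bits(unsigned total, unsigned min_count)
{
    unsigned n = 2;

    assert(min_count > 0 && total >= min_count);
    /* Widen the register until a quarter of its range covers the rarest symbol. */
    while (((unsigned long long)min_count << (n - 2)) <= total) {
        n++;
    }
    return n;
}
```

With total = 50 and min_count = 1 the loop stops at 2^6 = 64 > 50, giving n = 8 as in the example.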
The encoding algorithm in pseudocode:

    l = 00…0, u = 11…1, e3_count = 0
    repeat
        x = get_symbol
        r = u - l + 1                      // current range, taken before either bound changes
        u = l + r*CC(x)/TC - 1             // upper bound update
        l = l + r*CC(x-1)/TC               // lower bound update
        while (MSB(u)==MSB(l) OR E3(u,l))
            if (MSB(u)==MSB(l))            // MSB = 0 -> E1 rescaling, MSB = 1 -> E2 rescaling
                b = MSB(u)
                send(b)
                l = (l<<1)+0               // shift left, set LSB to 0
                u = (u<<1)+1               // shift left, set LSB to 1
                while (e3_count > 0)
                    send(!b)               // encode accumulated E3 rescalings
                    e3_count--
                endwhile
            endif
            if (E3(u,l))                   // perform E3 rescaling & remember
                l = (l<<1)+0
                u = (u<<1)+1
                complement MSB(u) and MSB(l)
                e3_count++
            endif
        endwhile
    until done
Encoding steps
l(0) = 0 (00000000)
u(0) = 255 (11111111)
Input: 1321
l(1) = 0 + 256*0/50 = 0 (00000000)
u(1) = 0 + 256*40/50 - 1 = 203 (11001011)
MSB(l) != MSB(u), E3 = false
Output: (nothing yet)
Input: -321
l(2) = 0 + 204*41/50 = 167 (10100111)
u(2) = 0 + 204*50/50 - 1 = 203 (11001011)
MSB(l) == MSB(u) = 1
Output: 1
l(2) = (10100111)<<1 + 0 = (01001110) = 78
u(2) = (11001011)<<1 + 1 = (10010111) = 151
MSB(l) != MSB(u), E3 = true
l(2) = ((01001110)<<1 + 0) xor (10000000) = 28
u(2) = ((10010111)<<1 + 1) xor (10000000) = 175
e3_count = 1
Input: --21
l(3) = 28 + 148*40/50 = 146 (10010010)
u(3) = 28 + 148*41/50 - 1 = 148 (10010100)
MSB(l) == MSB(u) = 1
e3_count = 1, so the output bit 1 is followed by one complemented bit 0, and e3_count returns to 0
Output: 110
Input: ---1
l(3) = (10010010)<<1 + 0 = (00100100) = 36
u(3) = (10010100)<<1 + 1 = (00101001) = 41
MSB(l) == MSB(u) = 0
Output: 1100
Input: ---1
l(3) = (00100100)<<1 + 0 = (01001000) = 72
u(3) = (00101001)<<1 + 1 = (01010011) = 83
MSB(l) == MSB(u) = 0
Output: 11000
Input: ---1
l(3) = (01001000)<<1 + 0 = (10010000) = 144
u(3) = (01010011)<<1 + 1 = (10100111) = 167
MSB(l) == MSB(u) = 1
Output: 110001
Input: ---1
l(3) = (10010000)<<1 + 0 = (00100000) = 32
u(3) = (10100111)<<1 + 1 = (01001111) = 79
MSB(l) == MSB(u) = 0
Output: 1100010
Input: ---1
l(3) = (00100000)<<1 + 0 = (01000000) = 64
u(3) = (01001111)<<1 + 1 = (10011111) = 159
MSB(l) != MSB(u), E3 = true
l(3) = ((01000000)<<1 + 0) xor (10000000) = 0
u(3) = ((10011111)<<1 + 1) xor (10000000) = 191
e3_count = 1
Input: ---1
l(4) = 0 + 192*0/50 = 0 (00000000)
u(4) = 0 + 192*40/50 - 1 = 152 (10011000)
MSB(l) != MSB(u), E3 = false
Output: 1100010 (e3_count = 1 is still pending)
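The trace above can be reproduced with a small 8-bit encoder. This is only a sketch under stated assumptions: the name `encode8`, the constants and the '0'/'1' character output are hypothetical choices for illustration, not the referenced implementation, and no termination bits are written. Each pending E3 rescaling is flushed as the complement of the bit that was just sent.

```c
#include <assert.h>
#include <stddef.h>

enum { TOP = 0x80, SECOND = 0x40, MASK = 0xFF };  /* 8-bit register layout */

/* Hypothetical helper: encode sym[0..n-1] (1-based symbol values) using the
 * cumulative counts cc[] and write the emitted bits into out as '0'/'1'
 * characters (the caller must provide enough space). Returns the number of
 * bits written; *pending receives the E3 count still outstanding. */
size_t encode8(const int *sym, size_t n, const unsigned *cc, unsigned total,
               char *out, unsigned *pending)
{
    unsigned l = 0, u = MASK, e3 = 0;
    size_t pos = 0;

    assert(total > 0 && out != NULL && pending != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned range = u - l + 1;            /* taken before either bound moves */
        u = l + range * cc[sym[i]] / total - 1;
        l = l + range * cc[sym[i] - 1] / total;

        for (;;) {
            if ((u & TOP) == (l & TOP)) {      /* E1 or E2: MSBs agree */
                char bit = (u & TOP) ? '1' : '0';
                out[pos++] = bit;
                for (; e3 > 0; e3--) {         /* flush pending E3 rescalings */
                    out[pos++] = (bit == '1') ? '0' : '1';
                }
                l = (l << 1) & MASK;
                u = ((u << 1) | 1) & MASK;
            } else if ((l & SECOND) && !(u & SECOND)) {  /* E3: l = 01.., u = 10.. */
                e3++;
                l = ((l << 1) & MASK) ^ TOP;   /* shift, then complement the MSB */
                u = (((u << 1) | 1) & MASK) ^ TOP;
            } else {
                break;                         /* range is wide enough again */
            }
        }
    }
    out[pos] = '\0';
    *pending = e3;
    return pos;
}
```

Called with sym = {1, 3, 2, 1}, cc = {0, 40, 41, 50} and total = 50, it writes the seven bits 1100010 and reports one pending E3 rescaling, in agreement with the steps listed above.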
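Algorithm implementation
Arithmetic coding follows the probability-matching principle: when a high-probability symbol occurs, upper - lower shrinks only slightly, so fewer leading bits of the two bounds coincide (fewer bits are output) and the symbol costs less to encode.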
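As a rough check of this claim, the ideal cost of a symbol with probability p is -log2(p) bits. This is a standard information-theory result rather than something stated in the original post. Applied to the example sequence 1 3 2 1 with counts {40, 1, 9} out of 50:

```latex
\begin{align*}
-\log_2\tfrac{40}{50} &\approx 0.322 \text{ bits (symbol 1)}\\
-\log_2\tfrac{9}{50}  &\approx 2.474 \text{ bits (symbol 3)}\\
-\log_2\tfrac{1}{50}  &\approx 5.644 \text{ bits (symbol 2)}\\
2(0.322) + 2.474 + 5.644 &\approx 8.76 \text{ bits for the sequence } 1\,3\,2\,1
\end{align*}
```

The frequent symbol 1 costs about a third of a bit each time, while the rare symbol 2 costs more than five bits. That is consistent with the worked example, where most of the seven output bits are produced while encoding symbol 2.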
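(1) Build Probability Range List: build the cumulative count table ranges[257] and write it out as the file header with WriteHeader. Two tricks apply here:
1. If the file to encode is large, the measured totalCount can exceed the preset MAX_PROBABILITY; in that case the counts can be scaled down.
2. To store the cumulative table compactly, it can be difference-coded; SymbolCountToProbabilityRanges restores ranges[] when decoding. This step is therefore independent of arithmetic coding itself.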
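The scaling in trick 1 can be sketched as repeated halving. The function name `scale_counts`, the round-up rule and the `max_total` parameter are assumptions made for illustration; the referenced code may scale differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: halve all counts until their sum is at most max_total.
 * Nonzero counts are rounded up, so a symbol that occurs never drops to 0.
 * Returns the resulting total. */
unsigned long scale_counts(unsigned long *count, size_t n, unsigned long max_total)
{
    unsigned long total = 0;

    assert(count != NULL && max_total >= n);  /* n ones must still fit */
    for (size_t i = 0; i < n; i++) {
        total += count[i];
    }
    while (total > max_total) {
        total = 0;
        for (size_t i = 0; i < n; i++) {
            if (count[i] > 0) {
                count[i] = (count[i] + 1) / 2;  /* halve, rounding up */
            }
            total += count[i];
        }
    }
    return total;
}
```

For counts {4000, 100, 900} and max_total = 1000, three halvings give {500, 13, 113} with a total of 626. Rounding up matters because a zero count would give the symbol an empty range in the encoder.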
(2) Initialize the data

    /* initialize coder start with full probability range [0%, 100%) */
    lower = 0;
    upper = ~0;     /* all ones */
    underflowBits = 0;
(3) Enter the encoding loop

    /* encode symbols one at a time */
    while ((c = fgetc(fpIn)) != EOF)
    {
        ApplySymbolRange(c, staticModel);
        WriteEncodedBits(bfpOut);
    }
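ApplySymbolRange does what its name says: it applies the symbol's range, using the input character c to narrow lower and upper.

    range = (unsigned long)(upper - lower) + 1;     /* current range */
    upper = lower + (probability_t)(ranges[UPPER(symbol)]*range/cumulativeProb) - 1;
    lower = lower + (probability_t)(ranges[LOWER(symbol)]*range/cumulativeProb);

This runs into a precision problem, however: when range is very small, integer truncation combined with the -1 can produce lower > upper.

    if (lower > upper)
    {
        /* compile this in when testing new models. */
        fprintf(stderr, "Panic: lower (%X)> upper (%X)\n", lower, upper);
    }

That is clearly invalid, so range should be kept as large as possible; this algorithm guarantees that it stays above 0.25 of the full interval.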
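To make the failure concrete, here is a small sketch of the same update on plain unsigned values. The function `apply_range` and its parameters are hypothetical names for illustration, and the numbers reuse the 8-bit example with TC = 50.

```c
#include <assert.h>

/* Hypothetical helper: narrow [*lower, *upper] to the symbol whose cumulative
 * range is [cc_lo, cc_hi) out of total. Returns 1 if the bounds stay ordered,
 * 0 if truncation and the -1 have pushed lower above upper. */
int apply_range(unsigned *lower, unsigned *upper,
                unsigned cc_lo, unsigned cc_hi, unsigned total)
{
    unsigned range = *upper - *lower + 1;   /* current range */

    assert(total > 0 && cc_lo < cc_hi && cc_hi <= total);
    *upper = *lower + range * cc_hi / total - 1;   /* uses the old lower */
    *lower = *lower + range * cc_lo / total;
    return *lower <= *upper;
}
```

With lower = 100 and upper = 109 (range 10), coding symbol 2 (cc_lo = 40, cc_hi = 41, total = 50) gives lower = 108 and upper = 107, so the function returns 0. With lower = 0 and upper = 63 (range 64, a quarter of the 8-bit interval) both bounds become 51 and the result is still valid.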
After ApplySymbolRange, c can be encoded and written to the file by WriteEncodedBits.
This function is the core of the integer coder:

    /* This function records the three kinds of transformation applied to the
     * [lower, upper] interval and writes them out. */
    /*
     * E1 = 0
     * E2 = 1
     * E3 … E3 E1 = E1 E2 … E2
     * E3 … E3 E2 = E2 E1 … E1
     * Rule: count the consecutive E3 steps, and after the next E1/E2 bit is
     * output, send that many complemented bits.
     * See: http://www.stanford.edu/class/ee398a/handouts/papers/WittenACM87ArithmCoding.pdf
     */
    void WriteEncodedBits(bit_file_t *bfpOut)
    {
        for (;;)
        {
            /*
             * Output 0: [l, u] inside [0, 0.5)  => [0, 1); E1(x) = 2x
             * Output 1: [l, u] inside [0.5, 1)  => [0, 1); E2(x) = 2(x - 0.5)
             * In binary both cases are handled the same way: the MSB is
             * shifted out.
             * The interval is expanded: range *= 2
             */
            if ((upper & MASK_BIT(0)) == (lower & MASK_BIT(0)))     // MSBs of upper and lower match: 0.00 < upper - lower < 0.50
            {
                /* MSBs match, write them to output file */
                BitFilePutBit((upper & MASK_BIT(0)) != 0, bfpOut);  // output the MSB

                /* we can write out underflow bits too */
                while (underflowBits > 0)
                {
                    BitFilePutBit((upper & MASK_BIT(0)) == 0, bfpOut);  // flush the pending E3 steps as the complement of the MSB
                    underflowBits--;
                }
            }
            /*
             * lower upper
             *  01    10
             * (0.25, 0.75) => [0, 1); E3(x) = 2*(x - 0.25)
             * The interval is expanded: range *= 2
             * MSB(x) = Most Significant Bit of x
             * LSB(x) = Least Significant Bit of x
             * SB(x, i) = ith Significant Bit of x
             * MSB(x) = SB(x, 1); LSB(x) = SB(x, m)
             * E3(l, u) = (SB(l, 2) == 1 && SB(u, 2) == 0)
             */
            else if ((lower & MASK_BIT(1)) && !(upper & MASK_BIT(1)))
            {
                /***************************************************************
                 * When the two bounds are too close together, ApplySymbolRange
                 * may underflow, so record one E3 step by incrementing the
                 * count. Once corrected, the loop reaches the else branch
                 * and returns.
                 ***************************************************************/
                underflowBits++;
                lower &= ~(MASK_BIT(0) | MASK_BIT(1));  // clear the top two bits
                //lower &= ~(MASK_BIT(1));
                upper |= MASK_BIT(1);                   // top two bits are now 1

                /***************************************************************
                 * The shifts below make the rest of the bit removal work. If
                 * you don't believe me try it yourself.
                 ***************************************************************/
            }
            /*
             * Encoding of this symbol is done; move on to the next character.
             * lower upper
             *  00    10   (0.25, 0.75)
             *  00    11   (0.25, 1.00)
             *  01    11   (0.50, 1.00)
             * range > 0.25, so ApplySymbolRange cannot underflow
             */
            else
            {
                return;
            }

            /*******************************************************************
             * Shift out old MSB and shift in new LSB. Remember that lower has
             * all 0s beyond its end and upper has all 1s beyond its end.
             *******************************************************************/
            lower <<= 1;
            upper <<= 1;
            upper |= 1;
        }
    }
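For each input character, the loop can pass through three stages. First, while the high bits of upper and lower match, they can be output directly. After that, upper and lower may drift closer and closer together and trigger the precision problem described above, so a transformation is needed to pull them apart again (see the comments in the code). All we have to do is remember how many times this transformation was applied, so that it can be undone when decoding.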
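The rule "E3 … E3 E1 = E1 E2 … E2" from the code comment can be verified directly from the three mappings. The derivation below is a sketch that reads each sequence left to right as the order of application and assumes n consecutive E3 steps:

```latex
\begin{align*}
E_1(x) &= 2x, \qquad E_2(x) = 2x - 1, \qquad E_3(x) = 2x - \tfrac{1}{2}\\
E_3^{\,n}(x) &= 2^{n}\left(x - \tfrac{1}{2}\right) + \tfrac{1}{2}\\
E_1\!\left(E_3^{\,n}(x)\right) &= 2^{n+1}x - 2^{n} + 1 = E_2^{\,n}\!\left(E_1(x)\right)\\
E_2\!\left(E_3^{\,n}(x)\right) &= 2^{n+1}x - 2^{n} = E_1^{\,n}\!\left(E_2(x)\right)
\end{align*}
```

So n E3 steps followed by E1 have the same effect as the bit 0 followed by n ones, and n E3 steps followed by E2 the same effect as the bit 1 followed by n zeros. This is why it is enough to count the E3 steps and emit the complemented bits after the next E1 or E2 bit.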