Algorithm Analysis of the XB Resource Packs in PSP 《大众高尔夫2P》 (4)

Source: Internet. Editor: 程序博客网. Time: 2024/04/30 08:09

Unknown algorithm 0x00 (Huffman + LZSS)

Locating the function

To analyze this algorithm, we move on to the second xb file:

TH:0x0477D673(RA:0x8002013A) sceIoOpen("umd1:", 0x00000001, 00, ) = 3
TH:0x0477D673(RA:0x8002013A) sceIoLseek(3, 0x00000000_0005F600, 0x00000000, ) = 0x0005F600
TH:0x0477D673(RA:0x8002013A) sceIoRead(3, 0x08BE47C0, 0x00000086, ) = 0x00000086

The corresponding file is:
0390656 , /PSP_GAME/USRDIR/xbdata/yumo/common.xb

Opening this xb file in WinHex shows that the first file inside the pack uses algorithm 0x00:

Offset  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
00000000                        D0 04 01 00 2A 00 00 00 Ð...*...

As before, set read breakpoints at offsets 0x2A*4 and 0x2A*4 + 8 from the start of the read buffer:

bpset 0x08BE4868 r
EPC - 0x881FACAC

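(Note: the same buffer address is reused, so the first hit occurs on loading.xb. Skip that hit and continue until the address passed as the function argument holds the data we saw in WinHex.)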

bpset 0x08BE4870 r
EPC - 0x0882AC0C
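
The two breakpoint addresses are simply the sceIoRead buffer address from the log above (0x08BE47C0) plus the offsets derived from the header word 0x2A. A minimal sketch of that arithmetic follows; the helper name `bp_address` is hypothetical and not part of PSPLink.

```c
#include <stdint.h>

/* Address of a read breakpoint inside the buffer filled by sceIoRead.
 * bp_address is a hypothetical helper used only for illustration.
 * `entry` is the header word (0x2A for this file); the article scales
 * it by 4 to get a byte offset. `extra` is an additional byte offset. */
static uint32_t bp_address(uint32_t buffer, uint32_t entry, uint32_t extra)
{
    return buffer + entry * 4u + extra;
}
```

With these inputs, `bp_address(0x08BE47C0, 0x2A, 0)` gives 0x08BE4868 and `bp_address(0x08BE47C0, 0x2A, 8)` gives 0x08BE4870, the two addresses passed to bpset.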

At this point, print the call stack:

bpset 0x08BE4868 r
EPC – 0x881FACAC
host0:/> bt
882ac0c(8be4870,9ffde60,3287,c960) pc [882ac0c] ra [882ad08] sz [0]
882ace4(8be4870,9ffde60,3287,c960) pc [882ad08] ra [882af18] sz [2064]
882aec0(8be4870,9ffde60,3287,c960) pc [882af18] ra [882a934] sz [32]
882a860(8be4870,9ffde60,3287,c960) pc [882a934] ra [882f504] sz [16]
882f4d0(8be4870,9ffde60,3287,c960) pc [882f504] ra [882f160] sz [16]
882f078(8be4870,9ffde60,3287,c960) pc [882f160] ra [8831ff0] sz [32]
8831e24(8be4870,9ffde60,3287,0) pc [8831ff0] ra [8831d04] sz [576]
8831cd0(8be4870,9ffde60,3287,0) pc [8831d04] ra [8832eb8] sz [16]
8832e7c(8be4870,9ffde60,3287,0) pc [8832eb8] ra [8832d14] sz [16]
8832cd4(8be4870,9ffde60,3287,0) pc [8832d14] ra [8945a88] sz [16]
8945914(8be4870,9ffde60,3287,0) pc [8945a88] ra [8804714] sz [368]
8804358(8be4870,9ffde60,3287,0) pc [8804714] ra [9ffeac0] sz [32]
9e3f088(8be4870,9ffde60,3287,0) pc [9ffeac0] ra [0] sz [0]
done!

The prime suspect is sub_0882ac0c, the innermost frame.

Confirming what the function does

host0:/> bpset 0x882AC0C
0x0882AC0C: 0x0000000D '....' - break 0x0
host0:/>
exprint
Exception – Breakpoint
Thread ID – 0x0478A275
Th Name – main
Module ID – 0x0479A00B
Mod Name – main
EPC – 0x0882AC0C
Cause – 0x10000024
BadVAddr – 0xA6E19114
Status – 0x60088613
zr:0x00000000 at:0x00000001 v0:0x08F652A0 v1:0x08F652A0
a0:0x08BE4870 a1:0x09FFDE60 a2:0x00003287 a3:0x0000C960
t0:0x08AEFDE4 t1:0x08F68528 t2:0x08AEFDE4 t3:0x08AF0000
t4:0x08AEFA0C t5:0xDEADBEEF t6:0xDEADBEEF t7:0xDEADBEEF
s0:0x08F652A0 s1:0x00003287 s2:0x0882ACE4 s3:0x08BE4868
s4:0x08F652A0 s5:0x09FFE6FC s6:0x09FFE800 s7:0x08B40A94
t8:0xDEADBEEF t9:0xDEADBEEF k0:0x09FFEB00 k1:0x00000000
gp:0x08B3FB20 sp:0x09FFDE50 fp:0x08B3EA04 ra:0x0882AD08
host0:/>
memdump 0x08BE4870
- 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f – 0123456789abcdef
-----------------------------------------------------------------------------
08be4870 - 0B 00 00 01 00 01 01 03 04 08 FF 04 03 07 0C 10 – ................
08be4880 - 16 02 05 09 0B 0D 0F 11 13 14 15 18 20 22 26 2A - ............ "&*
08be4890 - 2E 32 36 41 80 81 F0 43 06 0A 0E 12 16 17 19 1A - .26A...C........
08be48a0 - 1B 1C 1D 1F 21 24 25 27 2D 30 31 35 3A 40 43 45 - ....!$%'-015:@CE
08be48b0 - 47 4B 50 51 53 55 56 5F 61 63 67 68 71 73 74 79 - GKPQSUV_acghqsty
08be48c0 - 7A 7E 83 85 89 8B 93 95 A1 A3 B1 B3 C0 C1 C3 DE - z~..............
08be48d0 - DF E1 E3 E9 F1 F3 F5 F6 F7 FB FD 50 1E 23 28 29 - ...........P.#()
08be48e0 - 2B 2F 33 34 37 38 39 3B 3C 3E 42 46 49 4A 4F 52 - +/34789;<>BFIJOR
08be48f0 - 57 5B 5D 60 64 65 66 69 6A 6D 70 75 76 77 78 7B - W[]`defijmpuvwx{
08be4900 - 7C 7D 87 8E 90 91 97 9B 9F A0 AA AF B6 B9 BB BF – |}..............
08be4910 - C5 C7 CB D0 D1 D2 D3 D5 D6 DA DB DC DD E0 E5 E6 – ................
08be4920 - E7 E8 EB ED EE EF F2 F8 F9 FA FC FE 3B 2C 3D 3F – ............;,=?
08be4930 - 44 48 4C 4D 4E 54 58 59 5A 5C 5E 62 6B 6C 6E 6F - DHLMNTXYZ\^bklno
08be4940 - 72 7F 82 84 86 88 8A 8D 92 94 98 99 9A 9D A2 A4 - r...............
08be4950 - A5 A6 A7 A9 AB AD AE B0 B2 B5 B7 BA BE C4 C6 C9 – ................
08be4960 - CC CE D7 D9 E4 EA EC F4 13 8C 8F 96 9C 9E A8 AC – ................
host0:/> memdump 0x09FFDE60
- 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f – 0123456789abcdef
-----------------------------------------------------------------------------
09ffde60 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffde70 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffde80 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffde90 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffdea0 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffdeb0 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffdec0 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffded0 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffdee0 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffdef0 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffdf00 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffdf10 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffdf20 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffdf30 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffdf40 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
09ffdf50 - FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF – ................
host0:/> bpset 0x882acdc
host0:/> bt
Breakpoint List:
0 : Addr:0x0882AC0C Inst:0x908A0000 Flags:----
1 : Addr:0x0882ACDC Inst:0x03E00008 Flags:----
host0:/> bc 0
host0:/> c
host0:/> 0x0882ACDC: 0x0000000D '....' - break 0x0
host0:/> exprint
Exception – Breakpoint
Thread ID – 0x0478A275
Th Name – main
Module ID – 0x0479A00B
Mod Name – main
EPC – 0x0882ACDC
Cause – 0x10000024
BadVAddr – 0xA6E19114
Status – 0x60088613
zr:0x00000000 at:0x00000001 v0:0x08BE497C v1:0x00001000
a0:0x08BE497C a1:0x09FFDE60 a2:0x000000E2 a3:0x0000000A
t0:0x00000800 t1:0x00000001 t2:0x0000000B t3:0xFFFFFFFF
t4:0x00000000 t5:0x00000BBF t6:0xFFFFFFFF t7:0x0000000C
s0:0x08F652A0 s1:0x00003287 s2:0x0882ACE4 s3:0x08BE4868
s4:0x08F652A0 s5:0x09FFE6FC s6:0x09FFE800 s7:0x08B40A94
t8:0x00000FDC t9:0xDEADBEEF k0:0x09FFEB00 k1:0x00000000
gp:0x08B3FB20 sp:0x09FFDE50 fp:0x08B3EA04 ra:0x0882AD08

host0:/> memdump 0x08BE497C
- 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f – 0123456789abcdef
-----------------------------------------------------------------------------
08be497c - 47 19 61 8B 07 B8 E7 A1 79 39 EE D4 54 77 5A 5E - G.a.....y9..TwZ^
08be498c - 41 1E CC B0 69 A0 A2 83 51 09 80 6C 40 2A 01 3E - A...i...Q..l@*.>
08be499c - 73 80 C0 4A C0 F3 40 D9 90 08 A9 3E 40 DB 89 34 - s..J..@....>@..4
08be49ac - 30 E2 09 04 9E C0 CC A8 23 64 66 A3 93 4C 18 A0 - 0.......#df..L..
08be49bc - 4E 06 CA 26 AF 0A E5 8B 3D AD 2D 31 E8 05 6F 3F - N..&....=.-1..o?
08be49cc - 70 32 1E C7 48 FD BD F9 64 71 51 71 0C 62 D1 FA - p2..H...dqQq.b..
08be49dc - 7B 73 4D D1 AA BD FB 63 A0 FE 6E B9 E7 64 D1 7B - {sM....c..n..d.{
08be49ec - 57 61 D9 BF BF F1 DB E2 55 8E 20 D4 FD FE 23 45 - Wa......U. ...#E
08be49fc - BB 9D 20 A5 E6 DB E2 7D 4E 90 D2 B2 73 5B 5D 16 - .. ....}N...s[].
08be4a0c - 87 88 82 E5 E4 C9 E3 CD 55 2B 8C 94 26 2B DF BE - ........U+..&+..
08be4a1c - 57 FA A4 81 6C FB F7 AD 2A AA DA 5D 68 24 DB FE - W...l...*..]h$..
08be4a2c - FD BB 8B F6 AE CA 75 80 D4 FD AB 8A B6 3E E8 04 - ......u......>..
08be4a3c - 29 BB 8B 5E C9 71 82 94 7D C5 AF 64 EB A9 4D C6 - )..^.q..}..d..M.
08be4a4c - E8 3C E0 A2 4C B7 80 DE 21 40 5C 25 9D E4 3A AA - .<..L...!@\%..:.
08be4a5c - 71 80 19 B5 C9 1A 03 45 49 7A F2 04 99 4B 05 2E - q......EIz...K..
08be4a6c - 7A FD C4 2E F6 7A FD 1A 2F 68 23 3F AD 22 D7 CE - z....z../h#?."..
host0:/> memdump 0x09FFDE60
- 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f – 0123456789abcdef
----------------------------------------------------------------------------
09ffde60 - 03 00 07 81 05 FF 08 F1 04 01 08 53 07 0D 09 BB - ...........S....
09ffde70 - 03 00 08 21 06 0C 09 4F 05 04 08 89 07 22 0A 48 - ...!...O.....".H
09ffde80 - 03 00 08 16 06 03 09 28 04 01 08 71 07 14 09 E5 - .......(...q....
09ffde90 - 03 00 08 3A 07 02 09 78 05 08 08 C0 07 32 0A A6 - ...:...x.....2..
09ffdea0 - 03 00 08 06 05 FF 08 F7 04 01 08 61 07 11 09 D3 - ...........a....
09ffdeb0 - 03 00 08 2D 06 10 09 66 05 04 08 A1 07 2A 0A 7F - ...-...f.....*..
09ffdec0 - 03 00 08 1B 06 07 09 39 04 01 08 7A 07 18 09 F2 - .......9...z....
09ffded0 - 03 00 08 47 07 09 09 97 05 08 08 DF 07 41 0A CE - ...G.........A..
09ffdee0 - 03 00 07 F0 05 FF 08 F5 04 01 08 56 07 0F 09 CB - ...........V....
09ffdef0 - 03 00 08 25 06 0C 09 5D 05 04 08 93 07 26 0A 5C - ...%...].....&.\
09ffdf00 - 03 00 08 19 06 03 09 33 04 01 08 74 07 15 09 EB - .......3...t....
09ffdf10 - 03 00 08 43 07 05 09 87 05 08 08 C3 07 36 0A B5 - ...C.........6..
09ffdf20 - 03 00 08 0E 05 FF 08 FD 04 01 08 67 07 13 09 DB - ...........g....
09ffdf30 - 03 00 08 31 06 10 09 70 05 04 08 B1 07 2E 0A 94 - ...1...p........
09ffdf40 - 03 00 08 1D 06 07 09 42 04 01 08 83 07 20 09 FC - .......B..... ..
09ffdf50 - 03 00 08 50 07 0B 09 AA 05 08 08 E3 07 80 FF FF - ...P............
host0:/> savemem 0x08800000 25165824 host0:/leaveSub_0882AC0C.bin
host0:/>

Then, from how the caller passes the arguments and uses the return value, we can determine the function prototype:

u8* sub_0882AC0C(u8* src, u8* dst);

Decompilation process

Writing the emulation code

    /* Register-level emulation of sub_0882AC0C. Each variable stands for the
       MIPS register of the same name. u8/u32 are typedefs of uint8_t/uint32_t;
       uintptr_t comes from <stdint.h>. */
    u8* sub_0882AC0C(u8* src, u8* arg1)
    {
        u32 at, v1, a2, a3, t0, t1, t2, t3, t4, t5, t6, t7, t8;
        uintptr_t v0;   /* v0 holds integers as well as addresses */

        t2 = *src;
        t8 = 0;
        t7 = 1;
        src++;
        if(t2 > 0)
        {
            t1 = t7;
    loc_882ac24:
            v0 = *src;
            src++;
            t6 = v0 - 1;
            if(v0 > 0)
            {
                t0 = t1 << t7;
                v1 = t0 << 1;
                a3 = t7 - 1;
    loc_882ac40:
                t4 = t8;
                t5 = 0;
                t3 = a3;
                if(t7 > 0)
                {
    loc_882ac50:
                    v0 = t4 & 1;
                    a2 = t5 << 1;
                    t5 = a2 | v0;
                    v0 = t3;
                    t4 = t4 >> 1;
                    t3--;
                    if(v0 > 0)
                    {
                        goto loc_882ac50;
                    }
                }
    loc_882ac6c:
                a2 = *src;
                at = (t5 < 0x400) ? 1 : 0;
                src++;
                if(at != 0)
                {
                    v0 = t5 << 1;
                    v0 = (uintptr_t)arg1 + v0;
    loc_882ac84:
                    *(u8 *)v0 = t7;
                    t5 += t0;
                    *(u8 *)(v0 + 1) = a2;
                    at = (t5 < 0x400) ? 1 : 0;
                    v0 += v1;
                    if(at != 0)
                    {
                        goto loc_882ac84;
                    }
                }
    loc_882ac9c:
                at = (t7 < 0xb) ? 1 : 0;
                if(at != 0)
                {
                    t8++;
                    v0 = t6;
                }
                else
                {
                    v0 = t6;
                }
                t6--;
                if(v0 > 0)
                {
                    goto loc_882ac40;
                }
            }
    loc_882acb8:
            t7++;
            at = (t2 < t7) ? 1 : 0;
            t8 = t8 << 1;
            if(at == 0)
            {
                goto loc_882ac24;
            }
        }
        /* round src up to an even address and return it */
        v0 = (uintptr_t)src & 1;
        if(v0 == 0)
        {
            v0 = (uintptr_t)src;
        }
        else
        {
            src++;
            v0 = (uintptr_t)src;
        }
        return (u8 *)v0;
    }
Cleanup and verification

After cleanup we get the following code:

    /* Rebuild the 1024-entry lookup table (2 bytes per entry: code length,
       character) from the header at src. Returns the 2-byte-aligned address
       of the data that follows the header. u8/u32 are typedefs of
       uint8_t/uint32_t; uintptr_t comes from <stdint.h>. */
    u8* restore_table(u8* src, u8* dst)
    {
        u32 cnt, mask, cnt1, data, p, weight;
        u32 ii, i, index;
        u8 character;

        cnt = *src++;                 /* number of code lengths in the header */
        mask = 0;                     /* next code value to hand out */

        for(index = 1; index <= cnt; index++)
        {
            cnt1 = *src++;            /* number of characters with this length */
            weight = 1 << index;      /* distance between slots sharing this code */
            /* The assembly also computes v1 = t0 << 1, the same stride in
               bytes; indexing with p*2 below makes it unnecessary. */
            for(ii = 0; ii < cnt1; ii++)
            {
                /* reverse the low `index` bits of mask */
                data = mask;
                p = 0;
                for(i = 0; i < index; i++)
                {
                    p = (p << 1) | (data & 1);
                    data = data >> 1;
                }
                character = *src++;
                /* fill every slot whose low `index` bits equal the code */
                while(p < 0x400)
                {
                    dst[p*2 + 0] = index;
                    dst[p*2 + 1] = character;
                    p += weight;
                }
                if(index < 0xb)
                {
                    mask++;
                }
            }
            mask = mask << 1;
        }
        /* round src up to an even address */
        if((uintptr_t)src & 1)
        {
            return src + 1;
        }
        return src;
    }

Verifying restore_table with the same method used for lzss reveals the following mismatches:

memory check failed offset[0x4fe] dst[0xff]!=src[0x82]
memory check failed offset[0x4ff] dst[0xff]!=src[0x8]
memory check failed offset[0x5fe] dst[0xff]!=src[0x0]
memory check failed offset[0x5ff] dst[0xff]!=src[0x0]
memory check failed offset[0x6fe] dst[0xff]!=src[0xaf]
memory check failed offset[0x6ff] dst[0xff]!=src[0x8]
memory check failed offset[0x7fe] dst[0xff]!=src[0x82]
memory check failed offset[0x7ff] dst[0xff]!=src[0x8]

These mismatches call for a closer look at the assembly and, where necessary, verification on the PSP itself through PSPLink.
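
The comparison behind the log above can be sketched as a byte-by-byte check of the emulation output against a memory dump saved from the PSP. This is only an illustration: the function name `memory_check` is hypothetical, and it is an assumption that `dst` in the log is the emulation output while `src` is the reference dump.

```c
#include <stdio.h>
#include <stddef.h>

/* Compare the emulation output (dst) with a reference dump (src) and
 * report each differing byte in the same format as the log above.
 * Returns the number of mismatching bytes.
 * memory_check is a hypothetical name, not the author's tool. */
static size_t memory_check(const unsigned char *dst,
                           const unsigned char *src, size_t len)
{
    size_t i, bad = 0;

    for (i = 0; i < len; i++) {
        if (dst[i] != src[i]) {
            printf("memory check failed offset[0x%zx] dst[0x%x]!=src[0x%x]\n",
                   i, (unsigned)dst[i], (unsigned)src[i]);
            bad++;
        }
    }
    return bad;
}
```

A return value of 0 means the two buffers agree over the whole range; any other value is the number of bytes that still need explaining.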

For a function that returns a value, the return value must be verified as well:

ret = src + 0x10C
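
The value 0x10C can be cross-checked against the header dump at 0x08BE4870: one byte for the number of code lengths, one count byte per length, and one byte per listed character, rounded up to an even size. The sketch below does that sum; the function name `table_header_size` is hypothetical, and it assumes the header starts at an even address, as it does in the dump.

```c
#include <stddef.h>

/* Size in bytes of the header consumed by the table-restore routine:
 * 1 byte (number of lengths) + 1 count byte per length + 1 byte per
 * character, rounded up to an even value.
 * Assumes the header itself starts at an even address. */
static size_t table_header_size(const unsigned char *counts, size_t nlengths)
{
    size_t i, size = 1 + nlengths;

    for (i = 0; i < nlengths; i++)
        size += counts[i];
    return (size + 1) & ~(size_t)1;
}
```

Reading the counts off the dump gives 0, 0, 1, 1, 3, 4, 0x16, 0x43, 0x50, 0x3B, 0x13 for lengths 1 to 11. They add up to 256 characters, so the header is 1 + 11 + 256 = 268 = 0x10C bytes, and 0x08BE4870 + 0x10C = 0x08BE497C, which is the v0 value in the register dump taken at the function's exit.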

Because sub_0882AC0C(u8* src, u8* arg1) only restores the dictionary from the header of the compressed data, we also need to analyze its caller to see how it decodes with that dictionary. That brings us to sub_0882ACE4.

Writing the emulation code for the main body

    /* Register-level emulation of sub_0882ACE4. Variables are named after
       MIPS registers; u8/u16/u32 are typedefs of uint8_t/uint16_t/uint32_t. */
    #define TABLE_LEN 2048      /* size used by the cleaned-up version */

    void sub_0882ACE4(u8* dst, u8* src, u32 dst_len)
    {
        u8 table[TABLE_LEN];    /* located at sp + 0x10 in the original */
        u8 *s0, *a0, *a1, *a2, *v0;
        u32 s1, at, v1, a3, t0;

        s0 = dst;
        a0 = src;
        s1 = dst_len;
        a1 = table;

        v0 = restore_table(a0, a1);
        a2 = s0 + s1;
        at = (s0 < a2) ? 1 : 0;
        a1 = v0;
        a3 = 0;
        t0 = 0;
        if(at == 0)
        {
            return;
        }

    loc_882ad24:
        /* at must be recomputed on every pass, so the label precedes it */
        at = (t0 < 0x10) ? 1 : 0;
        if(at != 0)
        {
            v1 = *(u16 *)a1;
            v1 = v1 << t0;
            a3 = a3 | v1;
            a1 += 2;
            t0 += 0x10;
            v1 = a3 & 0xffff;
        }
        else
        {
            v1 = a3 & 0xffff;
        }
    loc_882ad44:
        v1 = v1 & 0x3ff;
        v1 = v1 << 1;
        a0 = table + v1;        /* asm: v1 = v1 + sp; a0 = v1 + 0x10 */
        v1 = *(u8 *)a0;
        at = (v1 < 0xb) ? 1 : 0;
        if(at == 0)
        {
            t0 = t0 - 0xa;
            at = (t0 < 0x10) ? 1 : 0;
            a3 = a3 >> 0xa;
            if(at != 0)
            {
                v1 = *(u16 *)a1;
                v1 = v1 << t0;
                a3 = a3 | v1;
                a1 += 2;
                t0 += 0x10;
            }
            v1 = a3 & 0xffff;
            *(u8 *)s0 = v1;
            s0++;
            a3 = a3 >> 8;
            t0 = t0 - 8;
        }
        else
        {
            v1 = *(a0 + 1);
            *(u8 *)s0 = v1;
            v1 = *(a0);
            s0++;
            a3 = a3 >> v1;
            t0 = t0 - v1;
        }
    loc_882adb4:
        v1 = (s0 < a2) ? 1 : 0;
        if(v1 != 0)
        {
            goto loc_882ad24;
        }
        return;
    }
Cleanup and verification

The cleaned-up code:

    /* Huffman stage of the decoder. Needs <string.h> for memset;
       u8/u16/u32 are typedefs of uint8_t/uint16_t/uint32_t. */
    void huffman_decoder(u8* dst, u8* src, u32 dst_len)
    {
        u8 table[2048];
        u8 character, *end;
        u16 code, clen, nbits;
        u32 cache;

        end = dst + dst_len;

        memset(table, 0xff, 2048);
        src = restore_table(src, table);

        if(dst >= end)          /* nothing to decode */
        {
            return;
        }

        cache = 0;              /* pending bits, next bit in bit 0 */
        nbits = 0;              /* number of valid bits in cache */

        while(dst < end)
        {
            if(nbits < 16)
            {
                /* refill with one little-endian 16-bit word */
                cache |= (u32)(src[0] | (src[1] << 8)) << nbits;
                src += 2;
                nbits += 16;
            }
            code = (cache & 0xffff) & 0x3ff;
            clen = table[code*2 + 0];
            character = table[code*2 + 1];

            /* escape when clen is not below 0xb, as in the emulation code */
            if(clen >= 0xb)
            {
                nbits -= 0xa;
                cache = cache >> 10;
                if(nbits < 16)
                {
                    cache |= (u32)(src[0] | (src[1] << 8)) << nbits;
                    src += 2;
                    nbits += 16;
                }
                *dst++ = (u8)(cache & 0xffff);   /* raw 8-bit character */
                cache >>= 8;
                nbits -= 8;
            }
            else
            {
                *dst++ = character;
                cache >>= clen;
                nbits -= clen;
            }
        }
        return;
    }
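
The cache and nbits bookkeeping in the decoder is an ordinary bit reader that takes bits least-significant first and refills 16 bits at a time. It can be isolated as below. This is a sketch only: the `BitReader` type and the `br_*` names are hypothetical, and the 16-bit words are assembled byte by byte in little-endian order, which is what a direct u16 load does on the PSP.

```c
#include <stdint.h>

typedef struct {
    const uint8_t *src;   /* next unread input byte             */
    uint32_t cache;       /* pending bits, next bit in bit 0    */
    unsigned nbits;       /* number of valid bits in cache      */
} BitReader;

/* Top up the cache with one little-endian 16-bit word when fewer than
 * 16 bits are pending, as the decoder loop does. */
static void br_refill(BitReader *br)
{
    if (br->nbits < 16) {
        uint32_t w = (uint32_t)br->src[0] | ((uint32_t)br->src[1] << 8);
        br->cache |= w << br->nbits;
        br->src += 2;
        br->nbits += 16;
    }
}

/* Return the next n bits (n <= 16) without consuming them. */
static uint32_t br_peek(BitReader *br, unsigned n)
{
    br_refill(br);
    return br->cache & ((1u << n) - 1u);
}

/* Consume n bits that were previously peeked. */
static void br_drop(BitReader *br, unsigned n)
{
    br->cache >>= n;
    br->nbits -= n;
}
```

For the first stream bytes in the dump at 0x08BE497C (47 19 61 8B), `br_peek(&br, 10)` on a fresh reader yields 0x147, which is the first index the decoder looks up in the table.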

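Tracing in PSPLink shows that after huffman_decoder returns, the data at the address held in register a0 is still compressed. The simplest way to tell is that the decompressed length is smaller than the length recorded in the xb file header, that is, the original file length. Execution then enters lzss_decoder, and once lzss_decoder returns, the address in a0 holds the final data, whose length matches the one given in the xb file header.

So 0x00 means huffman_decoder followed by lzss_decoder. (The corresponding encoding process is lzss_encoder followed by Huffman_encoder.)

Algorithm summary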

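Huffman coding is a variable-length, table-driven coding scheme in which each character receives a code according to how often it occurs, so that frequent characters get short codes. It is regarded as an optimal code because the Huffman binary tree can be proven to be an optimal binary tree, that is, the binary tree with the shortest weighted path length. In that tree every occurring character is a node, its code gives its position in the tree, and its frequency is its weight.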
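
The weighted path length mentioned above is just the total number of bits needed for the message: each character's frequency times the length of its code. A small sketch makes the comparison with a fixed-length code concrete; the function name `weighted_length` and the sample frequencies are made up for illustration.

```c
#include <stddef.h>

/* Total size in bits of an encoded message: the sum over all symbols of
 * (occurrences * code length). This equals the weighted path length of
 * the code tree. weighted_length is a hypothetical helper name. */
static unsigned long weighted_length(const unsigned *freq,
                                     const unsigned *len, size_t n)
{
    unsigned long bits = 0;
    size_t i;

    for (i = 0; i < n; i++)
        bits += (unsigned long)freq[i] * len[i];
    return bits;
}
```

For four symbols occurring 5, 2, 1 and 1 times, Huffman's construction gives code lengths 1, 2, 3 and 3, for a total of 15 bits. A fixed 2-bit code for the same message needs 18 bits.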

The following URL describes Huffman coding:

http://blog.csdn.net/xx_snoopy/archive/2009/11/23/4856652.aspx

For the implementation I referred to the following open-source project:

http://sourceforge.net/projects/huffman/

In this variant, the first change is the introduction of a cutoff length for codes:

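When the code for a character would reach the 11-bit cutoff, the character is not Huffman-coded at all. It is stored as a fixed marker code followed by the character itself. In the decoder this is the branch taken when the length field of the table entry is not below 0xB: it drops 10 bits and copies the next 8 bits to the output unchanged.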
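
The escape can be exercised in isolation with a hand-built table. The sketch below mirrors the decoder loop but takes the table as a parameter; the name `decode_with_escape` and the `ESCAPE_LEN` constant are hypothetical, and the input is assumed to hold enough 16-bit words for the requested output.

```c
#include <stdint.h>
#include <stddef.h>

#define ESCAPE_LEN 0x0B   /* length values at or above this mark an escape */

/* Decode `count` bytes from `src` using a 1024-entry table of
 * (length, character) byte pairs. Hypothetical helper that mirrors the
 * decoder loop; it does not build the table itself. */
static void decode_with_escape(uint8_t *dst, size_t count,
                               const uint8_t *src, const uint8_t *table)
{
    uint32_t cache = 0;   /* pending bits, next bit in bit 0 */
    unsigned nbits = 0;   /* number of valid bits in cache   */
    size_t i;

    for (i = 0; i < count; i++) {
        uint32_t code;
        unsigned len;

        if (nbits < 16) {                 /* refill 16 bits, little-endian */
            cache |= ((uint32_t)src[0] | ((uint32_t)src[1] << 8)) << nbits;
            src += 2;
            nbits += 16;
        }
        code = cache & 0x3ff;
        len = table[code * 2];
        if (len >= ESCAPE_LEN) {
            cache >>= 10;                 /* skip the 10-bit marker */
            nbits -= 10;
            if (nbits < 16) {
                cache |= ((uint32_t)src[0] | ((uint32_t)src[1] << 8)) << nbits;
                src += 2;
                nbits += 16;
            }
            dst[i] = (uint8_t)cache;      /* raw 8-bit character */
            cache >>= 8;
            nbits -= 8;
        } else {
            dst[i] = table[code * 2 + 1]; /* table-coded character */
            cache >>= len;
            nbits -= len;
        }
    }
}
```

As a usage example, fill the table with 0xFF, then give every even index the entry (1, 'A'), so the 1-bit code 0 means 'A' and every other index is an escape. The input bytes FE D7 02 00 then decode to "AZ": one 0 bit for 'A', ten 1 bits as the marker, and the raw byte 0x5A.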

The second change is that the code table is sorted (see restore_table), which keeps both storing and restoring the table simple.
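
Because the characters are listed in order of code length, the header only needs a count per length, and the codes themselves can be regenerated by counting upward. The sketch below rebuilds a lookup table this way from the beginning of the header dump, truncated to code lengths 1 to 5. The truncation is an assumption made to keep the example short (the real header lists 11 lengths), and `reverse_bits` and `build_table` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the low n bits of v. The decoder indexes the table with bits
 * taken least-significant first, so codes are stored bit-reversed. */
static uint32_t reverse_bits(uint32_t v, unsigned n)
{
    uint32_t r = 0;

    while (n--) {
        r = (r << 1) | (v & 1);
        v >>= 1;
    }
    return r;
}

/* Rebuild the 1024-entry table of (length, character) byte pairs from a
 * header laid out as [nlengths] then, per length, [count][characters...].
 * Same idea as the restore routine, minus the final pointer alignment. */
static void build_table(const uint8_t *hdr, uint8_t table[2048])
{
    unsigned nlengths = *hdr++;
    unsigned len, k;
    uint32_t code = 0, p;

    memset(table, 0xff, 2048);           /* 0xFF marks unused slots */
    for (len = 1; len <= nlengths; len++) {
        unsigned count = *hdr++;
        for (k = 0; k < count; k++) {
            uint8_t ch = *hdr++;
            /* fill every slot whose low `len` bits match the code */
            for (p = reverse_bits(code, len); p < 0x400; p += 1u << len) {
                table[p * 2] = (uint8_t)len;
                table[p * 2 + 1] = ch;
            }
            if (len < 0x0b)
                code++;                  /* next code of the same length */
        }
        code <<= 1;                      /* move on to the next length */
    }
}
```

Feeding it the bytes 05 00 00 01 00 01 01 03 04 08 FF produces the pairs 03 00 at index 0, 05 FF at index 2 and 04 01 at index 4, which agree with the memory dump taken at 0x09FFDE60 after the call.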

Unknown algorithm 0x10 (Huffman)

The analysis is essentially the same as for 0x00, except that there is no call to lzss_decoder afterwards.

Closing remarks

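First, here are the algorithms once more:
0x00 - unknown compression algorithm; between the compression header and the compressed data there is a dictionary-like data block that is not counted in the compressed size; (Huffman + Lzss)
0x10 - unknown compression algorithm, similar or identical to 0x00, not certain; (Huffman)
0x20 - lzss compression or a related variant; this one is easy to crack; (lzss)
0x30 - uncompressed, no compression header, just raw data. (NONE)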
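
The list above amounts to a small dispatch table from the compression id to the decoding stages that have to run. A sketch of that mapping follows; the enum values and the function name `xb_decode_stages` are hypothetical, and the decoders themselves are not included.

```c
/* Decoding stages for each compression id found in the xb packs, as
 * listed above. The enum and function names are hypothetical. */
enum { STAGE_HUFFMAN = 1, STAGE_LZSS = 2 };

/* Returns a bit mask of the stages to run, 0 for raw data,
 * or -1 for an id that is not in the list. */
static int xb_decode_stages(unsigned id)
{
    switch (id) {
    case 0x00: return STAGE_HUFFMAN | STAGE_LZSS; /* Huffman first, then lzss */
    case 0x10: return STAGE_HUFFMAN;
    case 0x20: return STAGE_LZSS;
    case 0x30: return 0;                          /* raw, nothing to decode */
    default:   return -1;
    }
}
```

When both bits are set, the Huffman stage runs first and its output is fed to the lzss stage, matching the order observed in PSPLink for 0x00.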

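Learning is a process in which theory and practice keep reinforcing each other. Real insight comes from what you gain through your own practice, so do not take other people's views on trust. Learning also depends on exchanging ideas, so feel free to contact me: it does not matter who started learning first, whoever has mastered the subject is the teacher. If you see things differently, you are welcome to tell me.

The lines above are quoted from 《论语·学而》.
Besides an objective attitude, a good programmer also needs a calm state of mind. I still have a long way to go in that respect, and I offer this as encouragement for all of us.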
