Extracting string resources from LuaJIT bytecode dump files

Source: Internet. Editor: 程序博客网. Time: 2024/05/18 13:06

For strings shorter than 128 bytes, describing a string constant with the following structure works fine:

    struct str {
        unsigned char length;  /* length prefix, stored in a single byte */
        char str[0];           /* string bytes follow immediately */
    };

In fact, the length field should be an integer encoded as ULEB128, because that is the encoding used by the LuaJIT 2.0 Bytecode Dump Format:

LuaJIT 2.0 Bytecode Dump Format

Details for the bytecode dump format can be found in src/lj_bcdump.h in the LuaJIT source code. Here's the concise format description:

    dump   = header proto+ 0U
    header = ESC 'L' 'J' versionB flagsU [namelenU nameB*]
    proto  = lengthU pdata
    pdata  = phead bcinsW* uvdataH* kgc* knum* [debugB*]
    phead  = flagsB numparamsB framesizeB numuvB numkgcU numknU numbcU
             [debuglenU [firstlineU numlineU]]
    kgc    = kgctypeU { ktab | (loU hiU) | (rloU rhiU iloU ihiU) | strB* }
    knum   = intU0 | (loU1 hiU)
    ktab   = narrayU nhashU karray* khash*
    karray = ktabk
    khash  = ktabk ktabk
    ktabk  = ktabtypeU { intU | (loU hiU) | strB* }

    B = 8 bit, H = 16 bit, W = 32 bit, U = ULEB128 of W, U0/U1 = ULEB128 of W+1

TODO: turn the description into human-readable text :-)
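To make the `U` fields in the description above more concrete, here is a minimal sketch of a ULEB128 decoder in Python. The function name `read_uleb128` is only illustrative and does not come from LuaJIT; it follows the general LEB128 scheme (7 payload bits per byte, high bit set on every byte except the last).

```python
def read_uleb128(buf, offset=0):
    """Decode one ULEB128 integer from buf, starting at offset.

    Returns a tuple (value, next_offset).
    """
    value = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if (byte & 0x80) == 0:           # high bit clear: this was the last byte
            return value, offset
        shift += 7


# A value below 128 fits in a single byte, which is why a one-byte
# length prefix is enough for short strings.
print(read_uleb128(bytes([0x0A])))              # (10, 1)
# 624485 is the worked example from the LEB128 Wikipedia article.
print(read_uleb128(bytes([0xE5, 0x8E, 0x26])))  # (624485, 3)
```

Returning the next offset together with the value lets a caller walk through consecutive fields without tracking how many bytes each integer occupied.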

The extraction code is as follows:

    filelist = ['app.TiConfig.BiaoBai', 'app.TiConfig.ChuanYi', 'app.TiConfig.GouTong',
                'app.TiConfig.GouWu', 'app.TiConfig.JiaWu', 'app.TiConfig.JiuCan',
                'app.TiConfig.KanBing', 'app.TiConfig.LvXing', 'app.TiConfig.MuYu',
                'app.TiConfig.YueHui']

    for fname in filelist:
        with open(fname, 'rb') as fin:
            with open(fname + '.txt', 'wt', encoding='utf-8') as fout:
                buffer = fin.read()
                # Keep only the bytes after the second-to-last NUL byte:
                # reverse, cut at the second NUL, then reverse back.
                buffer = buffer[::-1]
                firstNullIdx = buffer.index(b'\x00')
                secondNullIdx = buffer.index(b'\x00', firstNullIdx + 1)
                buffer = buffer[0:secondNullIdx]
                buffer = buffer[::-1]
                offset = 0
                lst = []
                while True:
                    # The stored byte is the actual length plus 5.
                    # Reading a single byte only works for short strings.
                    length = buffer[offset] - 5
                    offset = offset + 1
                    raw = buffer[offset:offset + length]
                    # The string has been extracted at this point.
                    # The code below is further processing specific to this case.
                    text = raw.decode('utf-8')
                    if text == 'ti':
                        break
                    if len(text) > 8:
                        fout.write(text + '\n')
                        fout.writelines(lst)
                        fout.write('\n')
                        lst = []
                    else:
                        lst.append(text + '\n')
                    offset = offset + length
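The script above reads the length prefix as a single byte, so it only works for short strings. A hedged sketch of how the same step could handle longer strings follows. It assumes, based on the offset of 5 used above and on the `kgc = kgctypeU { ... | strB* }` rule, that a string constant is stored as a ULEB128 `kgctype` equal to the string length plus 5. The names `KGC_STR`, `read_uleb128` and `read_kgc_string` are illustrative, not part of any LuaJIT API.

```python
KGC_STR = 5  # assumption: a kgctype of 5 or more marks a string of length kgctype - 5


def read_uleb128(buf, offset):
    """Decode one ULEB128 integer; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, offset
        shift += 7


def read_kgc_string(buf, offset):
    """Read one string constant; return (raw_bytes, next_offset)."""
    kgctype, offset = read_uleb128(buf, offset)
    if kgctype < KGC_STR:
        raise ValueError('not a string constant: kgctype=%d' % kgctype)
    end = offset + (kgctype - KGC_STR)
    return buf[offset:end], end


# Hand-built test data: a 200-byte string (kgctype 205 encodes as CD 01),
# followed by the 2-byte string 'ti' (kgctype 7).
data = bytes([0xCD, 0x01]) + b'a' * 200 + bytes([0x07]) + b'ti'
first, pos = read_kgc_string(data, 0)
second, pos = read_kgc_string(data, pos)
print(len(first), second)  # 200 b'ti'
```

In the loop above, the three lines that compute `length`, advance `offset`, and slice the buffer could be replaced by one call to such a helper, leaving the case-specific post-processing unchanged.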

References

Little Endian Base 128, Wikipedia, http://en.wikipedia.org/wiki/LEB128

LuaJIT 2.0 Bytecode Dump Format, http://wiki.luajit.org/Bytecode-2.0#LuaJIT-2.0-Bytecode-Dump-Format

0 0