.osr File Format Parsing (Part 2): Data Types

Source: Internet | Editor: 程序博客网 | Time: 2024/06/02 06:02

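A quick gripe first: ppy (peppy, the developer of osu!) loves storing data as text, which saves neither time nor space.... No idea what he was thinking.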

First, the data types that appear in .osr files. Apart from Long, ULEB128 and String, they are all ordinary types:

| Name | Bytes | Description |
|---|---|---|
| Byte | 1 | A single 8-bit value. |
| Short | 2 | A 2-byte little-endian value. |
| Integer | 4 | A 4-byte little-endian value. |
| Long | 8 | An 8-byte little-endian value. |
| ULEB128 | Variable | A variable-length integer. See Wikipedia: ULEB128. |
| String | Variable | Has three parts: a single byte that is either 0x00, meaning the next two parts are not present, or 0x0b (decimal 11), meaning they are present. If it is 0x0b, it is followed by a ULEB128 giving the byte length of the string, and then the string itself, encoded in UTF-8. See Wikipedia: UTF-8. |

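Long: here it is 8 bytes, so declaring it as a plain long is not safe. The C/C++ standards only guarantee that long has at least 32 bits, and MSVC on Windows makes it 4 bytes. Use __int64 (MSVC-specific) or long long instead, or better, std::int64_t from <cstdint>.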
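All of the fixed-width fields are little-endian, so if you want to stay independent of the compiler and of the host's byte order, you can assemble each value byte by byte instead of reading straight into a variable. A minimal sketch, assuming the file data is already in a std::vector<std::uint8_t>; ReadLE is a hypothetical helper name, not something from the original article:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper: reads a little-endian integer of type T starting at
// buf[offset]. The result does not depend on the host's byte order.
template <typename T>
T ReadLE(const std::vector<std::uint8_t> &buf, std::size_t offset)
{
    static_assert(std::is_integral<T>::value, "T must be an integer type");
    assert(offset + sizeof(T) <= buf.size()); // caller must stay in bounds
    using U = typename std::make_unsigned<T>::type;
    U value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        value |= static_cast<U>(static_cast<U>(buf[offset + i]) << (8 * i)); // byte i -> bits 8i..8i+7
    return static_cast<T>(value); // reinterpret as signed if T is signed
}
```

For example, a Long would be read as ReadLE<std::int64_t>(data, pos) and a Short as ReadLE<std::int16_t>(data, pos).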
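ULEB128: LEB128 is a variable-length integer encoding, and the U marks the unsigned variant. Underneath it is just an array of unsigned char whose length varies; a 32-bit value needs at most 5 bytes (32 bits at 7 bits per byte, rounded up). In each byte only the low 7 bits carry data, and the most significant bit says whether another byte follows. The 7-bit groups are stored least significant first. Details below.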
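To make the "at most 5 bytes" figure concrete: every byte carries 7 payload bits, so the encoded length is the number of 7-bit groups the value needs, with 0 still taking one byte. A small sketch; ULEB128Size is a hypothetical name used only for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how many bytes the ULEB128 encoding of value occupies.
// Each byte holds 7 payload bits; even 0 needs one byte (0x00).
int ULEB128Size(std::uint32_t value)
{
    int size = 0;
    do {
        value >>= 7; // consume one 7-bit group
        ++size;
    } while (value != 0);
    assert(size <= 5); // 32 bits / 7 bits per byte, rounded up
    return size;
}
```

So any string shorter than 128 bytes gets a one-byte length prefix, and lengths from 128 up to 16383 take two bytes.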
String: this is not an ordinary string but a structure made of three parts. Details below.

Converting between ULEB128 and unsigned int

unsigned int to ULEB128:

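    // Writes value to leb128_buffer as ULEB128 and returns the number of bytes
    // written; the buffer needs room for up to 5 bytes.
    int EncodeULEB128(unsigned int value, unsigned char *leb128_buffer)
    {
        int pos = 0;
        do {
            leb128_buffer[pos++] = (value & 0x7F) | 0x80; // set the continuation flag on every byte
            value >>= 7;
        } while (value != 0);                             // value 0 still produces one byte (0x00)
        leb128_buffer[pos - 1] &= 0x7F;                   // clear the flag on the last byte
        return pos;
    }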
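EncodeULEB128 writes into a raw buffer, so the caller has to make sure it is big enough. If a std::vector is acceptable as the output, appending to it sidesteps that. A sketch of that variant (AppendULEB128 is a hypothetical name), checked against the commonly cited examples 300 → AC 02 and 624485 → E5 8E 26:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical variant: appends the ULEB128 encoding of value to out.
// Lowest 7-bit group first; the high bit of each byte means "another byte follows".
void AppendULEB128(std::vector<std::uint8_t> &out, std::uint32_t value)
{
    do {
        auto byte = static_cast<std::uint8_t>(value & 0x7F); // take the low 7 bits
        value >>= 7;
        if (value != 0)
            byte |= 0x80;                 // more groups remain: set the continuation bit
        out.push_back(byte);
    } while (value != 0);                 // 0 still produces one byte, 0x00
}
```

Because the high bit is set only when more groups remain, there is no need to go back and clear it on the last byte.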

ULEB128 to unsigned int:

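    // Decodes a well-formed ULEB128 value (at most 5 bytes) starting at leb128_buffer.
    unsigned int DecodeULEB128(const unsigned char *leb128_buffer)
    {
        unsigned int value = 0;
        int pos = 0;
        int offset = 0;
        while (offset < 35) {                               // 5 bytes x 7 bits covers 32 bits
            unsigned char cur = leb128_buffer[pos++];
            value |= (unsigned int)(cur & 0x7F) << offset;  // merge the bits from low to high
            offset += 7;
            if ((cur & 0x80) == 0)                          // parentheses needed: == binds tighter than &
                break;                                      // high bit clear: this was the last byte
        }
        return value;
    }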
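DecodeULEB128 trusts its input: it takes no length and does not report how many bytes it consumed, which a parser needs in order to find where the next field starts. A sketch of a more defensive variant, assuming the input comes as a pointer plus a length (TryDecodeULEB128 is a hypothetical name). It refuses to run past the end of the data and rejects encodings that do not fit in 32 bits:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical defensive decoder. On success stores the value in *out and
// returns the number of bytes consumed (1..5). Returns 0 if the input ends
// mid-value or the encoding does not fit in 32 bits.
std::size_t TryDecodeULEB128(const std::uint8_t *data, std::size_t len, std::uint32_t *out)
{
    assert(out != nullptr);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < len && i < 5; ++i) {
        std::uint8_t byte = data[i];
        if (i == 4 && (byte & 0xF0) != 0)
            return 0; // a 5th byte may only hold bits 28..31 and must be the last
        value |= static_cast<std::uint32_t>(byte & 0x7F) << (7 * i); // low bits first
        if ((byte & 0x80) == 0) { // high bit clear: last byte
            *out = value;
            return i + 1;
        }
    }
    return 0; // ran out of input before the last byte
}
```

The returned count lets the caller advance its read position by exactly the size of the length prefix.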

osrString

Following the official documentation, we get this structure:

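    // Conceptual layout only: ULEB128 has no fixed size, so this cannot be
    // compiled or read from the file as-is.
    struct String
    {
        unsigned char Type;  // flag: 0x00 means the two fields below are absent, 0x0B means they are present
        ULEB128 StrLengeh;   // byte length of str
        string str;          // the text, UTF-8 encoded
    };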

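But this structure is awkward to work with: ULEB128 is variable-length, so the struct cannot simply be read out of the file in one go. (Damn it, ppy, all this hassle just to save a few bytes. Would a plain int have killed you? Are your string lengths going to overflow an int?)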
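Since the length prefix has no fixed size, the field has to be walked byte by byte. A sketch of that walk over an in-memory buffer, assuming the whole file has already been loaded into a std::vector<std::uint8_t>; ParseOsrString is a hypothetical name, and the text is returned as raw UTF-8 with no code-page conversion:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parser for the three-part String field starting at buf[pos].
// Returns std::nullopt when the flag byte is 0x00 (string absent), otherwise the
// raw UTF-8 text. Advances pos past the field; throws std::runtime_error on bad data.
std::optional<std::string> ParseOsrString(const std::vector<std::uint8_t> &buf, std::size_t &pos)
{
    assert(pos <= buf.size());
    if (pos == buf.size())
        throw std::runtime_error("missing String flag byte");
    const std::uint8_t flag = buf[pos++];       // part 1: flag byte
    if (flag == 0x00)
        return std::nullopt;                    // parts 2 and 3 are absent
    if (flag != 0x0B)
        throw std::runtime_error("unexpected String flag byte");

    std::uint32_t length = 0;                   // part 2: ULEB128 byte length
    int shift = 0;
    std::uint8_t byte = 0;
    do {
        if (pos == buf.size() || shift > 28)
            throw std::runtime_error("truncated or oversized ULEB128 length");
        byte = buf[pos++];
        length |= static_cast<std::uint32_t>(byte & 0x7F) << shift; // low bits first
        shift += 7;
    } while (byte & 0x80);                      // high bit set: another byte follows

    if (buf.size() - pos < length)              // part 3: `length` bytes of UTF-8
        throw std::runtime_error("String runs past the end of the buffer");
    const auto first = buf.begin() + static_cast<std::ptrdiff_t>(pos);
    std::string text(first, first + static_cast<std::ptrdiff_t>(length));
    pos += length;
    return text;
}
```

Note that 0x0B followed by a length of 0 is a present-but-empty string, which is different from an absent one.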

So I wrote an osrString class that wraps this structure. Its declaration:

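    #include <fstream>
    #include <string>

    class osrString
    {
    private:
        unsigned int StrLengeh;  // byte length of the UTF-8 text
        unsigned char Type;      // 0x00 = no string, 0x0B = string present
        std::string str;         // the text
    public:
        osrString();
        ~osrString();
        void Load(std::ifstream &fin);      // read the field from a file stream
        void Save(std::ofstream &fout);     // write the field to a file stream
        void SetString(std::string str);    // change the string's contents
        std::string GetString();            // get the text
        void Clear();                       // clear the contents and set Type = 0x00
    };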

Method implementations:

    // UTF8_To_string / string_To_UTF8 are the author's conversion helpers, which the
    // article does not show (presumably converting between UTF-8 and the local code page).

    osrString::osrString()
    {
        Clear();
    }

    osrString::~osrString() {}

    void osrString::Clear()
    {
        this->Type = 0x00;
        this->StrLengeh = 0;
        this->str = "";
    }

    void osrString::Load(std::ifstream &fin)
    {
        fin.read((char*)&Type, sizeof(Type));
        if (Type == 0x00)
        {
            Clear();
        }
        else if (Type == 0x0B)
        {
            // Read StrLengeh (ULEB128)
            unsigned int value = 0;
            unsigned char buff = 0;
            int offset = 0;
            do {
                fin.read((char*)&buff, sizeof(buff));
                value |= (unsigned int)(buff & 0x7F) << offset;    // merge the bits from low to high
                offset += 7;
            } while ((buff & 0x80) != 0 && fin && offset < 35);    // high bit set: another byte follows
            StrLengeh = value;
            // Read str (StrLengeh bytes of UTF-8)
            std::string raw(StrLengeh, '\0');
            if (StrLengeh > 0)
                fin.read(&raw[0], StrLengeh);
            str = UTF8_To_string(raw);
        }
        else
        {
            Clear();                          // unknown flag: the data is corrupt or out of sync
            fin.setstate(std::ios::failbit);
        }
    }

    void osrString::SetString(std::string str)
    {
        if (!str.empty())
        {
            this->Type = 0x0B;
            this->str = str;
        }
        else
        {
            this->Type = 0x00;
            this->str = "";
        }
    }

    std::string osrString::GetString()
    {
        return this->str;
    }

    void osrString::Save(std::ofstream &fout)
    {
        fout.write((char*)&Type, sizeof(Type));
        if (Type == 0x00)
        {
            return;
        }
        else if (Type == 0x0B)
        {
            std::string utf8_str = string_To_UTF8(str);
            StrLengeh = (unsigned int)utf8_str.size();
            // Write StrLengeh (ULEB128)
            unsigned int value = StrLengeh;
            unsigned char buff[5] = { 0 };               // 32 bits need at most 5 bytes
            int pos = 0;
            do {
                buff[pos++] = (value & 0x7F) | 0x80;     // set the continuation flag on every byte
                value >>= 7;
            } while (value != 0);                        // a length of 0 still writes one byte (0x00)
            buff[pos - 1] &= 0x7F;                       // clear the flag on the last byte
            fout.write((char*)buff, pos);
            // Write str
            fout.write(utf8_str.data(), StrLengeh);
        }
    }
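osrString::Save only accepts a std::ofstream and relies on the author's string_To_UTF8 helper (not shown), which makes its output hard to check in isolation. A sketch of the same write logic over any std::ostream, so that a std::ostringstream can stand in for the file and the exact bytes can be inspected. It assumes the text is already UTF-8, and WriteOsrString is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical writer for the osr String field over any std::ostream.
// Assumes s already holds UTF-8; no code-page conversion is done here.
void WriteOsrString(std::ostream &out, const std::optional<std::string> &s)
{
    if (!s) {
        out.put(0x00);                    // flag only: length and text are absent
        return;
    }
    out.put(0x0B);                        // flag: length and text follow
    std::uint32_t value = static_cast<std::uint32_t>(s->size());
    do {                                  // ULEB128 length, low 7 bits first
        auto byte = static_cast<unsigned char>(value & 0x7F);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;                 // continuation bit
        out.put(static_cast<char>(byte));
    } while (value != 0);                 // a length of 0 still writes 0x00
    out.write(s->data(), static_cast<std::streamsize>(s->size()));
}
```

Taking std::ostream& rather than std::ofstream& costs nothing for real files (std::ofstream derives from std::ostream) and makes in-memory testing possible. A 200-byte string, for instance, should come out as 0x0B 0xC8 0x01 followed by the 200 bytes.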

That's it: we now have every class needed to read an .osr file, so next we can start writing the reading method itself.

0 0