Notes on the Google protobuf developer documentation

Source: Internet. Editor: 程序博客网. Time: 2024/04/28 20:30

Developer documentation notes

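  1. protobuf is designed for compatibility across versions: adding a new field does not break parsing in old code, which simply ignores fields it does not recognize.
  2. The number after each field (the tag) identifies that field in the serialized binary. Tags 1 to 15 take one byte to encode, because the wire type uses 3 bits of that byte and the most significant bit is reserved as the varint continuation bit, which leaves 4 bits for the field number. Tags 16 to 2047 take two bytes, so give frequently used fields the small tags.
  3. Use required with care. If a required field is not set, the message may be rejected or dropped, so it is better to validate completeness in application code than to rely on required.
  4. Reserved fields: when a field is deleted from a .proto file, its tag number becomes available for reuse. If someone later works with an old version of the .proto file, the reused tag number can conflict. The recommended practice when deleting a field is to mark both its tag number and its name as reserved, so that the compiler reports an error if anyone tries to use them again.
  5. From the .proto file, the protocol buffer compiler generates functions to get and set field values, to parse a message from an input stream, and to serialize a message to an output stream.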
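  6. Rules for updating a message type:
    • Do not change the tag of any existing field.
    • Any new field should be optional or repeated.
    • Non-required fields can be removed, as long as their tag numbers are never reused.
    • int32, uint32, int64, uint64, and bool are all compatible with each other.
    • sint32 and sint64 are compatible with each other, but not with the other integer types.
    • An embedded message is compatible with bytes if the bytes contain an encoded version of the message.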
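  7. Importing definitions
    To use definitions from another .proto file, import that file. By default only definitions from directly imported files can be used; imports are not transitive. When a .proto file is moved to a new location, there is no need to update every import of it: leave a .proto file at the old location that uses import public to point at the new location, and anything that imports the old file will then pick up the definitions from the new one. The search path for imported files is set with the --proto_path flag.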
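
The one-byte claim in item 2 can be checked with a short C++ sketch. The helper names MakeKey and VarintSize are made up for this illustration and are not part of the protobuf API; the sketch only assumes that a field's key is (field_number << 3) | wire_type written as a varint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Builds the key that precedes a field on the wire:
// (field_number << 3) | wire_type. Hypothetical helper, not a protobuf API.
inline std::uint64_t MakeKey(std::uint32_t field_number, std::uint32_t wire_type) {
    assert(wire_type <= 5);  // wire types are numbered 0 through 5
    return (static_cast<std::uint64_t>(field_number) << 3) | wire_type;
}

// Number of bytes a value occupies as a base-128 varint
// (7 payload bits per byte, 1 continuation bit).
inline std::size_t VarintSize(std::uint64_t value) {
    std::size_t size = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++size;
    }
    return size;
}
```

With these helpers, VarintSize(MakeKey(15, 0)) is 1 and VarintSize(MakeKey(16, 0)) is 2, which is why small tags are worth reserving for the fields that appear most often.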

Message encoding format

Base 128 varints

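  1. A varint serializes an integer using one or more bytes.
  2. The most significant bit of each byte in a varint indicates whether further bytes follow, so every byte except the last has its most significant bit set.
  3. To recover the original integer from a varint:
    For example, 300 is encoded as 1010 1100 0000 0010.
    The two bytes are 10101100 00000010.
    Drop the most significant bit of each: 0101100 0000010.
    Varints store the least significant group first (a group is the 7 bits left after dropping the most significant bit, and carries part of the integer's value), so the groups must be reversed, giving:
    00000100101100, which is 300.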
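
The decoding steps above can be sketched in C++. EncodeVarint and DecodeVarint are illustrative names chosen here, not functions from the protobuf library, and the sketch handles unsigned 64-bit values only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes value as a base-128 varint: least significant 7-bit group first,
// with the continuation bit (0x80) set on every byte except the last.
inline std::vector<std::uint8_t> EncodeVarint(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
    return out;
}

// Decodes one varint from the start of bytes into *value.
// Returns false if the input ends before a byte without the continuation
// bit is seen, or if the varint is longer than 10 bytes.
inline bool DecodeVarint(const std::vector<std::uint8_t>& bytes, std::uint64_t* value) {
    assert(value != nullptr);
    std::uint64_t result = 0;
    for (std::size_t i = 0; i < bytes.size() && i < 10; ++i) {
        // Strip the continuation bit and place the group at its position.
        result |= static_cast<std::uint64_t>(bytes[i] & 0x7F) << (7 * i);
        if ((bytes[i] & 0x80) == 0) {
            *value = result;
            return true;
        }
    }
    return false;
}
```

EncodeVarint(300) yields the two bytes 0xAC 0x02, matching the bit pattern 10101100 00000010 worked through above.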

Message structure

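  1. When a message is encoded, each field's field number (defined in the .proto file) and wire type form the key, and the field's value is the value; key and value are stored together in the encoded data. The wire type tells the parser how to find the length of the value that follows. For example, wire type 0 means a varint, used for int32, int64, uint32, uint64, sint32, sint64, bool and enum, whose length is given by the continuation bits rather than being fixed. For the full list of wire types see https://developers.google.com/protocol-buffers/docs/encoding#structure. The key is itself a varint with the value (field_number << 3) | wire_type, so the lowest three bits of the key are the wire type.
  2. Signed integers: the signed types (sint32, sint64) differ from int32 and int64 when the value is negative. With int32 or int64, a negative number is treated as a very large unsigned integer and its varint is always 10 bytes long. With sint32 or sint64, the value is ZigZag-encoded before being written as a varint, which is more efficient.
  3. ZigZag encoding maps signed integers to unsigned integers so that values with a small absolute value get a short varint.
    For sint32 each n is mapped to (n << 1) ^ (n >> 31), and for sint64 to (n << 1) ^ (n >> 63). The right shift is arithmetic (it fills with the sign bit), so (n >> 31) is all ones for a negative n and all zeros for a non-negative n.
  4. Non-varint numeric types: double and fixed64 are 64 bits (wire type 1); float and fixed32 are 32 bits (wire type 5).
  5. Strings: the value is the length of the string as a varint, followed by the string data itself (wire type 2).
    Embedded messages are treated the same way as strings: the key (field number + wire type) is followed by the byte length of the embedded message, then by the encoded embedded message.
  6. The [packed = true] option encodes a repeated field more efficiently: all elements of a repeated primitive numeric field are packed into a single length-delimited key-value pair.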
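
Two of the rules above, ZigZag mapping and length-delimited string fields, can be sketched in C++. All function names here are hypothetical helpers written for these notes, not protobuf APIs. The encoder uses an explicit mask instead of a signed shift so that it avoids overflow on a signed left shift; the result is the same as (n << 1) ^ (n >> 31).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// ZigZag for sint32. The mask is all ones for negative n and zero
// otherwise, which is what an arithmetic (n >> 31) would produce.
inline std::uint32_t ZigZagEncode32(std::int32_t n) {
    const std::uint32_t mask = n < 0 ? 0xFFFFFFFFu : 0u;
    return (static_cast<std::uint32_t>(n) << 1) ^ mask;
}

// Inverse mapping: the lowest bit carries the sign, the rest the magnitude.
inline std::int32_t ZigZagDecode32(std::uint32_t z) {
    return static_cast<std::int32_t>((z >> 1) ^ (0u - (z & 1u)));
}

// Appends value to *out as a base-128 varint.
inline void AppendVarint(std::uint64_t value, std::string* out) {
    assert(out != nullptr);
    while (value >= 0x80) {
        out->push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out->push_back(static_cast<char>(value));
}

// Encodes a string field: key with wire type 2, then the byte length as a
// varint, then the raw bytes.
inline std::string EncodeStringField(std::uint32_t field_number, const std::string& value) {
    std::string out;
    AppendVarint((static_cast<std::uint64_t>(field_number) << 3) | 2u, &out);
    AppendVarint(value.size(), &out);
    out += value;
    return out;
}
```

ZigZagEncode32 maps 0, -1, 1, -2 to 0, 1, 2, 3, so small negative numbers stay short. EncodeStringField(2, "testing") produces the key byte 0x12, the length byte 0x07, and then the seven characters of the string.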

C++ code generated by the compiler

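  1. Invoke the compiler with the --cpp_out=dir flag and it writes the generated C++ code to dir.
  2. The compiler replaces the .proto extension with .pb.h and .pb.cc. For example:
    protoc --proto_path=src --cpp_out=build/gen src/foo.proto src/bar/baz.proto
    The compiler generates four files: build/gen/foo.pb.h, build/gen/foo.pb.cc, build/gen/bar/baz.pb.h and build/gen/bar/baz.pb.cc. It creates the build/gen/bar directory automatically if needed, but it does not create build or build/gen; those must already exist.
  3. package declaration: if the .proto file contains a package declaration, the entire contents of the file are placed in the corresponding C++ namespace.
  4. For each message the compiler generates a corresponding concrete class derived from google::protobuf::Message. The class leaves no pure-virtual method unimplemented. Methods that are virtual but not pure-virtual in Message may or may not be overridden by the concrete class, depending on the optimization mode. By default the concrete class implements specialized versions of all methods for maximum speed.
    If the .proto file contains the line option optimize_for = CODE_SIZE; the concrete class overrides only the minimum set of methods it needs and relies on reflection-based implementations for the rest. This significantly reduces the size of the generated code.
    If the .proto file contains option optimize_for = LITE_RUNTIME; the concrete class includes fast implementations of all methods, but it implements the google::protobuf::MessageLite interface, which contains only a subset of the methods of Message. In particular, it does not support reflection or descriptors. In this mode, however, the generated code only needs to link against libprotobuf-lite.so rather than libprotobuf.so, and the lite library is much smaller.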
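  5. The Message interface defines methods to inspect and manipulate the whole message, and to parse it from a stream or serialize it to a stream. In addition to these, the generated class (Foo in this example) defines the following methods: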
    Foo(): Default constructor.
    ~Foo(): Default destructor.
    Foo(const Foo& other): Copy constructor.
    Foo& operator=(const Foo& other): Assignment operator.
    void Swap(Foo* other): Swap content with another message.
    const UnknownFieldSet& unknown_fields() const: Returns the set of unknown fields encountered while parsing this message.
    UnknownFieldSet* mutable_unknown_fields(): Returns a pointer to the mutable set of unknown fields encountered while parsing this message.
    static const Descriptor* descriptor(): Returns the type’s descriptor. This contains information about the type, including what fields it has and what their types are. This can be used with reflection to inspect fields programmatically.
    static const Foo& default_instance(): Returns a const singleton instance of Foo which is identical to a newly-constructed instance of Foo (so all singular fields are unset and all repeated fields are empty). Note that the default instance of a message can be used as a factory by calling its New() method.
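  6. A message can be declared inside another message, for example: message Foo { message Bar { } }. In this case the compiler generates two classes, Foo and Foo_Bar, and adds a typedef inside Foo:
    typedef Foo_Bar Bar;
    so the nested type can be used as Foo::Bar. C++ does not allow nested types to be forward-declared, so to forward-declare the nested type in another file, refer to it as Foo_Bar.
    For every field the compiler also generates an integer constant whose name is k, followed by the field name in camel case, followed by FieldNumber. For example, optional int32 foo_bar = 5; produces static const int kFooBarFieldNumber = 5;.
  7. A const reference or pointer to a field that an accessor returns may be invalidated by the next modifying access to the message.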
  8. For a singular numeric field (Singular Numeric Type), such as either of:
    optional int32 foo = 1;
    required int32 foo = 1;
    the compiler generates:
    bool has_foo() const: Returns true if the field is set.
    int32 foo() const: Returns the current value of the field. If the field is not set, returns the default value.
    void set_foo(int32 value): Sets the value of the field. After calling this, has_foo() will return true and foo() will return value.
    void clear_foo(): Clears the value of the field. After calling this, has_foo() will return false and foo() will return the default value.
  9. For a singular string or bytes field, such as any of:
    optional string foo = 1;
    required string foo = 1;
    optional bytes foo = 1;
    required bytes foo = 1;
    the compiler generates:
    bool has_foo() const: Returns true if the field is set.
    const string& foo() const: Returns the current value of the field. If the field is not set, returns the default value.
    void set_foo(const string& value): Sets the value of the field. After calling this, has_foo() will return true and foo() will return a copy of value.
    void set_foo(const char* value): Sets the value of the field using a C-style null-terminated string. After calling this, has_foo() will return true and foo() will return a copy of value.
    void set_foo(const char* value, int size): Like above, but the string size is given explicitly rather than determined by looking for a null-terminator byte.
    string* mutable_foo(): Returns a pointer to the mutable string object that stores the field’s value. If the field was not set prior to the call, then the returned string will be empty (not the default value). After calling this, has_foo() will return true and foo() will return whatever value is written into the given string.
    void clear_foo(): Clears the value of the field. After calling this, has_foo() will return false and foo() will return the default value.
    void set_allocated_foo(string* value): Sets the string object to the field and frees the previous field value if it exists. If the string pointer is not NULL, the message takes ownership of the allocated string object and has_foo() will return true. Otherwise, if the value is NULL, the behavior is the same as calling clear_foo().
    string* release_foo(): Releases the ownership of the field and returns the pointer of the string object. After calling this, caller takes the ownership of the allocated string object, has_foo() will return false, and foo() will return the default value.
  10. Repeated numeric fields (Repeated Numeric Type)
    For example: repeated int32 foo = 1;
    The compiler generates these accessor functions:
    int foo_size() const: Returns the number of elements currently in the field.
    int32 foo(int index) const: Returns the element at the given zero-based index. Calling this method with index outside of [0, foo_size()) yields undefined behavior.
    void set_foo(int index, int32 value): Sets the value of the element at the given zero-based index.
    void add_foo(int32 value): Appends a new element to the field with the given value.
    void clear_foo(): Removes all elements from the field. After calling this, foo_size() will return zero.
    const RepeatedField<int32>& foo() const: Returns the underlying RepeatedField that stores the field’s elements. This container class provides STL-like iterators and other methods.
    RepeatedField<int32>* mutable_foo(): Returns a pointer to the underlying mutable RepeatedField that stores the field’s elements. This container class provides STL-like iterators and other methods.
  11. Repeated string fields (Repeated String Type)
    For example:
    repeated string foo = 1;
    repeated bytes foo = 1;
    The compiler generates these accessor functions:
    int foo_size() const: Returns the number of elements currently in the field.
    const string& foo(int index) const: Returns the element at the given zero-based index. Calling this method with index outside of [0, foo_size()) yields undefined behavior.
    void set_foo(int index, const string& value): Sets the value of the element at the given zero-based index.
    void set_foo(int index, const char* value): Sets the value of the element at the given zero-based index using a C-style null-terminated string.
    void set_foo(int index, const char* value, int size): Like above, but the string size is given explicitly rather than determined by looking for a null-terminator byte.
    string* mutable_foo(int index): Returns a pointer to the mutable string object that stores the value of the element at the given zero-based index. Calling this method with index outside of [0, foo_size()) yields undefined behavior.
    void add_foo(const string& value): Appends a new element to the field with the given value.
    void add_foo(const char* value): Appends a new element to the field using a C-style null-terminated string.
    void add_foo(const char* value, int size): Like above, but the string size is given explicitly rather than determined by looking for a null-terminator byte.
    string* add_foo(): Adds a new empty string element and returns a pointer to it.
    void clear_foo(): Removes all elements from the field. After calling this, foo_size() will return zero.
    const RepeatedPtrField<string>& foo() const: Returns the underlying RepeatedPtrField that stores the field’s elements. This container class provides STL-like iterators and other methods.
    RepeatedPtrField<string>* mutable_foo(): Returns a pointer to the underlying mutable RepeatedPtrField that stores the field’s elements. This container class provides STL-like iterators and other methods.
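  12. Repeated enum fields (Repeated Enum Type): the same as repeated numeric fields.
  13. Repeated embedded message fields (Repeated embedded message type)
    For example: repeated Bar foo = 1;
    The compiler generates these functions: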
    int foo_size() const: Returns the number of elements currently in the field.
    const Bar& foo(int index) const: Returns the element at the given zero-based index. Calling this method with index outside of [0, foo_size()) yields undefined behavior.
    Bar* mutable_foo(int index): Returns a pointer to the mutable Bar object that stores the value of the element at the given zero-based index. Calling this method with index outside of [0, foo_size()) yields undefined behavior.
    Bar* add_foo(): Adds a new element and returns a pointer to it. The returned Bar is mutable and will have none of its fields set (i.e. it will be identical to a newly-allocated Bar).
    void clear_foo(): Removes all elements from the field. After calling this, foo_size() will return zero.
    const RepeatedPtrField<Bar>& foo() const: Returns the underlying RepeatedPtrField that stores the field’s elements. This container class provides STL-like iterators and other methods.
    RepeatedPtrField<Bar>* mutable_foo(): Returns a pointer to the underlying mutable RepeatedPtrField that stores the field’s elements. This container class provides STL-like iterators and other methods.
  14. Oneof Numeric Fields
    For this oneof field definition:
    oneof oneof_name {
    int32 foo = 1;

    }
    The compiler will generate the following accessor methods:
    bool has_foo() const (proto2 only): Returns true if oneof case is kFoo.
    int32 foo() const: Returns the current value of the field if oneof case is kFoo. Otherwise, returns the default value.
    void set_foo(int32 value):
    If any other oneof field in the same oneof is set, calls clear_oneof_name().
    Sets the value of this field and sets the oneof case to kFoo.
    has_foo() (proto2 only) will return true, foo() will return value, and oneof_name_case() will return kFoo.
    void clear_foo():
    Nothing will be changed if oneof case is not kFoo.
    If oneof case is kFoo, clears the value of the field and oneof case. has_foo() (proto2 only) will return false, foo() will return the default value and oneof_name_case() will return ONEOF_NAME_NOT_SET.
  15. Map Fields

For this map field definition:

map

Oneof

Given a oneof definition like this:
oneof oneof_name {
int32 foo_int = 4;
string foo_string = 9;

}
The compiler will generate the following C++ enum type:

enum OneofNameCase {
kFooInt = 4,
kFooString = 9,
ONEOF_NAME_NOT_SET = 0
}
In addition, it will generate this method:

OneofNameCase oneof_name_case() const: Returns the enum indicating which field is set. Returns ONEOF_NAME_NOT_SET if none of them is set.
The compiler also generates the following private method, which is used in oneof field accessors:

void clear_oneof_name(): Frees the object if the oneof field set uses a pointer (Message or String), and sets the oneof case to ONEOF_NAME_NOT_SET.
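
The oneof behaviour described above can be illustrated with a hand-written stand-in. FooSketch below is NOT protoc output; it is a simplified class, written for these notes, that mimics only the documented semantics (setting one member clears the other, and oneof_name_case() reports which member is set). It assumes the oneof holds just foo_int and foo_string.

```cpp
#include <cstdint>
#include <string>

// Hand-written illustration of oneof accessor semantics (not generated code).
class FooSketch {
 public:
    enum OneofNameCase { kFooInt = 4, kFooString = 9, ONEOF_NAME_NOT_SET = 0 };

    // Reports which member of the oneof is currently set.
    OneofNameCase oneof_name_case() const { return case_; }

    // Returns the value if foo_int is the active member, else the default (0).
    std::int32_t foo_int() const { return case_ == kFooInt ? foo_int_ : 0; }

    // Clears whatever member was set, then makes foo_int the active member.
    void set_foo_int(std::int32_t value) {
        clear_oneof_name();
        foo_int_ = value;
        case_ = kFooInt;
    }

    // Returns the value if foo_string is active, else an empty default.
    const std::string& foo_string() const {
        static const std::string kEmpty;
        return case_ == kFooString ? foo_string_ : kEmpty;
    }

    void set_foo_string(const std::string& value) {
        clear_oneof_name();
        foo_string_ = value;
        case_ = kFooString;
    }

    // Does nothing unless foo_int is the active member.
    void clear_foo_int() {
        if (case_ == kFooInt) clear_oneof_name();
    }

 private:
    // Resets storage and marks the oneof as not set.
    void clear_oneof_name() {
        foo_int_ = 0;
        foo_string_.clear();
        case_ = ONEOF_NAME_NOT_SET;
    }

    OneofNameCase case_ = ONEOF_NAME_NOT_SET;
    std::int32_t foo_int_ = 0;
    std::string foo_string_;
};
```

Calling set_foo_int(7) and then set_foo_string("hi") leaves oneof_name_case() at kFooString and foo_int() back at its default, which is the "last one set wins" rule that the real generated accessors follow.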

Parsing and serialization

bool SerializeToString(string* output) const;: serializes the message and stores the bytes in the given string. Note that the bytes are binary, not text; we only use the string class as a convenient container.

bool ParseFromString(const string& data);: parses a message from the given string.

bool SerializeToOstream(ostream* output) const;: writes the message to the given C++ ostream.

bool ParseFromIstream(istream* input);: parses a message from the given C++ istream.

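Reflection

With reflection you can write code that is not tied to any particular message type. This is especially useful when converting messages to and from other encodings such as XML and JSON.

For the reflection API, see https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.message#Message.Reflection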