Notes on using protobuf

Source: Internet | Published by: 阿列克谢耶维奇知乎 | Editor: 程序博客网 | Date: 2024/06/05 20:43

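The official introduction and usage guide for Google protobuf is at: https://developers.google.com/protocol-buffers/

First, a summary of how to use protobuf comfortably in Eclipse.

Start by installing a plugin called protobuf-dt. Its introduction and installation instructions are at: https://code.google.com/p/protobuf-dt/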

These instructions assume that you have already installed some flavor of Eclipse 3.7 or 3.8. If you have not, Eclipse can be downloaded from http://download.eclipse.org/eclipse/downloads/

Once you have Eclipse up and running, do the following

install Xtext 2.3.0 from the update site http://download.eclipse.org/modeling/tmf/xtext/updates/composite/releases/
install protobuf-dt from the update site http://protobuf-dt.googlecode.com/git/update-site

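I was using Eclipse Juno and ran into some problems during installation. I found the solution in the comments under https://code.google.com/p/protobuf-dt/wiki/Installing.

In summary:

1 The installation order matters: install Xtext first, then protobuf-dt.

2 From the Xtext plugin, install only the Xtext UI component and nothing else. Otherwise you will hit dependency conflicts later when installing protobuf-dt.

3 The protobuf-dt version is tied to the Xtext version. The latest protobuf-dt depends on Xtext 2.4.2, so remember to select the matching version when installing. The version in the official installation instructions is out of date; those instructions were written long ago.

Once installed, you can write your own .proto files very conveniently in Eclipse. File > New also gains an entry for creating a new .proto file.

Create a new Maven project and add the dependency: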

<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
</dependency>

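In the project's Properties, edit the protobuf plugin options as follows:

1 In the Main options, set the protoc.exe used to compile .proto files (download: https://code.google.com/p/protobuf/downloads/list; on Windows I used the latest one I had downloaded, protoc-2.5.0-win32.zip). With this set, the Java files are regenerated automatically each time you save a .proto file after editing it.

2 Since this is a Maven project, I changed the output path for generated code to src/main/java (the default is src-gen).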

addressbook.proto is taken from the official tutorial:

package tutorial;

option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";

message Person {
  optional string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

message AddressBook {
  repeated Person person = 1;
}

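For the full syntax, see the detailed official documentation.

Below are some notes on my understanding of how protobuf serialization works, based on the official documentation. Original page: https://developers.google.com/protocol-buffers/docs/encoding.

The official documentation gives this example: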

message Test1 {
  required int32 a = 1;
}

When a is set to 150, you get an array of three bytes:

08 96 01

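The official explanation of the first byte is easy to follow. 08 in binary is 00001000. The lowest three bits (000) give wire_type = 0; shifting right by 3 bits leaves 00000001, which gives tag = 1 (the tag is the field number, i.e. the 1 in a = 1). The documentation also explains in detail how the next two bytes work out to 150. But when I tested this in Java, the three bytes I got back were [8, -106, 1]. The second byte was not 96 (hexadecimal) but a negative value! What is going on?

So I searched Baidu for an explanation of byte in Java: a byte is 8 bits with values from -128 to 127. Negative values are stored in two's complement (invert the bits of the magnitude, then add 1), positive values are stored as-is, and the highest bit is the sign bit.

The binary representation of -106 can therefore be obtained like this: take the bit pattern of 106, invert it, and add 1. 106 is 01101010; inverted it is 10010101; adding 1 gives 10010110. So -106 in Java is 10010110 in binary, which is 96 in hexadecimal. Exactly 96, matching the official documentation.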
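The signed-versus-hex mismatch above can be checked with a few lines of Java. This is only a minimal sketch: the class name SignedByteDemo and its helper methods are made up for illustration and are not part of protobuf. Masking with 0xFF discards the sign extension, so the same eight bits are shown as an unsigned value.

```java
// Illustration only: SignedByteDemo and its methods are hypothetical names.
final class SignedByteDemo {
    // Render a byte as two hex digits; the mask removes the sign extension.
    static String toHex(byte b) {
        return String.format("%02x", b & 0xFF);
    }

    // Reinterpret a Java byte as an unsigned value in the range 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        // The bytes as Java prints them for Test1 with a = 150.
        byte[] encoded = {8, -106, 1};
        StringBuilder sb = new StringBuilder();
        for (byte b : encoded) {
            sb.append(toHex(b)).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints "08 96 01"
    }
}
```

Printing the array with Arrays.toString shows the signed view ([8, -106, 1]); the hex view above shows the same bits the way the official documentation writes them.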

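Now let's look at how the two bytes 96 01 represent 150.

96 01 => 10010110 00000001. Drop the most significant bit of each byte (the official wording is: "Each byte in a varint, except the last byte, has the most significant bit (msb) set"), which leaves the 7-bit groups 0010110 and 0000001. Varints store the least significant group first, so swap the order of the groups (the bits inside each group stay as they are): 0000001 0010110 => 10010110, which is exactly 150 in decimal.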
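The same steps can be written as code. The following is a minimal sketch of base-128 varint decoding and encoding for illustration; the class VarintDemo is a hypothetical name, and real code would normally rely on the protobuf runtime rather than a hand-written loop.

```java
import java.io.ByteArrayOutputStream;

// Illustration only: VarintDemo is a hypothetical name, not a protobuf class.
final class VarintDemo {
    // Decode one varint from the start of the array.
    // Each byte contributes its low 7 bits; the least significant group comes first.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            if (shift >= 64) {
                throw new IllegalArgumentException("varint too long");
            }
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) { // msb clear: this was the last byte
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    // Encode a non-negative value as a varint.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits, msb set
            value >>>= 7;
        }
        out.write((int) value); // final byte, msb clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0x96, 0x01};
        System.out.println(decode(bytes)); // prints 150
    }
}
```

For 96 01, the first byte contributes 0010110 (22) and the second contributes 0000001 shifted left by 7 (128), and 22 + 128 = 150.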

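The official explanation of how strings are serialized is also easy to follow, so I won't go through it here.

Now for the serialization of embedded messages: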

message Test3 {
  required Test1 c = 3;
}

Test1 is the message defined earlier when explaining varints. We again set Test1's a to 150, and the resulting byte array is:

1a 03 08 96 01

As you can see, the last three bytes are exactly the same as our first example (08 96 01), and they're preceded by the number 3 – embedded messages are treated in exactly the same way as strings (wire type = 2). In other words, the official explanation says that embedded messages are serialized the same way as strings. The first byte 1a in binary is 00011010. The low three bits are 010, meaning wire type = 2, which the wire type table lists as:

Type    Meaning             Used For
2       Length-delimited    string, bytes, embedded messages, packed repeated fields

Shifting right by 3 then gives 00000011, meaning tag = 3, and indeed the field number of c in our Test3 is 3. The second byte 03 is the length of the data value that follows: 3 bytes. The last three bytes 08 96 01 are Test1 with a = 150, as explained earlier.
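To make the byte-by-byte reading of 1a 03 08 96 01 concrete, here is a minimal sketch that splits the key byte and pulls out the embedded payload. The class EmbeddedMessageDemo and its methods are hypothetical names for illustration. It assumes that both the key and the length fit in a single byte, which holds for this example but not in general, since both are varints.

```java
import java.util.Arrays;

// Illustration only: EmbeddedMessageDemo is a hypothetical name, not a protobuf class.
final class EmbeddedMessageDemo {
    // The low three bits of the key carry the wire type.
    static int wireType(int key) {
        return key & 0x07;
    }

    // The remaining bits, shifted right by three, carry the field number.
    static int fieldNumber(int key) {
        return key >>> 3;
    }

    // Return the payload of a length-delimited field at the start of the message.
    // Assumption: key and length are each a single-byte varint (values below 128).
    static byte[] payload(byte[] message) {
        int key = message[0] & 0xFF;
        if (wireType(key) != 2) {
            throw new IllegalArgumentException("not a length-delimited field");
        }
        int length = message[1] & 0xFF;
        return Arrays.copyOfRange(message, 2, 2 + length);
    }

    public static void main(String[] args) {
        byte[] test3 = {0x1a, 0x03, 0x08, (byte) 0x96, 0x01};
        int key = test3[0] & 0xFF;
        System.out.println("field number = " + fieldNumber(key)
                + ", wire type = " + wireType(key));
        // The payload is the serialized Test1 message: 08 96 01.
        System.out.println("payload length = " + payload(test3).length);
    }
}
```

The extracted payload can then be read with the same key and varint rules as the first example, which is why an embedded message looks just like a string on the wire.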

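Why did I want to understand protobuf? While compiling Hadoop 2.2, I noticed that Hadoop's RPC appears to have started using protobuf as its serialization framework for transferring data, presumably for performance. Comparisons posted online between protobuf and other serialization frameworks such as Thrift report that protobuf performs noticeably better, and Google uses protobuf internally as well. HBase also seems to use protobuf for its RPC communication.

Next I plan to look into how to use protobuf-socket-rpc, and will add more here afterwards.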
