Apache Avro Java Guide

Source: Internet    Editor: 程序博客网    Time: 2024/06/06 03:48

Defining a schema

    Avro schemas are defined in JSON. A schema is composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed).
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
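    This schema defines a record representing a user. At minimum, a record definition must include its type ("type": "record"), a name ("name": "User"), and fields. We can also define a namespace ("namespace": "example.avro"), which together with the name attribute forms the full name (example.avro.User).
    Fields are defined as an array of objects, each of which defines a name and a type. The type can itself be a union, written as a JSON array: ["int", "null"] means that favorite_number may hold either an int or null.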
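To see how the namespace, name, and fields described above look once parsed, here is a minimal sketch. It assumes the Avro Java library is on the classpath; the class name UserSchemaInspection and the inlined schema constant are hypothetical and exist only so the example does not depend on a schema file.

```java
import org.apache.avro.Schema;

public class UserSchemaInspection {
    // The User schema from above, inlined as a string (illustrative constant).
    static final String USER_SCHEMA_JSON =
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
      + " \"fields\": ["
      + "  {\"name\": \"name\", \"type\": \"string\"},"
      + "  {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
      + "  {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}"
      + " ]}";

    // Parse the JSON text into an Avro Schema object.
    static Schema parseUserSchema() {
        return new Schema.Parser().parse(USER_SCHEMA_JSON);
    }

    public static void main(String[] args) {
        Schema schema = parseUserSchema();

        // Namespace and name are combined into the full name.
        System.out.println(schema.getFullName());

        // Each field carries its own name and type; union fields report UNION.
        for (Schema.Field field : schema.getFields()) {
            System.out.println(field.name() + " : " + field.schema().getType());
        }
    }
}
```

Running it should print the full name example.avro.User, followed by one line per field with its type.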

Serializing and deserializing with code generation

Compiling the schema

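    Code generation lets us automatically create classes based on the schema. Once the relevant classes have been generated, the program no longer needs to use the schema directly.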
    java -jar /path/to/avro-tools-1.7.3.jar compile schema <schema file> <destination>

Creating Users

    Once the code has been generated, users can be created as in the following demo.
User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
// Leave favorite color null

// Alternate constructor
User user2 = new User("Ben", 7, "red");

// Construct via builder
User user3 = User.newBuilder()
             .setName("Charlie")
             .setFavoriteColor("blue")
             .setFavoriteNumber(null)
             .build();

Serializing

// Serialize user1, user2 and user3 to disk
File file = new File("users.avro");
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
// Write the schema into the file header, then append the records
dataFileWriter.create(user1.getSchema(), file);
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.append(user3);
dataFileWriter.close();

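    DatumWriter converts Java objects into an in-memory serialized format. SpecificDatumWriter is used with generated classes and extracts the schema from the specified generated type. DataFileWriter writes the serialized records, together with the schema, to the file.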

Deserializing

// Deserialize Users from disk
DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader);
User user = null;
while (dataFileReader.hasNext()) {
    // Reuse user object by passing it to next(). This saves us from
    // allocating and garbage collecting many objects for files with
    // many items.
    user = dataFileReader.next(user);
    System.out.println(user);
}
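    SpecificDatumReader converts in-memory serialized items into instances of the generated class. DataFileReader reads the file from disk. Passing the user object to next() lets the reader reuse that object instead of allocating a new one for every record.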

Serializing and deserializing without code generation

Creating users

// Parse the schema from the schema file at run time
Schema schema = new Schema.Parser().parse(new File("user.avsc"));

GenericRecord user1 = new GenericData.Record(schema);
user1.put("name", "Alyssa");
user1.put("favorite_number", 256);
// Leave favorite color null

GenericRecord user2 = new GenericData.Record(schema);
user2.put("name", "Ben");
user2.put("favorite_number", 7);
user2.put("favorite_color", "red");

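Since we are not using code generation, GenericRecord takes the place of the generated User class. GenericRecord uses the schema to verify that the fields we set are valid. If we try to set a nonexistent field, for example user1.put("favorite_animal", "cat"), an AvroRuntimeException is thrown when the program runs.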
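The field check described above can be observed directly. The following is a minimal sketch, assuming the Avro Java library is on the classpath; the class name UnknownFieldDemo, the helper putIsRejected, and the inlined schema constant are hypothetical names introduced for illustration.

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class UnknownFieldDemo {
    // The User schema, inlined so the sketch does not need user.avsc (illustrative constant).
    static final String USER_SCHEMA_JSON =
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
      + " \"fields\": ["
      + "  {\"name\": \"name\", \"type\": \"string\"},"
      + "  {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
      + "  {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}"
      + " ]}";

    // Returns true if put() rejects the field name, false if the value was stored.
    static boolean putIsRejected(String fieldName, Object value) {
        Schema schema = new Schema.Parser().parse(USER_SCHEMA_JSON);
        GenericRecord user = new GenericData.Record(schema);
        try {
            user.put(fieldName, value);
            return false;
        } catch (AvroRuntimeException e) {
            // The field name is not declared in the schema.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("name rejected: " + putIsRejected("name", "Alyssa"));
        System.out.println("favorite_animal rejected: " + putIsRejected("favorite_animal", "cat"));
    }
}
```

Note that this check covers field names only. As far as I know, put() does not validate the value's type against the schema, so a value of the wrong type would only surface as an error later, when the record is serialized.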

Serializing

// Serialize user1 and user2 to disk
File file = new File("users.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
dataFileWriter.create(schema, file);
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.close();

Deserializing

// Deserialize users from disk
DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader);
GenericRecord user = null;
while (dataFileReader.hasNext()) {
    // Reuse user object by passing it to next(). This saves us from
    // allocating and garbage collecting many objects for files with
    // many items.
    user = dataFileReader.next(user);
    System.out.println(user);
}
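The generic snippets in this section can be combined into one self-contained program. The sketch below is one possible arrangement, not the only one. It assumes the Avro Java library is on the classpath, writes to a temporary file instead of users.avro, and uses try-with-resources plus a for-each loop in place of the explicit close() and hasNext()/next() calls. The class name GenericRoundTrip, the method roundTripNames, and the inlined schema constant are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class GenericRoundTrip {
    // The User schema, inlined so the sketch does not need user.avsc (illustrative constant).
    static final String USER_SCHEMA_JSON =
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
      + " \"fields\": ["
      + "  {\"name\": \"name\", \"type\": \"string\"},"
      + "  {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
      + "  {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}"
      + " ]}";

    // Writes two users to a temporary Avro file, reads them back, and returns their names.
    static List<String> roundTripNames() {
        Schema schema = new Schema.Parser().parse(USER_SCHEMA_JSON);

        GenericRecord user1 = new GenericData.Record(schema);
        user1.put("name", "Alyssa");
        user1.put("favorite_number", 256);
        // favorite_color is left null

        GenericRecord user2 = new GenericData.Record(schema);
        user2.put("name", "Ben");
        user2.put("favorite_number", 7);
        user2.put("favorite_color", "red");

        try {
            File file = File.createTempFile("users", ".avro");
            file.deleteOnExit();

            // Serialize: the writer is closed automatically at the end of the block.
            try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
                writer.create(schema, file);
                writer.append(user1);
                writer.append(user2);
            }

            // Deserialize: DataFileReader is Iterable, so a for-each loop works.
            List<String> names = new ArrayList<>();
            try (DataFileReader<GenericRecord> reader =
                     new DataFileReader<>(file, new GenericDatumReader<GenericRecord>(schema))) {
                for (GenericRecord user : reader) {
                    // Strings come back as CharSequence, so convert explicitly.
                    names.add(user.get("name").toString());
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripNames());
    }
}
```

The for-each loop trades the object reuse of next(user) for brevity. For large files, the explicit loop shown earlier avoids allocating a new record per item.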