Big Data Core Technology Source Code Analysis: Avro, Part 2

Source: Internet  Editor: 程序博客网  Time: 2024/05/23 12:03

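With the Avro trunk source checked out, the first target for analysis is

the source under avro-trunk_src\lang\java.

Its modules include avro, compiler, ipc, mapred, protobuf, thrift, and others.

Start with the avro module.

The top-level classes center on JsonProperties [the top-level abstract class],

Schema and Protocol [both extend JsonProperties],

SchemaNormalization and SchemaBuilder,

and the exception classes.

This hierarchy shows why the Avro core supports schemas written in JSON: Schema and Protocol both inherit their JSON property handling from JsonProperties.

The Schema class shows which schema types are supported: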

public enum Type {
    RECORD, ENUM, ARRAY, MAP, UNION, FIXED, STRING, BYTES,
      INT, LONG, FLOAT, DOUBLE, BOOLEAN, NULL;
    private String name;
    private Type() { this.name = this.name().toLowerCase(); }
    public String getName() { return name; }
  };
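The enum above derives each type's JSON name by lower-casing the constant name. A minimal sketch of the same idiom follows; `TypeSketch` and the reverse lookup `fromJsonName` are hypothetical names introduced here for illustration and are not part of the Avro excerpt.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for Schema.Type, reduced to a few constants. */
enum TypeSketch {
  RECORD, ENUM, ARRAY, STRING, NULL;

  private final String name;

  // Runs once per constant: RECORD -> "record", NULL -> "null", ...
  TypeSketch() { this.name = name().toLowerCase(Locale.ROOT); }

  public String getName() { return name; }

  // Reverse lookup from JSON name to constant (an assumption of this sketch).
  private static final Map<String, TypeSketch> BY_NAME = new HashMap<>();
  static {
    for (TypeSketch t : values()) BY_NAME.put(t.name, t);
  }

  /** Returns the constant for a lower-case JSON type name, or null if unknown. */
  static TypeSketch fromJsonName(String jsonName) {
    return BY_NAME.get(jsonName);
  }
}
```

Passing Locale.ROOT keeps the lower-casing independent of the JVM's default locale, which matters for constants containing the letter I.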

Protocol, in turn, defines two kinds of message: Message and TwoWayMessage.

 

Internally, JsonProperties stores its properties in

Map<String,JsonNode> props = new LinkedHashMap<String,JsonNode>(1);

Note the two synchronized methods:

public synchronized JsonNode getJsonProp(String name) {
    return props.get(name);
  }

public synchronized void addProp(String name, JsonNode value) {}

Declaring both methods synchronized serializes reads and writes of the props map.
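A minimal sketch of such a synchronized property holder is shown below. `PropsSketch` is a hypothetical class that stores String values instead of JsonNode so that it needs only the JDK. The body of addProp is not shown in the excerpt above, so the set-once rule used here is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for JsonProperties, holding String values. */
class PropsSketch {
  // LinkedHashMap keeps properties in insertion order.
  private final Map<String, String> props = new LinkedHashMap<>(1);

  public synchronized String getProp(String name) {
    return props.get(name);
  }

  // Assumed policy: a property may be set once; re-adding the same value
  // is a no-op, and a conflicting value is rejected.
  public synchronized void addProp(String name, String value) {
    String old = props.get(name);
    if (old == null) {
      props.put(name, value);
    } else if (!old.equals(value)) {
      throw new IllegalStateException("Can't overwrite property: " + name);
    }
  }
}
```

Because both methods lock on the same instance, a reader can never observe the map in the middle of a write.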

Message and TwoWayMessage are defined in Protocol as follows

public class Message extends JsonProperties {
    private String name;
    private String doc;
    private Schema request;

TwoWayMessage extends Message with a response schema and an errors schema:

private class TwoWayMessage extends Message {
    private Schema response;
    private Schema errors;

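SchemaBuilder, as its name suggests, creates Schema instances.

It contains a Builder for each of several schema types.

It also contains the FieldDefault family of classes

and Completion: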

private abstract static class Completion<R> {
    protected abstract R complete(Schema schema);
  }

FieldDefault is defined as follows

 private static abstract class FieldDefault<R, S extends FieldDefault<R, S>> extends Completion<S> {
    private final FieldBuilder<R> field;
    private Schema schema;
    protected FieldDefault(FieldBuilder<R> field) {
      this.field = field;
    }
   
    /** Completes this field with no default value **/
    public final FieldAssembler<R> noDefault() {
      return field.completeField(schema);
    }
   
    private FieldAssembler<R> usingDefault(Object defaultVal) {
      return field.completeField(schema, defaultVal);
    }
   
    @Override
    protected final S complete(Schema schema) {
      this.schema = schema;
      return self();
    }
   
    protected abstract S self();
  }
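FieldDefault declares itself as `S extends FieldDefault<R, S>` and returns `self()` from `complete`, so that a call chain keeps the concrete subclass type. A reduced sketch of that self-typed generic idiom follows; `StepSketch` and `IntDefaultSketch` are hypothetical names, and a String stands in for Schema.

```java
/** Hypothetical base: S is the concrete subclass type, so complete() can return it. */
abstract class StepSketch<S extends StepSketch<S>> {
  private String schema;

  // Stores the schema, then hands back the concrete subclass rather than the base type.
  final S complete(String schema) {
    this.schema = schema;
    return self();
  }

  String schema() { return schema; }

  protected abstract S self();
}

/** Concrete step that adds a type-specific default setter. */
final class IntDefaultSketch extends StepSketch<IntDefaultSketch> {
  private Integer defaultValue;

  @Override protected IntDefaultSketch self() { return this; }

  // Only exists on the concrete type; reachable after complete() because of S.
  IntDefaultSketch intDefault(int v) {
    this.defaultValue = v;
    return this;
  }

  Integer defaultValue() { return defaultValue; }
}
```

With this shape, `new IntDefaultSketch().complete("int").intDefault(7)` compiles without a cast, which would not be the case if `complete` returned the base type.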

Note the last method of SchemaBuilder:

 // create default value JsonNodes from objects
  private static JsonNode toJsonNode(Object o) {
    try {
      String s;
      if (o instanceof ByteBuffer) {
        // special case since GenericData.toString() is incorrect for bytes
        // note that this does not handle the case of a default value with nested bytes
        ByteBuffer bytes = ((ByteBuffer) o);
        bytes.mark();
        byte[] data = new byte[bytes.remaining()];
        bytes.get(data);
        bytes.reset(); // put the buffer back the way we got it
        s = new String(data, "ISO-8859-1");
        char[] quoted = JsonStringEncoder.getInstance().quoteAsString(s);
        s = "\"" + new String(quoted) + "\"";
      } else {
        s = GenericData.get().toString(o);
      }
      return new ObjectMapper().readTree(s);
    } catch (IOException e) {
      throw new SchemaBuilderException(e);
    }
  }

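This method converts an Object into a JsonNode. A java.nio.ByteBuffer is special-cased: its bytes are decoded as ISO-8859-1 and quoted as a JSON string, and the buffer position is restored afterwards. Every other value goes through GenericData.toString().

The JsonNode here is org.codehaus.jackson.JsonNode.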
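The ByteBuffer branch can be sketched with the JDK alone. In the version below, `BytesDefaultSketch.toJsonString` is a hypothetical helper, and Jackson's JsonStringEncoder is replaced by a small hand-written escaper, so the exact escape sequences may differ from what Jackson produces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BytesDefaultSketch {
  /** Renders the remaining bytes as a quoted JSON string without consuming the buffer. */
  static String toJsonString(ByteBuffer bytes) {
    bytes.mark();
    byte[] data = new byte[bytes.remaining()];
    bytes.get(data);
    bytes.reset();                       // restore the caller's position
    // ISO-8859-1 maps each byte 0..255 to the code point with the same value.
    String s = new String(data, StandardCharsets.ISO_8859_1);
    StringBuilder out = new StringBuilder("\"");
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '"' || c == '\\') {
        out.append('\\').append(c);      // escape quote and backslash
      } else if (c < 0x20 || c > 0x7e) {
        out.append(String.format("\\u%04x", (int) c)); // escape non-printable and non-ASCII
      } else {
        out.append(c);
      }
    }
    return out.append('"').toString();
  }
}
```

Decoding as ISO-8859-1 is what makes the conversion lossless: every byte value becomes exactly one character, so the original bytes can be recovered from the string.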

 

The remaining packages under avro are

data, file, generic, io, ipc, reflect, specific, tool, and util.

package data:

It contains

Json

which includes a Writer and a Reader

RecordBuilder

public interface RecordBuilder<T> {
  T build();
}

RecordBuilderBase

public abstract class RecordBuilderBase<T extends IndexedRecord>
  implements RecordBuilder<T>

This builder base class provides template methods for validation.

ErrorBuilder

A builder interface that extends RecordBuilder:

public interface ErrorBuilder<T> extends RecordBuilder<T> {
 
  /** Gets the value */
  Object getValue();
 
  /** Sets the value */
  ErrorBuilder<T> setValue(Object value);
 
  /** Checks whether the value has been set */
  boolean hasValue();
 
  /** Clears the value */
  ErrorBuilder<T> clearValue();
 
  /** Gets the error cause */
  Throwable getCause();
 
  /** Sets the error cause */
  ErrorBuilder<T> setCause(Throwable cause);
 
  /** Checks whether the cause has been set */
  boolean hasCause();
 
  /** Clears the cause */
  ErrorBuilder<T> clearCause();

}

The file package contains the following class hierarchy.

The abstract class Codec.java defines compress and decompress, plus getName, equals, hashCode, and so on.

public abstract class Codec {
  /** Name of the codec; written to the file's metadata. */
  public abstract String getName();
  /** Compresses the input data */
  public abstract ByteBuffer compress(ByteBuffer uncompressedData) throws IOException;
  /** Decompress the data  */
  public abstract ByteBuffer decompress(ByteBuffer compressedData) throws IOException;
  /**
   * Codecs must implement an equals() method.  Two codecs, A and B are equal
   * if: the result of A and B decompressing content compressed by A is the same
   * AND the result of A and B decompressing content compressed by B is the same
   **/
  @Override
  public abstract boolean equals(Object other);
  /**
   * Codecs must implement a hashCode() method that is consistent with equals().*/
  @Override
  public abstract int hashCode();
  @Override
  public String toString() {
    return getName();
  }
}
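To make the Codec contract concrete, here is a sketch of a codec built on java.util.zip. `DeflateSketch` is a hypothetical class and is not the DeflateCodec source. It uses the JDK's default zlib-wrapped streams for simplicity, assumes array-backed (heap) buffers, and reports failures with unchecked exceptions instead of IOException.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical codec sketch following the compress/decompress/equals contract. */
class DeflateSketch {
  private final int level;

  DeflateSketch(int level) { this.level = level; }

  public String getName() { return "deflate"; }

  /** Compresses the remaining bytes of a heap buffer; the input position is left unchanged. */
  public ByteBuffer compress(ByteBuffer in) {
    Deflater deflater = new Deflater(level);
    deflater.setInput(in.array(), in.arrayOffset() + in.position(), in.remaining());
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    while (!deflater.finished()) {
      out.write(chunk, 0, deflater.deflate(chunk));
    }
    deflater.end();
    return ByteBuffer.wrap(out.toByteArray());
  }

  /** Decompresses the remaining bytes of a heap buffer. */
  public ByteBuffer decompress(ByteBuffer in) {
    Inflater inflater = new Inflater();
    inflater.setInput(in.array(), in.arrayOffset() + in.position(), in.remaining());
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(chunk);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalArgumentException("truncated or unsupported input");
        }
        out.write(chunk, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalArgumentException(e);
    } finally {
      inflater.end();
    }
    return ByteBuffer.wrap(out.toByteArray());
  }

  // Any two instances decompress each other's output, whatever their level,
  // so by the contract quoted above they compare equal.
  @Override public boolean equals(Object other) {
    return other != null && getClass() == other.getClass();
  }

  @Override public int hashCode() { return getName().hashCode(); }

  @Override public String toString() { return getName(); }
}
```

The compression level affects only the writer side, which is why it is left out of equals and hashCode in this sketch.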

Its subclasses include:

public class BZip2Codec extends Codec: implements bzip2 compression and decompression.

It depends internally on org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream
                and org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream.

class DeflateCodec extends Codec: implements DEFLATE (RFC1951) compression and decompression.

final class NullCodec extends Codec: implements the "null" (pass-through) codec.

class SnappyCodec extends Codec: implements Snappy compression and decompression.

It uses CRC32 crc32 = new CRC32(); internally.

Note the access modifiers of the four subclasses above: one is public, two declare no access modifier (package-private), and one is declared final (also without an access modifier).

Then there is the abstract factory for Codec:

public abstract class CodecFactory

Its createInstance is the abstract factory method:

/** Creates internal Codec. */
  protected abstract Codec createInstance();

Factory registration:

 public static CodecFactory addCodec(String name, CodecFactory c) {
    return REGISTERED.put(name, c);
  }

Looking up a factory by name:

public static CodecFactory fromString(String s) {
    CodecFactory o = REGISTERED.get(s);
    if (o == null) {
      throw new AvroRuntimeException("Unrecognized codec: " + s);
    }
    return o;
  }

And the corresponding concrete factory instances:

 public static CodecFactory nullCodec() {
    return NullCodec.OPTION;
  }

  /** Deflate codec, with specific compression.
   * compressionLevel should be between 1 and 9, inclusive. */
  public static CodecFactory deflateCodec(int compressionLevel) {
    return new DeflateCodec.Option(compressionLevel);
  }

  /** Snappy codec.*/
  public static CodecFactory snappyCodec() {
    return new SnappyCodec.Option();
  }

  /** bzip2 codec.*/
  public static CodecFactory bzip2Codec() {
    return new BZip2Codec.Option();
  }
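The registration, lookup, and factory-method pieces above fit together roughly as in the following sketch. `FactorySketch`, `addFactory`, and `fixed` are hypothetical names, and the product is a plain String rather than a Codec, so the example needs only the JDK.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry-backed abstract factory. */
abstract class FactorySketch {
  private static final Map<String, FactorySketch> REGISTERED = new HashMap<>();

  /** Factory method: each registered factory builds its own product. */
  protected abstract String createInstance();

  /** Registers a factory under a name; returns the factory it replaced, or null. */
  static FactorySketch addFactory(String name, FactorySketch f) {
    return REGISTERED.put(name, f);
  }

  /** Looks up a factory by name and fails loudly for unknown names. */
  static FactorySketch fromString(String name) {
    FactorySketch f = REGISTERED.get(name);
    if (f == null) {
      throw new IllegalArgumentException("Unrecognized codec: " + name);
    }
    return f;
  }

  /** Helper that builds a factory always returning the same product. */
  static FactorySketch fixed(final String product) {
    return new FactorySketch() {
      @Override protected String createInstance() { return product; }
    };
  }

  // Built-in registrations happen once, when the class is loaded.
  static {
    addFactory("null", fixed("null-codec"));
    addFactory("deflate", fixed("deflate-codec"));
  }
}
```

The registry in this sketch is a plain HashMap, so registering factories from several threads at once would need external synchronization.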

There are two file-related interfaces.

SeekableInput

public interface SeekableInput extends Closeable {

  /** Set the position for the next {@link java.io.InputStream#read(byte[],int,int) read()}. */
  void seek(long p) throws IOException;

  /** Return the position of the next {@link java.io.InputStream#read(byte[],int,int) read()}. */
  long tell() throws IOException;

  /** Return the length of the file. */
  long length() throws IOException;

  /** Equivalent to {@link java.io.InputStream#read(byte[],int,int)}. */
  int read(byte[] b, int off, int len) throws IOException;
}

It declares four methods:

seek, tell, length, read

Its implementation SeekableFileInput:

public class SeekableFileInput
  extends FileInputStream implements SeekableInput {

  public SeekableFileInput(File file) throws IOException { super(file); }
  public SeekableFileInput(FileDescriptor fd) throws IOException { super(fd); }

  public void seek(long p) throws IOException { getChannel().position(p); }
  public long tell() throws IOException { return getChannel().position(); }
  public long length() throws IOException { return getChannel().size(); }

}

Another implementation, SeekableByteArrayInput:

public class SeekableByteArrayInput extends ByteArrayInputStream implements SeekableInput {

    public SeekableByteArrayInput(byte[] data) {
        super(data);
    }

    public long length() throws IOException {
        return this.count;
    }

    public void seek(long p) throws IOException {
        this.reset();
        this.skip(p);
    }

    public long tell() throws IOException {
        return this.pos;
    }
}
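The seek-by-reset-and-skip approach can be tried in isolation. `SeekableBytesSketch` below is a hypothetical copy of the idea that drops the SeekableInput interface and the checked exceptions, so it depends only on java.io.ByteArrayInputStream.

```java
import java.io.ByteArrayInputStream;

/** Hypothetical in-memory seekable input built on the protected pos/count fields. */
class SeekableBytesSketch extends ByteArrayInputStream {
  SeekableBytesSketch(byte[] data) { super(data); }

  /** Total number of readable bytes. */
  long length() { return count; }

  /** Offset of the next byte that read() will return. */
  long tell() { return pos; }

  /** Moves to absolute offset p, provided mark() was never called on this stream. */
  void seek(long p) {
    reset();   // back to the mark, which is offset 0 by default
    skip(p);   // then forward by p bytes (clamped at the end of the data)
  }
}
```

Since reset() returns to the mark rather than to offset 0, a caller that invokes mark() on such a stream turns seek into a seek relative to the mark.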

The other interface is FileReader; besides getSchema it declares four methods: next, sync, pastSync, and tell.

public interface FileReader<D> extends Iterator<D>, Iterable<D>, Closeable {
  /** Return the schema for data in this file. */
  Schema getSchema();

  D next(D reuse) throws IOException;
  void sync(long position) throws IOException;
  boolean pastSync(long position) throws IOException;
  long tell() throws IOException;

}

Its implementations include:

DataFileReader

public class DataFileReader<D>
  extends DataFileStream<D> implements FileReader<D> {}

and a second version, DataFileReader12:

/** Read files written by Avro version 1.2. */
public class DataFileReader12<D> implements FileReader<D>, Closeable {}

Several methods in this class deserve attention:

@Override
  public synchronized D next(D reuse) throws IOException {
    while (blockCount == 0) {                     // at start of block

      if (in.tell() == in.length())               // at eof
        return null;

      skipSync();                                 // skip a sync

      blockCount = vin.readLong();                // read blockCount
        
      if (blockCount == FOOTER_BLOCK) {
        seek(vin.readLong()+in.tell());           // skip a footer
      }
    }
    blockCount--;
    return reader.read(reuse, vin);
  }

public synchronized void seek(long position) throws IOException {
    in.seek(position);
    blockCount = 0;
    blockStart = position;
    vin = DecoderFactory.get().binaryDecoder(in, vin);
  }

  /** Move to the next synchronization point after a position. */
  @Override
  public synchronized void sync(long position) throws IOException {
    if (in.tell()+SYNC_SIZE >= in.length()) {
      seek(in.length());
      return;
    }
    in.seek(position);
    vin.readFixed(syncBuffer);
    for (int i = 0; in.tell() < in.length(); i++) {
      int j = 0;
      for (; j < sync.length; j++) {
        if (sync[j] != syncBuffer[(i+j)%sync.length])
          break;
      }
      if (j == sync.length) {                     // position before sync
        seek(in.tell() - SYNC_SIZE);
        return;
      }
      syncBuffer[i%sync.length] = (byte)in.read();
    }
    seek(in.length());
  }
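The sync method scans forward for the sync marker using a buffer of marker length that is overwritten circularly, one byte per step. The sketch below applies the same idea to a byte array; `SyncScanSketch.findSync` is a hypothetical helper, not the Avro code, and unlike the loop above it also tests the final window before giving up.

```java
class SyncScanSketch {
  /**
   * Returns the offset of the first occurrence of marker at or after start,
   * or data.length if the marker does not occur.
   */
  static int findSync(byte[] data, byte[] marker, int start) {
    int n = marker.length;
    if (start + n > data.length) return data.length;
    byte[] window = new byte[n];
    System.arraycopy(data, start, window, 0, n);
    int next = start + n;                 // index of the next unread byte
    for (int i = 0; ; i++) {
      // The logical window begins at slot i % n and wraps around.
      int j = 0;
      for (; j < n; j++) {
        if (marker[j] != window[(i + j) % n]) break;
      }
      if (j == n) return next - n;        // window currently holds the marker
      if (next == data.length) return data.length;
      window[i % n] = data[next++];       // overwrite the oldest byte
    }
  }
}
```

Each step replaces only the oldest byte, so the scan reads every input byte once and never has to seek backwards while searching.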

and the constructor:

 public DataFileReader12(SeekableInput sin, DatumReader<D> reader)
    throws IOException {
    this.in = new DataFileReader.SeekableInputStream(sin);

    byte[] magic = new byte[4];
    in.read(magic);
    if (!Arrays.equals(MAGIC, magic))
      throw new IOException("Not a data file.");

    long length = in.length();
    in.seek(length-4);
    int footerSize=(in.read()<<24)+(in.read()<<16)+(in.read()<<8)+in.read();
    seek(length-footerSize);
    long l = vin.readMapStart();
    if (l > 0) {
      do {
        for (long i = 0; i < l; i++) {
          String key = vin.readString(null).toString();
          ByteBuffer value = vin.readBytes(null);
          byte[] bb = new byte[value.remaining()];
          value.get(bb);
          meta.put(key, bb);
        }
      } while ((l = vin.mapNext()) != 0);
    }

    this.sync = getMeta(SYNC);
    this.count = getMetaLong(COUNT);
    String codec = getMetaString(CODEC);
    if (codec != null && ! codec.equals(NULL_CODEC)) {
      throw new IOException("Unknown codec: " + codec);
    }
    this.schema = Schema.parse(getMetaString(SCHEMA));
    this.reader = reader;

    reader.setSchema(schema);

    seek(MAGIC.length);         // seek to start
  }
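The constructor locates the footer by reading the last four bytes of the file as a big-endian integer and seeking back by that amount. A small sketch of that arithmetic on a byte array follows; `FooterSizeSketch` and its method names are hypothetical. InputStream.read() already returns values in 0..255, whereas array elements are signed bytes and need the `& 0xff` mask.

```java
class FooterSizeSketch {
  /** Decodes the last four bytes of the file as a big-endian int. */
  static int readFooterSize(byte[] file) {
    int n = file.length;
    if (n < 4) throw new IllegalArgumentException("file too short for a footer size");
    return ((file[n - 4] & 0xff) << 24)
         | ((file[n - 3] & 0xff) << 16)
         | ((file[n - 2] & 0xff) << 8)
         |  (file[n - 1] & 0xff);
  }

  /** Offset at which the footer begins, mirroring seek(length - footerSize). */
  static int footerStart(byte[] file) {
    return file.length - readFooterSize(file);
  }
}
```

For a 12-byte file ending in 0, 0, 0, 6 the footer size is 6 and the footer starts at offset 6.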

There is also

DataFileStream, which implements Iterator:

public class DataFileStream<D> implements Iterator<D>, Iterable<D>, Closeable {

Its core method:

@Override
  public boolean hasNext() {
    try {
      if (blockRemaining == 0) {
        // check that the previous block was finished
        if (null != datumIn) {
          boolean atEnd = datumIn.isEnd();
          if (!atEnd) {
            throw new IOException("Block read partially, the data may be corrupt");
          }
        }
        if (hasNextBlock()) {
          block = nextRawBlock(block);
          block.decompressUsing(codec);
          blockBuffer = block.getAsByteBuffer();
          datumIn = DecoderFactory.get().binaryDecoder(
              blockBuffer.array(), blockBuffer.arrayOffset() +
              blockBuffer.position(), blockBuffer.remaining(), datumIn);
        }
      }
      return blockRemaining != 0;
    } catch (EOFException e) {                    // at EOF
      return false;
    } catch (IOException e) {
      throw new AvroRuntimeException(e);
    }
  }

as well as DataFileWriter:

public class DataFileWriter<D> implements Closeable, Flushable {

Core methods:

/** Flush the current state of the file. */
  @Override
  public void flush() throws IOException {
    sync();
    vout.flush();
  }

  public void close() throws IOException {
    if (isOpen) {
      flush();
      out.close();
      isOpen = false;
    }
  }

and the class LengthLimitedInputStream.java:

class LengthLimitedInputStream extends FilterInputStream {}

More analysis to follow...
