HBase Source Code: WALEdit

Source: Internet  Editor: 程序博客网  Time: 2024/06/02 02:52

WALEdit is the class HBase relies on to make the edits of a transaction atomic.


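As we all know, before HBase writes data to the file system it first writes the data to a log, the write-ahead log (WAL). The benefit is that if a region server goes down in the middle of an operation, its log can be handed to other region servers, which replay the recorded entries to recover. This mechanism is what gives a transaction its durability.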


Wait, wasn't this supposed to be about atomicity? The paragraph above only describes what the WAL is for. Bear with me.

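In this process WALEdit plays the role of a wrapper class. Before anything is written to the log, all the operations of one transaction are packed into a single WALEdit. Either the whole package is built, or, if a single operation fails to be packed, the whole package fails. A failed package never reaches the log, so the transaction fails. That is where the atomicity comes from. HBase itself only offers atomic operations at the row level, so a transaction that spans multiple rows is not atomic and has to be handled separately.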


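How is the packing done? There are two versions, an old one and a new one. Let's start with the old one.

Suppose a transaction modifies columns c1, c2, and c3 of a single row R. The old WAL would record the three edits as three separate log entries: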

<logseq1-for-edit1>:<KeyValue-for-edit-c1>
<logseq2-for-edit2>:<KeyValue-for-edit-c2>
<logseq3-for-edit3>:<KeyValue-for-edit-c3>

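The old version stores every edit as its own log entry. If the server dies after only some of these appends have made it into the log, the log holds a half-completed transaction and recovery restores only part of it. Row-level atomicity is clearly not guaranteed.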
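To make the failure mode concrete, here is a minimal sketch of the old per-edit layout. It does not use any HBase classes: the edits are plain strings, and OldStyleWalSketch, appendPerEdit, and appendsBeforeCrash are names I made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OldStyleWalSketch {

    /**
     * Appends one log entry per edit and simulates a crash:
     * only the first appendsBeforeCrash entries reach the log.
     */
    static List<String> appendPerEdit(List<String> edits, int appendsBeforeCrash) {
        List<String> log = new ArrayList<>();
        for (String edit : edits) {
            if (log.size() == appendsBeforeCrash) {
                break; // simulated crash: the remaining edits are lost
            }
            log.add(edit);
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> txn = Arrays.asList("R/c1", "R/c2", "R/c3");
        // Crash after two appends: replay would see c1 and c2 but not c3.
        System.out.println(appendPerEdit(txn, 2));
    }
}
```

Replaying this log would restore c1 and c2 without c3, which is exactly the partial transaction the class comment warns about.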

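The new version therefore writes all the edits of a transaction as a single record, <logseq#-for-entire-txn>:<WALEdit-for-entire-txn>, where the WALEdit is serialized as <-1, # of edits, <KeyValue>, <KeyValue>, ... >.

For example: <-1, 3, <Keyvalue-for-edit-c1>, <KeyValue-for-edit-c2>, <KeyValue-for-edit-c3>>

The -1 is only a marker, there for backward compatibility with the old WAL, whose records each contained a single KeyValue.

As you can see, all three edits go into the same log record, so if the write fails none of them is recorded, which gives the atomic effect.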
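The single-record layout can be sketched with plain java.io streams. This is only an illustration under my own assumptions: each KeyValue is stood in for by a length-prefixed UTF-8 string, the real write() also appends scope information after the KeyValues, and NewStyleRecordSketch and serialize are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class NewStyleRecordSketch {
    // Same marker value as WALEdit.VERSION_2 in the class pasted in this post.
    static final int VERSION_2 = -1;

    /** Packs all edits of one transaction into one record: <-1, count, edit, edit, ...>. */
    static byte[] serialize(List<String> edits) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(VERSION_2);     // marker: new-style record
            out.writeInt(edits.size());  // number of edits in the transaction
            for (String edit : edits) {
                // Stand-in for a KeyValue: length prefix followed by the bytes.
                byte[] payload = edit.getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] record = serialize(Arrays.asList("c1", "c2", "c3"));
        System.out.println("one record, " + record.length + " bytes");
    }
}
```

Because the three edits end up in one byte array, the log append either stores all of them or none of them.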


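Next, the internal members of WALEdit:

Because there are two versions (old and new), there is an int version marker, VERSION_2, whose value is -1.

There is also a boolean replay flag, isReplay, which says whether this WALEdit was produced by a normal write or by a replay after a server went down.

As the format above shows, one WALEdit produces one log record (new style), and one record holds multiple KeyValues.

So it also has a list to hold the KeyValues, declared here as ArrayList<Cell>. KeyValue implements the Cell interface, as covered in an earlier post. The reason is the obvious one: polymorphism.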

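Another member is compressionContext. I'm not sure exactly how it is used, but HBase's configuration lets you turn on WAL compression, and in readFields and write each KeyValue goes through KeyValueCompression whenever compressionContext is not null.

So it presumably exists to shrink the size of the log further.

Edits like c1, c2, c3 above differ only in the column, while the row, region, table and so on are identical, which is a good fit for compression. That is the best I can make of it.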


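Which methods does it provide?

For the replay flag there is, unsurprisingly, isReplay(), which tells whether the WALEdit came from a replay, plus a constructor that builds a WALEdit directly from a boolean isReplay.

For the Cell list there is a simple getter (getCells), an add method, size() for the number of cells, and isEmpty() to check whether the list is empty.

For compressionContext, since the compression is something we choose, there is a setter, setCompressionContext.

toString is overridden as well. I expected it to produce the log format described above, but the code shows it builds a readable summary of the form [#edits: N = <...>]. The serialized form is produced by write().

Those are the methods one can guess from the members or from common sense.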
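As a quick illustration of these accessors, here is a stripped-down stand-in for the class. EditSketch is a hypothetical name, the cells are plain strings instead of Cell objects, and compression and serialization are left out.

```java
import java.util.ArrayList;
import java.util.List;

public class EditSketch {
    private final boolean isReplay;
    private final ArrayList<String> cells = new ArrayList<>(1);

    public EditSketch() {
        this(false); // a normal write, not a replay
    }

    public EditSketch(boolean isReplay) {
        this.isReplay = isReplay;
    }

    public boolean isReplay() {
        return isReplay;
    }

    /** Returns this so that calls can be chained, as WALEdit.add does. */
    public EditSketch add(String cell) {
        cells.add(cell);
        return this;
    }

    public boolean isEmpty() {
        return cells.isEmpty();
    }

    public int size() {
        return cells.size();
    }

    public List<String> getCells() {
        return cells;
    }

    public static void main(String[] args) {
        EditSketch edit = new EditSketch().add("c1").add("c2").add("c3");
        System.out.println(edit.size() + " cells, replay=" + edit.isReplay());
    }
}
```

Returning this from add is what makes one-liners like new WALEdit().add(kv) possible in the static factory methods of the real class.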

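What about the ones that are harder to guess?

The class itself shows the following.

It also provides methods to convert a WALEdit to and from I/O streams: readFields(DataInput) and write(DataOutput), which come from the Writable interface. My understanding of the use case is the data conversion needed when a log is moved from a crashed server to other servers for replay. Traffic inside the cluster goes over RPC built on Google's protobuf, so I assume the log reaches the other servers as some kind of input stream (DataInput itself is the plain java.io interface), and reading that stream and turning it back into a WALEdit makes sense. Where there is reading there must be writing, although write() logs a warning that it is only expected in test code.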
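The backward-compatible branch in readFields can be sketched as follows. This is my own simplified version, not the HBase code: KeyValues are replaced by length-prefixed UTF-8 strings, the scope section is omitted, and BackwardCompatReadSketch, oldStyle, newStyle, and read are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BackwardCompatReadSketch {
    static final int VERSION_2 = -1;

    /** Old style: the record is one length-prefixed payload. */
    static byte[] oldStyle(String edit) {
        return encode(false, edit);
    }

    /** New style: -1 marker, edit count, then the length-prefixed payloads. */
    static byte[] newStyle(String... edits) {
        return encode(true, edits);
    }

    private static byte[] encode(boolean withHeader, String... edits) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (withHeader) {
                out.writeInt(VERSION_2);
                out.writeInt(edits.length);
            }
            for (String edit : edits) {
                byte[] payload = edit.getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads either format: the first int is the -1 marker or a payload length. */
    static List<String> read(byte[] record) {
        List<String> edits = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            int versionOrLength = in.readInt();
            if (versionOrLength == VERSION_2) {
                int numEdits = in.readInt();
                for (int i = 0; i < numEdits; i++) {
                    edits.add(readPayload(in, in.readInt()));
                }
            } else {
                // Old record: the int just read is the length of the single payload.
                edits.add(readPayload(in, versionOrLength));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return edits;
    }

    private static String readPayload(DataInputStream in, int length) throws IOException {
        byte[] buf = new byte[length];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(read(oldStyle("c1")));
        System.out.println(read(newStyle("c1", "c2", "c3")));
    }
}
```

The trick works because a length can never be negative, so -1 in the first position can only mean a new-style record.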

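There are also a few inconspicuous methods (I can't tell whether they are rarely used):

One is readFromCells, which reads from a cell decoder and fills the list with the cells it yields. The decoder is presumably the decoded view of data that was encoded in some other way when it was read in. (The code shows that the decoder is Codec.Decoder, a different class from compressionContext. The two are not the same thing.)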
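The loop in readFromCells is a bounded pull from a decoder. Below is a minimal sketch of the same pattern with a hypothetical Decoder interface that has only advance() and current(), mirroring the two calls used in the pasted code. DecoderReadSketch, over, and readFrom are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class DecoderReadSketch {

    /** Hypothetical stand-in for Codec.Decoder: step forward, then fetch the item. */
    interface Decoder<T> {
        boolean advance();
        T current();
    }

    /** Wraps a list in a Decoder so the sketch can run without any codec. */
    static <T> Decoder<T> over(List<T> items) {
        final Iterator<T> it = items.iterator();
        return new Decoder<T>() {
            private T current;

            @Override
            public boolean advance() {
                if (!it.hasNext()) {
                    return false;
                }
                current = it.next();
                return true;
            }

            @Override
            public T current() {
                return current;
            }
        };
    }

    /** Reads at most expectedCount items; stops early if the decoder runs dry. */
    static <T> List<T> readFrom(Decoder<T> decoder, int expectedCount) {
        ArrayList<T> cells = new ArrayList<>();
        cells.ensureCapacity(expectedCount);
        while (cells.size() < expectedCount && decoder.advance()) {
            cells.add(decoder.current());
        }
        return cells;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("c1", "c2", "c3");
        System.out.println(readFrom(over(source), 2));
    }
}
```

The caller compares the returned size with the expected count to notice a short read, which is why the real method returns the number of KVs read.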

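There are also some meta-related methods (the two isMetaEditFamily overloads). My first thought was that KeyValues also live in HBase's meta, so the meta information has to be updated when a log is moved. From the code, though, METAFAMILY is a special column family that marks WAL entries describing events such as flushes, compactions, and region events rather than user data, and these methods check whether a family or cell belongs to it.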


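Finally, there are a few static methods:

Judging by the names (createFlushWALEdit, createCompaction, createRegionEventWALEdit, and the matching getters), they build and parse the marker entries written for a flush, for a compaction, and for region events such as opening a region.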
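One small detail of these factory methods is how they pick the row key for the marker entry. In the pasted code, getRowForRegion uses the region's start key, or a single zero byte when the start key is empty, because an empty row key is not allowed in mutations. A standalone sketch of that rule (MarkerRowSketch and rowForRegion are my own names, and the start key is passed in directly instead of coming from an HRegionInfo):

```java
import java.util.Arrays;

public class MarkerRowSketch {

    /** Returns the start key, or {0} when the start key is empty. */
    static byte[] rowForRegion(byte[] startKey) {
        if (startKey.length == 0) {
            // Smallest byte[] that sorts after the empty array.
            return new byte[] {0};
        }
        return startKey;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rowForRegion(new byte[0])));
        System.out.println(Arrays.toString(rowForRegion(new byte[] {1, 2})));
    }
}
```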


The class isn't long, so I'm pasting the code here. Read through it if you're interested.


from Reid Chan 


/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.hbase.regionserver.wal;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.NavigableMap;
import java.util.TreeMap;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.classification.InterfaceAudience;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HBaseInterfaceAudience;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.KeyValueUtil;
import org.apache.hadoop.hbase.codec.Codec;
import org.apache.hadoop.hbase.io.HeapSize;
import org.apache.hadoop.hbase.protobuf.generated.WALProtos.CompactionDescriptor;
import org.apache.hadoop.hbase.protobuf.generated.WALProtos.FlushDescriptor;
import org.apache.hadoop.hbase.protobuf.generated.WALProtos.RegionEventDescriptor;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.ClassSize;
import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
import org.apache.hadoop.io.Writable;

/**
 * WALEdit: Used in HBase's transaction log (WAL) to represent
 * the collection of edits (KeyValue objects) corresponding to a
 * single transaction. The class implements "Writable" interface
 * for serializing/deserializing a set of KeyValue items.
 *
 * Previously, if a transaction contains 3 edits to c1, c2, c3 for a row R,
 * the WAL would have three log entries as follows:
 *
 *    <logseq1-for-edit1>:<KeyValue-for-edit-c1>
 *    <logseq2-for-edit2>:<KeyValue-for-edit-c2>
 *    <logseq3-for-edit3>:<KeyValue-for-edit-c3>
 *
 * This presents problems because row level atomicity of transactions
 * was not guaranteed. If we crash after few of the above appends make
 * it, then recovery will restore a partial transaction.
 *
 * In the new world, all the edits for a given transaction are written
 * out as a single record, for example:
 *
 *   <logseq#-for-entire-txn>:<WALEdit-for-entire-txn>
 *
 * where, the WALEdit is serialized as:
 *   <-1, # of edits, <KeyValue>, <KeyValue>, ... >
 * For example:
 *   <-1, 3, <Keyvalue-for-edit-c1>, <KeyValue-for-edit-c2>, <KeyValue-for-edit-c3>>
 *
 * The -1 marker is just a special way of being backward compatible with
 * an old WAL which would have contained a single <KeyValue>.
 *
 * The deserializer for WALEdit backward compatibly detects if the record
 * is an old style KeyValue or the new style WALEdit.
 *
 */
@InterfaceAudience.LimitedPrivate({ HBaseInterfaceAudience.REPLICATION,
    HBaseInterfaceAudience.COPROC })
public class WALEdit implements Writable, HeapSize {
  public static final Log LOG = LogFactory.getLog(WALEdit.class);

  // TODO: Get rid of this; see HBASE-8457
  public static final byte [] METAFAMILY = Bytes.toBytes("METAFAMILY");
  static final byte [] METAROW = Bytes.toBytes("METAROW");
  static final byte[] COMPACTION = Bytes.toBytes("HBASE::COMPACTION");
  static final byte [] FLUSH = Bytes.toBytes("HBASE::FLUSH");
  static final byte [] REGION_EVENT = Bytes.toBytes("HBASE::REGION_EVENT");

  private final int VERSION_2 = -1;
  private final boolean isReplay;

  private final ArrayList<Cell> cells = new ArrayList<Cell>(1);

  public static final WALEdit EMPTY_WALEDIT = new WALEdit();

  // Only here for legacy writable deserialization
  @Deprecated
  private NavigableMap<byte[], Integer> scopes;

  private CompressionContext compressionContext;

  public WALEdit() {
    this(false);
  }

  public WALEdit(boolean isReplay) {
    this.isReplay = isReplay;
  }

  /**
   * @param f
   * @return True is <code>f</code> is {@link #METAFAMILY}
   */
  public static boolean isMetaEditFamily(final byte [] f) {
    return Bytes.equals(METAFAMILY, f);
  }

  public static boolean isMetaEditFamily(Cell cell) {
    return CellUtil.matchingFamily(cell, METAFAMILY);
  }

  /**
   * @return True when current WALEdit is created by log replay. Replication skips WALEdits from
   *         replay.
   */
  public boolean isReplay() {
    return this.isReplay;
  }

  public void setCompressionContext(final CompressionContext compressionContext) {
    this.compressionContext = compressionContext;
  }

  public WALEdit add(Cell cell) {
    this.cells.add(cell);
    return this;
  }

  public boolean isEmpty() {
    return cells.isEmpty();
  }

  public int size() {
    return cells.size();
  }

  public ArrayList<Cell> getCells() {
    return cells;
  }

  public NavigableMap<byte[], Integer> getAndRemoveScopes() {
    NavigableMap<byte[], Integer> result = scopes;
    scopes = null;
    return result;
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    cells.clear();
    if (scopes != null) {
      scopes.clear();
    }
    int versionOrLength = in.readInt();
    // TODO: Change version when we protobuf.  Also, change way we serialize KV!  Pb it too.
    if (versionOrLength == VERSION_2) {
      // this is new style WAL entry containing multiple KeyValues.
      int numEdits = in.readInt();
      for (int idx = 0; idx < numEdits; idx++) {
        if (compressionContext != null) {
          this.add(KeyValueCompression.readKV(in, compressionContext));
        } else {
          this.add(KeyValue.create(in));
        }
      }
      int numFamilies = in.readInt();
      if (numFamilies > 0) {
        if (scopes == null) {
          scopes = new TreeMap<byte[], Integer>(Bytes.BYTES_COMPARATOR);
        }
        for (int i = 0; i < numFamilies; i++) {
          byte[] fam = Bytes.readByteArray(in);
          int scope = in.readInt();
          scopes.put(fam, scope);
        }
      }
    } else {
      // this is an old style WAL entry. The int that we just
      // read is actually the length of a single KeyValue
      this.add(KeyValue.create(versionOrLength, in));
    }
  }

  @Override
  public void write(DataOutput out) throws IOException {
    LOG.warn("WALEdit is being serialized to writable - only expected in test code");
    out.writeInt(VERSION_2);
    out.writeInt(cells.size());
    // We interleave the two lists for code simplicity
    for (Cell cell : cells) {
      // This is not used in any of the core code flows so it is just fine to convert to KV
      KeyValue kv = KeyValueUtil.ensureKeyValue(cell);
      if (compressionContext != null) {
        KeyValueCompression.writeKV(out, kv, compressionContext);
      } else{
        KeyValue.write(kv, out);
      }
    }
    if (scopes == null) {
      out.writeInt(0);
    } else {
      out.writeInt(scopes.size());
      for (byte[] key : scopes.keySet()) {
        Bytes.writeByteArray(out, key);
        out.writeInt(scopes.get(key));
      }
    }
  }

  /**
   * Reads WALEdit from cells.
   * @param cellDecoder Cell decoder.
   * @param expectedCount Expected cell count.
   * @return Number of KVs read.
   */
  public int readFromCells(Codec.Decoder cellDecoder, int expectedCount) throws IOException {
    cells.clear();
    cells.ensureCapacity(expectedCount);
    while (cells.size() < expectedCount && cellDecoder.advance()) {
      cells.add(cellDecoder.current());
    }
    return cells.size();
  }

  @Override
  public long heapSize() {
    long ret = ClassSize.ARRAYLIST;
    for (Cell cell : cells) {
      ret += CellUtil.estimatedHeapSizeOf(cell);
    }
    if (scopes != null) {
      ret += ClassSize.TREEMAP;
      ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
      // TODO this isn't quite right, need help here
    }
    return ret;
  }

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder();

    sb.append("[#edits: " + cells.size() + " = <");
    for (Cell cell : cells) {
      sb.append(cell);
      sb.append("; ");
    }
    if (scopes != null) {
      sb.append(" scopes: " + scopes.toString());
    }
    sb.append(">]");
    return sb.toString();
  }

  public static WALEdit createFlushWALEdit(HRegionInfo hri, FlushDescriptor f) {
    KeyValue kv = new KeyValue(getRowForRegion(hri), METAFAMILY, FLUSH,
      EnvironmentEdgeManager.currentTime(), f.toByteArray());
    return new WALEdit().add(kv);
  }

  public static FlushDescriptor getFlushDescriptor(Cell cell) throws IOException {
    if (CellUtil.matchingColumn(cell, METAFAMILY, FLUSH)) {
      return FlushDescriptor.parseFrom(cell.getValue());
    }
    return null;
  }

  public static WALEdit createRegionEventWALEdit(HRegionInfo hri,
      RegionEventDescriptor regionEventDesc) {
    KeyValue kv = new KeyValue(getRowForRegion(hri), METAFAMILY, REGION_EVENT,
      EnvironmentEdgeManager.currentTime(), regionEventDesc.toByteArray());
    return new WALEdit().add(kv);
  }

  public static RegionEventDescriptor getRegionEventDescriptor(Cell cell) throws IOException {
    if (CellUtil.matchingColumn(cell, METAFAMILY, REGION_EVENT)) {
      return RegionEventDescriptor.parseFrom(cell.getValue());
    }
    return null;
  }

  /**
   * Create a compacion WALEdit
   * @param c
   * @return A WALEdit that has <code>c</code> serialized as its value
   */
  public static WALEdit createCompaction(final HRegionInfo hri, final CompactionDescriptor c) {
    byte [] pbbytes = c.toByteArray();
    KeyValue kv = new KeyValue(getRowForRegion(hri), METAFAMILY, COMPACTION,
      EnvironmentEdgeManager.currentTime(), pbbytes);
    return new WALEdit().add(kv); //replication scope null so that this won't be replicated
  }

  private static byte[] getRowForRegion(HRegionInfo hri) {
    byte[] startKey = hri.getStartKey();
    if (startKey.length == 0) {
      // empty row key is not allowed in mutations because it is both the start key and the end key
      // we return the smallest byte[] that is bigger (in lex comparison) than byte[0].
      return new byte[] {0};
    }
    return startKey;
  }

  /**
   * Deserialized and returns a CompactionDescriptor is the KeyValue contains one.
   * @param kv the key value
   * @return deserialized CompactionDescriptor or null.
   */
  public static CompactionDescriptor getCompaction(Cell kv) throws IOException {
    if (CellUtil.matchingColumn(kv, METAFAMILY, COMPACTION)) {
      return CompactionDescriptor.parseFrom(kv.getValue());
    }
    return null;
  }
}


