Concurrent LRU Block Cache in HBase


Jonathan Gray used HBASE-1460 to explain how he implemented the LRU block cache:

The LRU-based block cache that will be committed in HBASE-1192 is thread-safe but contains a big lock on the hash map. Under high load, the block cache will be hit very heavily from a number of threads, so it needs to be built to handle massive concurrency.

This issue aims to implement a new block cache with LRU eviction, but backed by a ConcurrentHashMap and a separate eviction thread. Influence will be drawn from Solr's ConcurrentLRUCache; however, there are major differences because Solr treats all cached elements as equal size, whereas we are dependent on our HeapSize interface with realistic (though approximate) heap usage.

Interested readers can look at https://issues.apache.org/jira/browse/HBASE-1460

A block cache implementation that is memory-aware using {@link HeapSize}, memory-bound using an LRU eviction algorithm, and concurrent: backed by a {@link ConcurrentHashMap} and with a non-blocking eviction thread giving constant-time {@link #cacheBlock} and {@link #getBlock} operations.

Contains three levels of block priority to allow for scan-resistance and in-memory families. A block is added with an inMemory flag if necessary, otherwise a block becomes a single-access priority. Once a block is accessed again, it changes to multiple access. This is used to prevent scans from thrashing the cache, adding a least-frequently-used element to the eviction algorithm.
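
Concretely, the priority transition can be sketched like this (a simplified reconstruction based on the description above, not the verbatim HBase code):

  enum BlockPriority { SINGLE, MULTI, MEMORY }

  class PriorityDemo {
    private BlockPriority priority;
    private long accessTime;

    PriorityDemo(long accessTime, boolean inMemory) {
      this.accessTime = accessTime;
      // In-memory families pin their blocks in the MEMORY bucket;
      // all other blocks start at single-access priority.
      this.priority = inMemory ? BlockPriority.MEMORY : BlockPriority.SINGLE;
    }

    // Called on every cache hit: records recency for LRU ordering and
    // promotes a single-access block to multi-access on its second touch.
    public void access(long accessTime) {
      this.accessTime = accessTime;
      if (this.priority == BlockPriority.SINGLE) {
        this.priority = BlockPriority.MULTI;
      }
    }
  }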

Each priority is given its own chunk of the total cache to ensure fairness during eviction. Each priority will retain close to its maximum size; however, if any priority is not using its entire chunk the others are able to grow beyond their chunk size.

Instantiated at a minimum with the total size and average block size. All sizes are in bytes.  The block size is not especially important as this
cache is fully dynamic in its sizing of blocks.  It is only used for pre-allocating data structures and in initial heap estimation of the map.
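
For instance, a minimal instantiation might look like this (a sketch; the variable values are arbitrary):

  long maxSize   = 512L * 1024 * 1024; // 512 MB total cache capacity
  long blockSize = 64L * 1024;         // 64 KB average block: only used for
                                       // pre-allocation and heap estimation
  LruBlockCache cache = new LruBlockCache(maxSize, blockSize);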

The detailed constructor defines the sizes for the three priorities (they should total to the maximum size defined).  It also sets the levels that
trigger and control the eviction thread.

The acceptable size is the cache size level which triggers the eviction process to start.  It evicts enough blocks to get the size below the
minimum size specified.
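
A sketch of the two threshold accessors this implies, assuming simple factor-based sizing (the 0.99 and 0.95 factors shown are, to the best of my knowledge, the HBase defaults of that era):

  private long acceptableSize() {
    // Crossing this level (99% of capacity by default) triggers eviction.
    return (long) Math.floor(this.maxSize * 0.99f);
  }

  private long minSize() {
    // Eviction frees blocks until the cache is back under 95% of capacity.
    return (long) Math.floor(this.maxSize * 0.95f);
  }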

Eviction happens in a separate thread and involves a single full-scan of the map.  It determines how many bytes must be freed to reach the minimum
size, and then while scanning determines the fewest least-recently-used blocks necessary from each of the three priorities (would be 3 times bytes
to free).  It then uses the priority chunk sizes to evict fairly according to the relative sizes and usage.
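
For example (illustrative numbers, not from the source): suppose 30 MB must be freed and the single, multi, and memory buckets exceed their chunks by 5 MB, 40 MB, and 60 MB respectively. Buckets are polled in ascending overflow order, so the single bucket frees min(5, 30/3) = 5 MB, the multi bucket frees min(40, 25/2) = 12.5 MB, and the memory bucket frees min(60, 12.5/1) = 12.5 MB, reaching (roughly) the 30 MB target.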

To understand it properly, of course, read the source code under org.apache.hadoop.hbase.io.hfile. (I noticed that in one of my own real-time computing projects, the two-level cache I used for aggregates was exactly this kind of LFU + LRU combination: cache1 holds keys that have been accessed once; when a key is hit again it is moved to cache2; cache1 evicts old keys with LRU, while cache2 uses a threshold to decide whether to batch-put to HBase.)
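
A minimal sketch of that two-level scheme, under stated assumptions: cache1 is an access-ordered LinkedHashMap acting as an LRU of once-seen keys; a second hit promotes the key to cache2, which is flushed in batches once it crosses a threshold. The flush target (batch puts to HBase) is left as a callback; all names here are mine.

  import java.util.HashMap;
  import java.util.LinkedHashMap;
  import java.util.Map;
  import java.util.function.Consumer;

  class TwoLevelCache<K, V> {
    private final Map<K, V> cache1;                    // LRU: keys seen once
    private final Map<K, V> cache2 = new HashMap<>();  // keys seen twice or more
    private final int flushThreshold;
    private final Consumer<Map<K, V>> flusher;         // e.g. batch puts to HBase

    TwoLevelCache(int cache1Capacity, int flushThreshold,
                  Consumer<Map<K, V>> flusher) {
      this.flushThreshold = flushThreshold;
      this.flusher = flusher;
      this.cache1 = new LinkedHashMap<K, V>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
          return size() > cache1Capacity;  // LRU eviction of old keys
        }
      };
    }

    synchronized void put(K key, V value) {
      if (cache2.containsKey(key)) {
        cache2.put(key, value);
      } else if (cache1.remove(key) != null) {
        // Second hit: promote the key to cache2 (the LFU-like level).
        cache2.put(key, value);
        if (cache2.size() >= flushThreshold) {
          flusher.accept(new HashMap<>(cache2));  // batch flush downstream
          cache2.clear();
        }
      } else {
        cache1.put(key, value);  // first sighting: park in the LRU level
      }
    }
  }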

Key code (the English comments in the code are detailed enough that I won't repeat the explanations):

Part 1:

  /** Concurrent map (the cache) */
  private final ConcurrentHashMap<String,CachedBlock> map;


Part 2:
  /** Single access bucket size */
  private float singleFactor;

  /** Multiple access bucket size */
  private float multiFactor;

  /** In-memory bucket size */
  private float memoryFactor;


Part 3:

  /**
   * Cache the block with the specified name and buffer.
   * <p>
   * It is assumed this will NEVER be called on an already cached block.  If
   * that is done, an exception will be thrown.
   * @param blockName block name
   * @param buf block buffer
   * @param inMemory if block is in-memory
   */
  public void cacheBlock(String blockName, Cacheable buf, boolean inMemory) {
    CachedBlock cb = map.get(blockName);
    if(cb != null) {
      throw new RuntimeException("Cached an already cached block");
    }
    cb = new CachedBlock(blockName, buf, count.incrementAndGet(), inMemory);
    long newSize = updateSizeMetrics(cb, false);
    map.put(blockName, cb);
    elements.incrementAndGet();
    if(newSize > acceptableSize() && !evictionInProgress) {
      runEviction();
    }
  }
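
runEviction(), called at the end of cacheBlock above, is not shown in this excerpt. Based on the summary near the end of this post (eviction can run synchronously or on a dedicated thread), it presumably looks roughly like this sketch:

  private void runEviction() {
    if (evictionThread == null) {
      evict();                 // no background thread: evict inline
    } else {
      evictionThread.evict();  // wake the eviction thread and return at once
    }
  }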

Part 4:

  /**
   * Get the buffer of the block with the specified name.
   * @param blockName block name
   * @param caching true if the caller caches blocks on cache misses
   * @return buffer of specified block name, or null if not in cache
   */
  @Override
  public Cacheable getBlock(String blockName, boolean caching) {
    CachedBlock cb = map.get(blockName);
    if(cb == null) {
      stats.miss(caching);
      return null;
    }
    stats.hit(caching);
    cb.access(count.incrementAndGet());
    return cb.getBuffer();
  }
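
Illustrative use of the two operations above (the block names, buf, and cache variables are stand-ins for this example):

  cache.cacheBlock("hfile1_block0", buf, false);           // enters at SINGLE priority
  Cacheable hit  = cache.getBlock("hfile1_block0", true);  // hit: promoted to MULTI
  Cacheable miss = cache.getBlock("hfile1_block1", true);  // null, recorded as a miss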

Part 5:

  /**
   * Eviction method.
   */
  void evict() {
    // Ensure only one eviction at a time
    if(!evictionLock.tryLock()) return;
    try {
      evictionInProgress = true;
      long currentSize = this.size.get();
      long bytesToFree = currentSize - minSize();
      if (LOG.isDebugEnabled()) {
        LOG.debug("Block cache LRU eviction started; Attempting to free " +
          StringUtils.byteDesc(bytesToFree) + " of total=" +
          StringUtils.byteDesc(currentSize));
      }
      if(bytesToFree <= 0) return;
      // Instantiate priority buckets
      BlockBucket bucketSingle = new BlockBucket(bytesToFree, blockSize,
          singleSize());
      BlockBucket bucketMulti = new BlockBucket(bytesToFree, blockSize,
          multiSize());
      BlockBucket bucketMemory = new BlockBucket(bytesToFree, blockSize,
          memorySize());
      // Scan entire map putting into appropriate buckets
      for(CachedBlock cachedBlock : map.values()) {
        switch(cachedBlock.getPriority()) {
          case SINGLE: {
            bucketSingle.add(cachedBlock);
            break;
          }
          case MULTI: {
            bucketMulti.add(cachedBlock);
            break;
          }
          case MEMORY: {
            bucketMemory.add(cachedBlock);
            break;
          }
        }
      }
      PriorityQueue<BlockBucket> bucketQueue =
        new PriorityQueue<BlockBucket>(3);
      bucketQueue.add(bucketSingle);
      bucketQueue.add(bucketMulti);
      bucketQueue.add(bucketMemory);
      int remainingBuckets = 3;
      long bytesFreed = 0;
      BlockBucket bucket;
      while((bucket = bucketQueue.poll()) != null) {
        long overflow = bucket.overflow();
        if(overflow > 0) {
          long bucketBytesToFree = Math.min(overflow,
            (bytesToFree - bytesFreed) / remainingBuckets);
          bytesFreed += bucket.free(bucketBytesToFree);
        }
        remainingBuckets--;
      }
      if (LOG.isDebugEnabled()) {
        long single = bucketSingle.totalSize();
        long multi = bucketMulti.totalSize();
        long memory = bucketMemory.totalSize();
        LOG.debug("Block cache LRU eviction completed; " +
          "freed=" + StringUtils.byteDesc(bytesFreed) + ", " +
          "total=" + StringUtils.byteDesc(this.size.get()) + ", " +
          "single=" + StringUtils.byteDesc(single) + ", " +
          "multi=" + StringUtils.byteDesc(multi) + ", " +
          "memory=" + StringUtils.byteDesc(memory));
      }
    } finally {
      stats.evict();
      evictionInProgress = false;
      evictionLock.unlock();
    }
  }

Part 6:

  /**
   * Used to group blocks into priority buckets.  There will be a BlockBucket
   * for each priority (single, multi, memory).  Once bucketed, the eviction
   * algorithm takes the appropriate number of elements out of each according
   * to configuration parameters and their relative sizes.
   */
  private class BlockBucket implements Comparable<BlockBucket> {

    private CachedBlockQueue queue;
    private long totalSize = 0;
    private long bucketSize;

    public BlockBucket(long bytesToFree, long blockSize, long bucketSize) {
      this.bucketSize = bucketSize;
      queue = new CachedBlockQueue(bytesToFree, blockSize);
      totalSize = 0;
    }

    public void add(CachedBlock block) {
      totalSize += block.heapSize();
      queue.add(block);
    }

    public long free(long toFree) {
      CachedBlock cb;
      long freedBytes = 0;
      while ((cb = queue.pollLast()) != null) {
        freedBytes += evictBlock(cb);
        if (freedBytes >= toFree) {
          return freedBytes;
        }
      }
      return freedBytes;
    }

    public long overflow() {
      return totalSize - bucketSize;
    }

    public long totalSize() {
      return totalSize;
    }

    public int compareTo(BlockBucket that) {
      if(this.overflow() == that.overflow()) return 0;
      return this.overflow() > that.overflow() ? 1 : -1;
    }
  }

Part 7:

  protected long evictBlock(CachedBlock block) {
    map.remove(block.getName());
    updateSizeMetrics(block, true);
    elements.decrementAndGet();
    stats.evicted();
    return block.heapSize();
  }

What each file does:

HeapSize.java

An interface providing a single method, heapSize, which returns the heap space occupied by an object implementing HeapSize.
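
In code, the interface amounts to:

  public interface HeapSize {
    /** Approximate heap usage of this object, in bytes. */
    long heapSize();
  }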

CachedBlock.java

This is the object stored in LruBlockCache. It implements the HeapSize and Comparable interfaces. The reason it implements Comparable is that it is placed into a PriorityQueue (which CachedBlockQueue wraps), and the PriorityQueue uses the compareTo method provided by CachedBlock.java to decide the order in which CachedBlocks are kept.
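
The ordering it supplies can be sketched as follows (a reconstruction: blocks compare by last access time, so the queue can surface the least-recently-used ones):

  public int compareTo(CachedBlock that) {
    if (this.accessTime == that.accessTime) return 0;
    // A more recently accessed block sorts ahead of an older one.
    return this.accessTime < that.accessTime ? 1 : -1;
  }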

CachedBlockQueue.java

Wraps a PriorityQueue, adds constraints on how elements are added, and is used to hold CachedBlocks.

CachedBlockQueue is used by LruBlockCache, which decides, based on each CachedBlock's accessTime, which CachedBlocks get placed into the CachedBlockQueue; that is, the least-recently-used blocks end up in the PriorityQueue.
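
A rough sketch of that bounding behavior, under stated assumptions (it reuses the CachedBlock ordering above; java.util.PriorityQueue stands in here, while the real class also supports pulling entries back out least-recently-used-first, as the pollLast call in Part 6 shows):

  import java.util.PriorityQueue;

  class BoundedLruQueueSketch {
    private final PriorityQueue<CachedBlock> queue = new PriorityQueue<CachedBlock>();
    private long heapSize = 0;
    private final long maxSize;  // bytes of eviction candidates to collect

    BoundedLruQueueSketch(long maxSize) { this.maxSize = maxSize; }

    void add(CachedBlock cb) {
      if (heapSize < maxSize) {
        // Still collecting candidates: take everything.
        queue.add(cb);
        heapSize += cb.heapSize();
      } else {
        // Full (in bytes): the head is the most recently used block retained
        // so far; keep cb only if it is an older, better eviction candidate.
        CachedBlock head = queue.peek();
        if (cb.compareTo(head) > 0) {
          heapSize += cb.heapSize() - head.heapSize();
          queue.poll();
          queue.add(cb);
        }
      }
    }
  }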

BlockCache.java

Just an interface; LruBlockCache implements it, defining how to put a CachedBlock into the cache, how to fetch a CachedBlock from the cache, and so on.

LruBlockCache.java

The LRU is implemented mainly with two data structures: a ConcurrentHashMap and a CachedBlockQueue. Caching a CachedBlock simply means dropping it into the ConcurrentHashMap, which allows concurrent access. When a certain threshold is crossed, eviction is triggered; it can be done synchronously or handed to a dedicated thread, but either way the same function, evict(), is called, and it frees the less-used CachedBlocks in the map. The CachedBlocks in the map actually come in three modes (in-memory, accessed once, accessed multiple times), so eviction creates a PriorityQueue holding three buckets, each of which wraps a CachedBlockQueue. Blocks from the map are then dropped into the matching bucket, and space is freed until the requested number of bytes has been released.

Another open-source project worth studying is concurrentlinkedhashmap:

http://code.google.com/p/concurrentlinkedhashmap/
