LevelDB Source Code Notes (3): Cache

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 13:20

The cache in LevelDB implements the LRU (Least Recently Used) algorithm. The implementation lives in ShardedLRUCache, whose main class members are:

class ShardedLRUCache : public Cache {
 private:
  LRUCache shard_[kNumShards];
  port::Mutex id_mutex_;
  uint64_t last_id_;

  static inline uint32_t HashSlice(const Slice& s) {
    return Hash(s.data(), s.size(), 0);
  }

  static uint32_t Shard(uint32_t hash) {
    return hash >> (32 - kNumShardBits);
  }

 public:
  explicit ShardedLRUCache(size_t capacity)
      : last_id_(0) {
    const size_t per_shard = (capacity + (kNumShards - 1)) / kNumShards;
    for (int s = 0; s < kNumShards; s++) {
      shard_[s].SetCapacity(per_shard);
    }
  }
  virtual ~ShardedLRUCache() { }
  virtual Handle* Insert(const Slice& key, void* value, size_t charge,
                         void (*deleter)(const Slice& key, void* value)) {
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Insert(key, hash, value, charge, deleter);
  }
  virtual Handle* Lookup(const Slice& key) {
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Lookup(key, hash);
  }
  virtual void Release(Handle* handle);
  virtual void Erase(const Slice& key);
  virtual void* Value(Handle* handle);
};

Its key member is LRUCache shard_[kNumShards];

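Each ShardedLRUCache contains multiple LRUCache instances. To look up a key, it first computes which shard the key belongs to, hash = Shard(HashSlice(key)), and then searches the corresponding shard_[hash]. The shard is chosen from the high bits of the hash value, which is a common approach. Splitting the cache into several LRUCache instances, each with its own lock, reduces lock contention between threads. Note also that the cache uses a mutex and reference counting (refs) throughout, which is what makes it thread-safe.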
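The shard selection described above can be sketched in isolation. This is a minimal illustration, not the LevelDB source: the value kNumShardBits = 4 (16 shards) is an assumption made here, since the listing does not show how the constants are defined.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for illustration; the listing above does not show them.
static const int kNumShardBits = 4;
static const int kNumShards = 1 << kNumShardBits;

// Same expression as Shard() in the listing: keep only the top
// kNumShardBits bits of the 32-bit hash and use them as the shard index.
inline uint32_t Shard(uint32_t hash) {
  assert(kNumShardBits > 0 && kNumShardBits < 32);
  return hash >> (32 - kNumShardBits);
}
```

With these assumed constants, a hash of 0x80000000 selects shard 8 and any hash below 0x10000000 selects shard 0. One plausible reason for taking the high bits is that HandleTable picks its bucket from the low bits (hash & (length_ - 1)), so the two choices do not reuse the same bits.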

LRUCache uses a fairly standard algorithm.

class LRUCache {
 public:
  void SetCapacity(size_t capacity) { capacity_ = capacity; }

  // Like Cache methods, but with an extra "hash" parameter.
  Cache::Handle* Insert(const Slice& key, uint32_t hash,
                        void* value, size_t charge,
                        void (*deleter)(const Slice& key, void* value));
  Cache::Handle* Lookup(const Slice& key, uint32_t hash);
  void Release(Cache::Handle* handle);
  void Erase(const Slice& key, uint32_t hash);

 private:
  // Initialized before use.
  size_t capacity_;

  // mutex_ protects the following state.
  port::Mutex mutex_;
  size_t usage_;

  // Dummy head of LRU list.
  // lru.prev is newest entry, lru.next is oldest entry.
  LRUHandle lru_;

  HandleTable table_;
};

LRUCache has to map a key to its value (or return NULL to signal a miss). The pointers for each key/value pair are stored in a Handle node.

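Its main members include lru_, a doubly linked list ordered by time of last use. A doubly linked list suits insertion and removal and, in particular, makes it cheap to remove the oldest entry. lru_.prev points to the most recently used entry and lru_.next points to the oldest one. capacity_ is the maximum capacity of the cache (the sum of the entries' charge values), and usage_ is the amount currently in use. When usage_ > capacity_, the oldest entries are removed to free memory.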
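The list handling described above can be sketched as a circular doubly linked list with a dummy head. This is a simplified illustration under stated assumptions: Node, LruList, Touch and EvictOverflow are hypothetical names, nodes are owned by the caller, and locking and the hash table are left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified node: only what the list itself needs.
struct Node {
  Node* next;
  Node* prev;
  size_t charge;
};

// Circular list with a dummy head, following the convention in the post:
// head.prev is the newest entry, head.next is the oldest entry.
struct LruList {
  Node head;
  size_t usage = 0;
  size_t capacity;

  explicit LruList(size_t cap) : capacity(cap) {
    head.next = &head;
    head.prev = &head;
    head.charge = 0;
  }

  // Unlink e from the list and give back its charge.
  void Remove(Node* e) {
    assert(e != &head);
    e->next->prev = e->prev;
    e->prev->next = e->next;
    usage -= e->charge;
  }

  // Insert e just before the dummy head, making it the newest entry.
  void Append(Node* e) {
    e->next = &head;
    e->prev = head.prev;
    e->prev->next = e;
    e->next->prev = e;
    usage += e->charge;
  }

  // Mark e as most recently used.
  void Touch(Node* e) {
    Remove(e);
    Append(e);
  }

  // Unlink the oldest entries until usage fits within capacity.
  // Returns how many entries were unlinked.
  int EvictOverflow() {
    int evicted = 0;
    while (usage > capacity && head.next != &head) {
      Remove(head.next);
      ++evicted;
    }
    return evicted;
  }
};
```

Because the list is circular and always contains the dummy head, Remove and Append need no special cases for an empty list or for the first and last elements.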

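The other member is table_. This is a hand-written hash table (implemented internally as an array of buckets, each holding a linked list of entries) that finds the entry for a given key.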

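The doubly linked list and the HandleTable both use LRUHandle as their node type. LRUHandle therefore holds the key/value data, the prev and next pointers needed by the doubly linked list, and the next_hash pointer needed by the HandleTable.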

Its definition is as follows:

struct LRUHandle {
  void* value;
  void (*deleter)(const Slice&, void* value);
  LRUHandle* next_hash;  // pointer needed by the HandleTable
  LRUHandle* next;       // next pointer for the doubly linked list
  LRUHandle* prev;       // prev pointer for the doubly linked list
  size_t charge;         // size of this entry's data (it seems to count only the value? not important, though)
  size_t key_length;
  uint32_t refs;
  uint32_t hash;         // Hash of key(); used for fast sharding and comparisons
  char key_data[1];      // Beginning of key
  // ......
};

A few tricks are used here.

1. key_data[1]. This must be the last member of the struct. When an LRUHandle is allocated, the requested size is sizeof(LRUHandle)-1 + key.size():

  LRUHandle* e = reinterpret_cast<LRUHandle*>(
      malloc(sizeof(LRUHandle)-1 + key.size()));

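key_data can then be used to index into the extra memory allocated past the end of the struct, which effectively gives a variable-length buffer.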
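The trailing-array trick can be tried on a smaller struct. The sketch below is an illustration only: Entry, NewEntry and KeyOf are hypothetical names, and the idiom relies on behaviour that compilers support in practice rather than on anything strict C++ guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical, cut-down version of the idea: the one-byte array must be
// the last member so that extra bytes allocated after it extend the key.
struct Entry {
  size_t key_length;
  char key_data[1];  // beginning of the key; the rest follows in memory
};

// Allocate header and key in one block, as the malloc call above does.
Entry* NewEntry(const std::string& key) {
  Entry* e = static_cast<Entry*>(malloc(sizeof(Entry) - 1 + key.size()));
  assert(e != nullptr);
  e->key_length = key.size();
  memcpy(e->key_data, key.data(), key.size());
  return e;
}

// Read the key back; it is not NUL-terminated, so the length is required.
std::string KeyOf(const Entry* e) {
  return std::string(e->key_data, e->key_length);
}
```

A caller would write Entry* e = NewEntry("hello"); and later free(e);. A single allocation holds both the header and the key, which saves one malloc per entry and keeps the key next to the fields that are compared with it.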

2. refs.

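As we know, an entry in a cache can be replaced at any time. When a Handle is removed from the cache, another thread outside the cache may still be using it, and freeing its memory right away would cause an error.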

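The code solves this with refs combined with the deleter function. When an entry is inserted, refs = 2: the caller is accessing the entry and the cache itself also holds it. The other function through which callers obtain an entry is Lookup. Every time a caller gets a handle from the cache, refs++. When the caller is done with the handle it must call Release, which calls Unref internally, so refs--. Removing an entry from the cache likewise causes refs--. Only when refs <= 0 is the deleter actually called and the memory freed, and at the same time the cache's size (usage_) is updated.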
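The reference counting rule can be sketched without the rest of the cache. This is a single-threaded illustration under stated assumptions: Entry, InsertEntry and LookupEntry are hypothetical names, the deleter takes only the value, and the mutex that guards the real counters is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical, simplified entry: a value, its deleter and a ref count.
struct Entry {
  void* value;
  void (*deleter)(void* value);
  uint32_t refs;
};

// Insert starts at refs = 2: one reference held by the cache itself and
// one by the handle returned to the caller.
Entry* InsertEntry(void* value, void (*deleter)(void* value)) {
  Entry* e = static_cast<Entry*>(malloc(sizeof(Entry)));
  assert(e != nullptr);
  e->value = value;
  e->deleter = deleter;
  e->refs = 2;
  return e;
}

// Each lookup hands out one more reference.
Entry* LookupEntry(Entry* e) {
  e->refs++;
  return e;
}

// Drop one reference. The deleter runs and the memory is freed only when
// the last reference goes away. Returns true if the entry was freed.
bool Unref(Entry* e) {
  assert(e->refs > 0);
  e->refs--;
  if (e->refs == 0) {
    e->deleter(e->value);
    free(e);
    return true;
  }
  return false;
}
```

In this sketch an entry that is inserted, looked up once and then erased from the cache still has one outstanding reference, so it survives until the last holder calls Unref.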

3. The doubly linked list and the hash table use the same handle struct as their basic node.

The main member variables of HandleTable are:

 private:
  // The table consists of an array of buckets where each bucket is
  // a linked list of cache entries that hash into the bucket.
  uint32_t length_;
  uint32_t elems_;
  LRUHandle** list_;

Lookup works as follows:

  LRUHandle** FindPointer(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = &list_[hash & (length_ - 1)];
    while (*ptr != NULL &&
           ((*ptr)->hash != hash || key != (*ptr)->key())) {
      ptr = &(*ptr)->next_hash;
    }
    return ptr;
  }

In effect, this is a hash table that resolves collisions by chaining.
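To show why FindPointer returns a pointer to a pointer, here is a small chained table built around the same loop. It is a sketch under stated assumptions, not the HandleTable source: ChainedTable and Node are hypothetical names, keys are std::string instead of Slice, and the bucket count is fixed at 4 with no resizing.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical node: key, its hash, and the chain pointer.
struct Node {
  std::string key;
  uint32_t hash;
  Node* next_hash;
};

class ChainedTable {
 public:
  ChainedTable() {
    for (Node*& bucket : list_) bucket = nullptr;
  }

  // Returns the matching node, or nullptr on a miss.
  Node* Lookup(const std::string& key, uint32_t hash) {
    return *FindPointer(key, hash);
  }

  // Links n into its bucket. If a node with the same key exists, n takes
  // its place in the chain and the old node is returned; otherwise nullptr.
  Node* Insert(Node* n) {
    Node** ptr = FindPointer(n->key, n->hash);
    Node* old = *ptr;
    n->next_hash = (old == nullptr ? nullptr : old->next_hash);
    *ptr = n;
    return old;
  }

  // Unlinks and returns the matching node, or nullptr if absent.
  Node* Remove(const std::string& key, uint32_t hash) {
    Node** ptr = FindPointer(key, hash);
    Node* result = *ptr;
    if (result != nullptr) *ptr = result->next_hash;
    return result;
  }

 private:
  static constexpr uint32_t kLength = 4;  // power of two, so & works as mod
  Node* list_[kLength];

  // Returns the slot that points at the match, or the trailing null slot
  // of the chain, so callers can read it or overwrite it in place.
  Node** FindPointer(const std::string& key, uint32_t hash) {
    Node** ptr = &list_[hash & (kLength - 1)];
    while (*ptr != nullptr &&
           ((*ptr)->hash != hash || key != (*ptr)->key)) {
      ptr = &(*ptr)->next_hash;
    }
    return ptr;
  }
};
```

Since the returned slot is either a bucket head or some node's next_hash field, Insert and Remove can relink the chain with one assignment and never need to track the previous node.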

0 0