HBase Source Code Series (5): How Are Get and Scan Handled on the Server Side?

Source: Internet. Published by: 阿里java面试题2016. Editor: 程序博客网. Time: 2024/05/29 04:06

Get
Scan

Get and Scan are handled on the server side in very similar ways. Interestingly, a Get is in fact executed as a Scan.

Get

Let's go straight to the get method of RSRpcServices:

r = region.get(clientGet);

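HRegion mainly does the following:
1. Checks that the Get's rowkey falls within the HRegion's key range.
2. If the Get request specifies families, checks that each of them exists in the table; if it specifies none, adds all of the table's families to the Get.
3. Converts the Get into a Scan and executes it: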

Scan scan;
if (get.isClosestRowBefore()) {
  scan = buildScanForGetWithClosestRowBefore(get);
} else {
  scan = new Scan(get);
}
RegionScanner scanner = null;
try {
  scanner = getScanner(scan);
  scanner.next(results);
} finally {
  if (scanner != null)
    scanner.close();
}
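Steps 1 and 2 of the list above can be sketched in plain Java, without any HBase dependency. This is only an illustration: the class GetPrecheckSketch and its methods rowInRange and resolveFamilies are hypothetical names, and the range check assumes the usual region convention of a half-open range [startKey, endKey), where an empty end key means "no upper bound".

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the pre-checks done before a Get becomes a Scan.
final class GetPrecheckSketch {

    /** True if row lies in [startKey, endKey); an empty endKey means no upper bound. */
    static boolean rowInRange(byte[] row, byte[] startKey, byte[] endKey) {
        boolean atOrAfterStart = Arrays.compareUnsigned(row, startKey) >= 0;
        boolean beforeEnd = endKey.length == 0 || Arrays.compareUnsigned(row, endKey) < 0;
        return atOrAfterStart && beforeEnd;
    }

    /** Validates the requested families, or falls back to every family of the table. */
    static Set<String> resolveFamilies(Set<String> requested, Set<String> tableFamilies) {
        if (requested.isEmpty()) {
            // No family given: the Get covers all families of the table.
            return new LinkedHashSet<>(tableFamilies);
        }
        for (String family : requested) {
            if (!tableFamilies.contains(family)) {
                throw new IllegalArgumentException("Unknown column family: " + family);
            }
        }
        return new LinkedHashSet<>(requested);
    }

    public static void main(String[] args) {
        byte[] start = {'b'};
        byte[] end = {'m'};
        System.out.println(rowInRange(new byte[] {'c'}, start, end));
        System.out.println(rowInRange(new byte[] {'z'}, start, end));
        System.out.println(resolveFamilies(Set.of(), Set.of("cf1")));
    }
}
```

The real code works on byte[] family names and throws HBase-specific exceptions; strings and IllegalArgumentException are used here only to keep the sketch short.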

Next, the RegionScanner is instantiated: a reversed scan gets a ReversedRegionScannerImpl, anything else gets a RegionScannerImpl.

if (scan.isReversed()) {
  if (scan.getFilter() != null) {
    scan.getFilter().setReversed(true);
  }
  return new ReversedRegionScannerImpl(scan, additionalScanners, this);
}
return new RegionScannerImpl(scan, additionalScanners, this);

Iterate over all of the scan's families and get a scanner from the Store that belongs to each family:

for (Map.Entry<byte[], NavigableSet<byte[]>> entry : scan.getFamilyMap().entrySet()) {
  Store store = stores.get(entry.getKey());
  KeyValueScanner scanner = store.getScanner(scan, entry.getValue(), this.readPt);
  instantiatedScanners.add(scanner);
  if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand()
      || this.filter.isFamilyEssential(entry.getKey())) {
    scanners.add(scanner);
  } else {
    joinedScanners.add(scanner);
  }
}

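I won't paste the instantiation code. A lock is taken at this point; a reversed scan instantiates a ReversedStoreScanner, otherwise a StoreScanner is instantiated.
We finally arrive at the core StoreScanner code. The StoreScanner constructor performs several important operations:
1. Uses the Bloom filter, time range, and TTL to filter out StoreFiles that are not needed.
2. Locates the relevant scanners based on the key.
3. Merges the scanners it found into a KeyValueHeap, which in practice means putting the KeyValueScanners into a priority queue.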
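The idea behind step 3 is a k-way merge: each scanner is already sorted, and a priority queue ordered by each scanner's current head always yields the globally smallest key next. The sketch below illustrates that idea in plain Java. It is not HBase's KeyValueHeap: the names KeyValueHeapSketch, SortedScanner, and mergeAll are hypothetical, and plain strings stand in for cells.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of merging several sorted scanners through a priority queue.
final class KeyValueHeapSketch {

    /** Stand-in for a KeyValueScanner: a cursor over one sorted source. */
    static final class SortedScanner {
        private final Iterator<String> it;
        private String current;

        SortedScanner(List<String> sortedKeys) {
            this.it = sortedKeys.iterator();
            this.current = it.hasNext() ? it.next() : null;
        }

        /** Returns the current key without advancing, or null when exhausted. */
        String peek() {
            return current;
        }

        /** Returns the current key and advances to the next one. */
        String next() {
            String out = current;
            current = it.hasNext() ? it.next() : null;
            return out;
        }
    }

    /** Drains all scanners in globally sorted order. */
    static List<String> mergeAll(List<SortedScanner> scanners) {
        Comparator<SortedScanner> byHead = Comparator.comparing(SortedScanner::peek);
        PriorityQueue<SortedScanner> heap = new PriorityQueue<>(byHead);
        for (SortedScanner s : scanners) {
            if (s.peek() != null) {
                heap.add(s); // exhausted scanners never enter the heap
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            SortedScanner top = heap.poll(); // scanner with the smallest head
            out.add(top.next());
            if (top.peek() != null) {
                heap.add(top); // re-insert so it is re-ordered by its new head
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<SortedScanner> scanners = List.of(
            new SortedScanner(List.of("a", "d")),
            new SortedScanner(List.of("b", "e")),
            new SortedScanner(List.of("c")));
        System.out.println(mergeAll(scanners));
    }
}
```

A scanner is removed from the queue before it is advanced and re-added afterwards, so the queue never holds an element whose ordering key has changed underneath it.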

Then the contents of the KeyValueHeap are pulled out, and the records of currentRow are put into the Cell list.

do {
  // We want to maintain any progress that is made towards the limits while scanning across
  // different column families. To do this, we toggle the keep progress flag on during calls
  // to the StoreScanner to ensure that any progress made thus far is not wiped away.
  scannerContext.setKeepProgress(true);
  heap.next(results, scannerContext);
  scannerContext.setKeepProgress(tmpKeepProgress);
  nextKv = heap.peek();
  moreCellsInRow = moreCellsInRow(nextKv, currentRow, offset, length);
  if (moreCellsInRow && scannerContext.checkBatchLimit(limitScope)) {
    return scannerContext.setScannerState(NextState.BATCH_LIMIT_REACHED).hasMoreValues();
  } else if (scannerContext.checkSizeLimit(limitScope)) {
    ScannerContext.NextState state =
        moreCellsInRow ? NextState.SIZE_LIMIT_REACHED_MID_ROW : NextState.SIZE_LIMIT_REACHED;
    return scannerContext.setScannerState(state).hasMoreValues();
  } else if (scannerContext.checkTimeLimit(limitScope)) {
    ScannerContext.NextState state =
        moreCellsInRow ? NextState.TIME_LIMIT_REACHED_MID_ROW : NextState.TIME_LIMIT_REACHED;
    return scannerContext.setScannerState(state).hasMoreValues();
  }
} while (moreCellsInRow);

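If there are still results left to fetch, the ScannerState is set to MORE_VALUES and results keep being pulled from the queue; if all results have been fetched, the ScannerState is set to NO_MORE_VALUES.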

// We are done. Return the result.
if (stopRow) {
  return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
} else {
  return scannerContext.setScannerState(NextState.MORE_VALUES).hasMoreValues();
}
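To make the control flow of the row loop easier to follow, here is a stripped-down sketch that keeps only the batch limit and the final MORE_VALUES / NO_MORE_VALUES decision. Everything in it is an assumption made for illustration: RowBatchSketch, its Cell class, and nextRow are hypothetical, a Deque of pre-sorted cells stands in for the heap, and "the source is empty" stands in for the real stopRow check. Size and time limits are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: collect the cells of one row, honoring a batch limit.
final class RowBatchSketch {

    enum NextState { MORE_VALUES, NO_MORE_VALUES, BATCH_LIMIT_REACHED }

    /** Minimal stand-in for a Cell: just a row key and a qualifier. */
    static final class Cell {
        final String row;
        final String qualifier;

        Cell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    /** Moves cells of the current row from sortedCells into results. */
    static NextState nextRow(Deque<Cell> sortedCells, int batchLimit, List<Cell> results) {
        if (sortedCells.isEmpty()) {
            return NextState.NO_MORE_VALUES;
        }
        String currentRow = sortedCells.peek().row;
        boolean moreCellsInRow;
        do {
            results.add(sortedCells.poll());
            Cell next = sortedCells.peek(); // null once the source is drained
            moreCellsInRow = next != null && next.row.equals(currentRow);
            if (moreCellsInRow && results.size() >= batchLimit) {
                // Stop mid-row; the caller picks up the rest on the next call.
                return NextState.BATCH_LIMIT_REACHED;
            }
        } while (moreCellsInRow);
        return sortedCells.isEmpty() ? NextState.NO_MORE_VALUES : NextState.MORE_VALUES;
    }

    public static void main(String[] args) {
        Deque<Cell> cells = new ArrayDeque<>();
        cells.add(new Cell("r1", "a"));
        cells.add(new Cell("r1", "b"));
        cells.add(new Cell("r1", "c"));
        cells.add(new Cell("r2", "a"));
        NextState state;
        do {
            List<Cell> batch = new ArrayList<>();
            state = nextRow(cells, 2, batch);
            System.out.println(state + " with " + batch.size() + " cell(s)");
        } while (state != NextState.NO_MORE_VALUES);
    }
}
```

As in the original loop, the batch limit is only checked while more cells of the same row remain, so a row that ends exactly at the limit is reported as complete rather than as a partial batch.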

Finally, some metrics may need to be recorded:

// do after lock
if (this.metricsRegion != null) {
  long totalSize = 0L;
  for (Cell cell : results) {
    totalSize += CellUtil.estimatedSerializedSizeOf(cell);
  }
  this.metricsRegion.updateGet(totalSize);
}

Scan

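The scan method of RSRpcServices is one big lump of code.
If the scan request carries a ScannerId, the HRegion is obtained directly from the regionServer. If no ScannerId is set, the HRegion is located through the region specifier.
Here we see the same part as in get: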

scanner = region.getScanner(scan);

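This instantiates the RegionScannerImpl class.
Its constructor performs two important operations: 1. it iterates over the stores and gets each one's KeyValueScanner; 2. it initializes the KVHeap.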

The core code of scan is the same as that of get: scanner.nextRaw(values, scannerContext).

// Iterate over all the rows
while (i < rows) {
  // Put all the cells of one row into a List<Cell>
  moreRows = scanner.nextRaw(values, scannerContext);
  if (!values.isEmpty()) {
    for (Cell cell : values) {
      totalCellSize += CellUtil.estimatedSerializedSizeOf(cell);
    }
    final boolean partial = scannerContext.partialResultFormed();
    // Wrap the List<Cell> in a Result and add it to the List<Result>
    results.add(Result.create(values, null, stale, partial));
    i++;
  }
  ...
}
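The shape of this loop can be reproduced in a few lines of plain Java. The sketch below is an assumption-laden illustration, not the RSRpcServices code: ScanLoopSketch, RowScanner, fromRows, and collect are hypothetical names, strings stand in for cells, and the part elided by "..." in the excerpt is filled with the two steps that seem necessary for the loop to work at all (clearing the reused list and stopping when the scanner reports no more rows).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the server-side loop that packs rows into results.
final class ScanLoopSketch {

    /** Stand-in for RegionScanner: fills cells with one row, returns whether more rows exist. */
    interface RowScanner {
        boolean nextRaw(List<String> cells);
    }

    /** Builds a scanner over pre-grouped rows, for demonstration only. */
    static RowScanner fromRows(List<List<String>> rows) {
        Iterator<List<String>> it = rows.iterator();
        return cells -> {
            if (it.hasNext()) {
                cells.addAll(it.next());
            }
            return it.hasNext();
        };
    }

    /** Collects at most maxRows rows, each wrapped in its own list (the "Result"). */
    static List<List<String>> collect(RowScanner scanner, int maxRows) {
        List<List<String>> results = new ArrayList<>();
        List<String> values = new ArrayList<>();
        int i = 0;
        while (i < maxRows) {
            boolean moreRows = scanner.nextRaw(values);
            if (!values.isEmpty()) {
                results.add(new ArrayList<>(values)); // copy, since values is reused
                i++;
            }
            values.clear();  // assumption: the reused list is reset between rows
            if (!moreRows) { // assumption: stop once the scanner is exhausted
                break;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        RowScanner scanner = fromRows(List.of(
            List.of("r1:a", "r1:b"), List.of("r2:a"), List.of("r3:a")));
        System.out.println(collect(scanner, 2)); // first RPC: up to two rows
        System.out.println(collect(scanner, 2)); // next RPC continues where it stopped
    }
}
```

Calling collect twice on the same scanner mirrors how successive scan RPCs with the same ScannerId continue from the position the previous call reached.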

Finally, the results are wrapped in the ScanResponse:

addResults(builder, results, controller,
    RegionReplicaUtil.isDefaultReplica(region.getRegionInfo()));
