HBase-split


HBase-split code analysis

Related classes:

  • SplitRequest : the class that actually carries out the split
  • CompactSplitThread : controls the compact/split threads
  • MemStoreFlusher : the memstore flush implementation
  • RSRpcServices : the regionserver RPC implementation class
  • TableLock : an interface in TableLockManager; implemented by ZKTableLockManager
  • IncreasingToUpperBoundRegionSplitPolicy : the default split policy

[Figure: hbase-split code 01]


Situations that trigger a split

  • HBaseAdmin : HBaseAdmin.split (a client-side sketch follows this list)
  • compact : CompactSplitThread.CompactionRunner
  • memstore flush : FlushHandler.java
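
For the first trigger, here is a minimal client-side sketch, assuming the 1.x Connection/Admin API; the table name "test_table" and the split key "row-5000" are made-up illustrations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class ManualSplitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      // Ask for a split of every region of the table; each region computes
      // its own mid key through the machinery analyzed below
      admin.split(TableName.valueOf("test_table"));  // hypothetical table
      // Or request a split at an explicit split point
      admin.split(TableName.valueOf("test_table"), Bytes.toBytes("row-5000"));
    }
  }
}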


  1. compact triggering a split, in CompactionRunner.run():

public void run() {
  // ... some preliminary condition checks
  if (this.compaction == null) {
    .....
  // Finally we can compact something.
  assert this.compaction != null;
  ...
  try {
    ...
    boolean completed = region.compact(compaction, store);
    ...
    if (completed) {
      // degenerate case: blocked regions require recursive enqueues
      if (store.getCompactPriority() <= 0) {
        requestSystemCompaction(region, store, "Recursive enqueue");
      } else {
        // see if the compaction has caused us to exceed max region size
        // ********* i.e. whether the max region size has been exceeded *********
        requestSplit(region);
      }
    }
  } catch (IOException ex) {
    ...
    server.checkFileSystem();
  } catch (Exception ex) {
    ...
  } finally {
    LOG...
  }
  this.compaction.getRequest().afterExecute(); // an empty method
}

What does store.getCompactPriority() <= 0 above mean?

Let's look at getCompactPriority() in HStore.java:

@Override
public int getCompactPriority() {
  // Fetch the compact priority from the StoreFileManager
  int priority = this.storeEngine.getStoreFileManager().getStoreCompactionPriority();
  if (priority == PRIORITY_USER) {
    LOG.warn("Compaction priority is USER despite there being no user compaction");
  }
  return priority;
}

It delegates to the StoreFileManager for the compact priority, so let's keep going. In the default implementation, DefaultStoreFileManager, the code is as follows:

@Override
public int getStoreCompactionPriority() {
  // isTooManyStoreFiles: when a MemStore flushes, it checks whether any HStore
  // of the HRegion has too many files. If so, the flush is delayed so that a
  // compaction can run first; otherwise the file count would keep growing.
  // Conversely, the further the current file count is below blockingFileCount,
  // the higher the priority returned here, so the flush can go first and the
  // compaction can follow it, maximizing resource utilization.
  // BLOCKING_STOREFILES_KEY = "hbase.hstore.blockingStoreFiles"
  // HStore.DEFAULT_BLOCKING_STOREFILE_COUNT = 7 -- why 7?
  int blockingFileCount = conf.getInt(
      HStore.BLOCKING_STOREFILES_KEY, HStore.DEFAULT_BLOCKING_STOREFILE_COUNT);
  // The priority is blockingFileCount minus the current number of storefiles
  int priority = blockingFileCount - storefiles.size();
  // If the priority is 1 (PRIORITY_USER), return 2; otherwise return it as is
  return (priority == HStore.PRIORITY_USER) ? priority + 1 : priority;
}

Back to the condition store.getCompactPriority() <= 0:

If store.getCompactPriority() <= 0, then blockingFileCount (= 7) - storefiles.size() <= 0,
which means storefiles.size() >= 7.
In that case requestSystemCompaction(region, store, "Recursive enqueue") is executed,
which in turn enqueues another CompactionRunner.
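
To make the arithmetic concrete, here is a toy illustration (not HBase code; the constants mirror the defaults described above):

// Toy illustration of the priority arithmetic, assuming the default
// blockingFileCount of 7 and PRIORITY_USER = 1 as described above.
public class CompactPriorityDemo {
  static final int PRIORITY_USER = 1;

  static int storeCompactionPriority(int blockingFileCount, int storeFileCount) {
    int priority = blockingFileCount - storeFileCount;
    // Avoid colliding with the value reserved for user-requested compactions
    return (priority == PRIORITY_USER) ? priority + 1 : priority;
  }

  public static void main(String[] args) {
    System.out.println(storeCompactionPriority(7, 3)); // 4: flush first, compact later
    System.out.println(storeCompactionPriority(7, 6)); // 2: 1 is reserved, bumped to 2
    System.out.println(storeCompactionPriority(7, 8)); // -1: <= 0, re-enqueue a compaction
  }
}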


requestSplit

OK, so when storefiles.size() < 7,
CompactionRunner.run() calls requestSplit() instead.
That method is CompactSplitThread's requestSplit():

public synchronized boolean requestSplit(final HRegion r) {
  // 1. shouldSplitRegion(): is the number of regions on this RS within the configured limit?
  // 2. r.getCompactPriority() >= 1
  if (shouldSplitRegion() && r.getCompactPriority() >= Store.PRIORITY_USER) {
    byte[] midKey = r.checkSplit();
    if (midKey != null) {
      requestSplit(r, midKey);
      return true;
    }
  }
  return false;
}

What checks does shouldSplitRegion() perform?

private boolean shouldSplitRegion() {
  // this.regionSplitLimit = conf.getInt(
  //     REGION_SERVER_REGION_SPLIT_LIMIT,
  //     DEFAULT_REGION_SERVER_REGION_SPLIT_LIMIT);  // defaults to 1000
  if (server.getNumberOfOnlineRegions() > 0.9 * regionSplitLimit) {
    // If this regionserver hosts more than 900 regions, log a WARN
    LOG.warn("Total number of regions is approaching the upper limit " + regionSplitLimit + ". "
        + "Please consider taking a look at http://hbase.apache.org/book.html#ops.regionmgt");
  }
  // Return true if regionSplitLimit is greater than this RS's online region count
  return (regionSplitLimit > server.getNumberOfOnlineRegions());
}

With the region count on the RS and the compact priority both checked,
HRegion.checkSplit() runs next:

/**
 * Return the splitpoint. null indicates the region isn't splittable.
 * If the splitpoint isn't explicitly specified, it will go over the stores
 * to find the best splitpoint. Currently the criteria of best splitpoint
 * is based on the size of the store.
 */
public byte[] checkSplit() {
  // The META and NAMESPACE system tables cannot be split
  // A table in recovering state cannot be split
  // splitPolicy defaults to IncreasingToUpperBoundRegionSplitPolicy
  if (!splitPolicy.shouldSplit()) {
    return null;
  }
  // Get the concrete split point
  byte[] ret = splitPolicy.getSplitPoint();
  if (ret != null) {
    try {
      // Check that the row lies inside this region
      checkRow(ret, "calculated split");
    } catch (IOException e) {
      LOG.error("Ignoring invalid split", e);
      return null;
    }
  }
  return ret;
}

The default splitPolicy is IncreasingToUpperBoundRegionSplitPolicy.
Let's look at its shouldSplit() method:

@Override
protected boolean shouldSplit() {
  if (region.shouldForceSplit()) return true;
  boolean foundABigStore = false;
  // Get count of regions that have the same common table as this.region
  int tableRegionsCount = getCountOfCommonTableRegions();
  // Get the size to check, computed from hbase.hregion.max.filesize, the
  // region count, and hbase.hregion.memstore.flush.size
  long sizeToCheck = getSizeToCheck(tableRegionsCount);
  // Iterate over all stores of this region
  for (Store store : region.getStores().values()) {
    // If any store cannot be split, e.g. because it contains reference files,
    // return false
    if ((!store.canSplit())) {
      return false;
    }
    // Mark if any store is big enough
    long size = store.getSize();
    // If the store is bigger than sizeToCheck, set foundABigStore to true
    if (size > sizeToCheck) {
      LOG.debug("ShouldSplit because " + store.getColumnFamilyName() +
        " size=" + size + ", sizeToCheck=" + sizeToCheck +
        ", regionsWithCommonTable=" + tableRegionsCount);
      foundABigStore = true;
    }
  }
  return foundABigStore;
}

IncreasingToUpperBoundRegionSplitPolicy getSizeToCheck()

/**
 * @return Region max size or <code>count of regions cubed * flushsize</code>,
 * whichever is smaller; guard against there being zero regions on this server.
 */
protected long getSizeToCheck(final int tableRegionsCount) {
  // safety check for 100 to avoid numerical overflow in extreme cases
  // If the region count is 0 or > 100, return hbase.hregion.max.filesize;
  // otherwise return the smaller of the max filesize and
  // initialSize * tableRegionsCount^3, i.e. 128M * regionCt * regionCt * regionCt.
  // initialSize is the table's MEMSTORE_FLUSHSIZE attribute if set, otherwise
  // hbase.hregion.memstore.flush.size (128M by default).
  return tableRegionsCount == 0 || tableRegionsCount > 100 ? getDesiredMaxFileSize():
    Math.min(getDesiredMaxFileSize(),
      this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
}

The getDesiredMaxFileSize() method:
It returns the desiredMaxFileSize field of ConstantSizeRegionSplitPolicy, which IncreasingToUpperBoundRegionSplitPolicy extends (i.e. hbase.hregion.max.filesize).
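
To see how the threshold grows, here is a toy calculation (not HBase code) using the defaults above: a 128M flush size and a 10G max filesize.

// Toy calculation of the split threshold growth, assuming the defaults
// described above: flush size 128M, max filesize 10G.
public class SizeToCheckDemo {
  static final long FLUSH_SIZE = 128L * 1024 * 1024;          // hbase.hregion.memstore.flush.size
  static final long MAX_FILESIZE = 10L * 1024 * 1024 * 1024;  // hbase.hregion.max.filesize

  static long sizeToCheck(int tableRegionsCount) {
    if (tableRegionsCount == 0 || tableRegionsCount > 100) {
      return MAX_FILESIZE;
    }
    long cubed = FLUSH_SIZE * tableRegionsCount * tableRegionsCount * tableRegionsCount;
    return Math.min(MAX_FILESIZE, cubed);
  }

  public static void main(String[] args) {
    for (int n = 1; n <= 5; n++) {
      System.out.println(n + " region(s): " + sizeToCheck(n) / (1024 * 1024) + " MB");
    }
    // 1 -> 128 MB, 2 -> 1024 MB, 3 -> 3456 MB, 4 -> 8192 MB, 5 -> capped at 10240 MB
  }
}

So with the defaults, the threshold hits the 10G ceiling once the table has five regions on the server.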

How desiredMaxFileSize is assigned:

@Override
protected void configureForRegion(HRegion region) {
  super.configureForRegion(region);
  Configuration conf = getConf();
  HTableDescriptor desc = region.getTableDesc();
  if (desc != null) {
    // If the table has the MAX_FILESIZE attribute set, this returns its value,
    // otherwise it returns -1
    this.desiredMaxFileSize = desc.getMaxFileSize();
  }
  // If desc.getMaxFileSize() returned a value < 0, fall back to the
  // hbase.hregion.max.filesize property, defaulting to
  // 10 * 1024 * 1024 * 1024L = 10G
  if (this.desiredMaxFileSize <= 0) {
    this.desiredMaxFileSize = conf.getLong(HConstants.HREGION_MAX_FILESIZE,
      HConstants.DEFAULT_MAX_FILE_SIZE);
  }
}

So far IncreasingToUpperBoundRegionSplitPolicy's shouldSplit() has checked
the region count against the max filesize, and whether any store of the current region contains reference files.

Next, let's continue with what HRegion.checkSplit() executes afterwards:
splitPolicy.getSplitPoint()

IncreasingToUpperBoundRegionSplitPolicy getSplitPoint()

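The original post omits the method body here. IncreasingToUpperBoundRegionSplitPolicy inherits getSplitPoint() from its base class RegionSplitPolicy; a paraphrased sketch of the 1.x-era logic follows (reconstructed from memory, so verify against your source tree):

// Paraphrased sketch of RegionSplitPolicy.getSplitPoint() (HBase 1.x era);
// not a verbatim copy of the source.
protected byte[] getSplitPoint() {
  // An explicitly requested split point (e.g. from a user split) wins
  byte[] explicitSplitPoint = this.region.getExplicitSplitPoint();
  if (explicitSplitPoint != null) {
    return explicitSplitPoint;
  }
  // Otherwise ask the largest store for its split point, which is roughly
  // the mid key of its largest store file
  byte[] splitPointFromLargestStore = null;
  long largestStoreSize = 0;
  for (Store s : region.getStores().values()) {
    byte[] splitPoint = s.getSplitPoint();
    long storeSize = s.getSize();
    if (splitPoint != null && largestStoreSize < storeSize) {
      splitPointFromLargestStore = splitPoint;
      largestStoreSize = storeSize;
    }
  }
  return splitPointFromLargestStore;
}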

After all the checks above and the computation of the mid key, requestSplit finally runs:
CompactSplitThread requestSplit()

public synchronized void requestSplit(final HRegion r, byte[] midKey) {
  ...
  // this.splits is a thread pool whose default size is 1
  this.splits.execute(new SplitRequest(r, midKey, this.server));
  ...
}

Now let's look at SplitRequest's run() method:

@Override
public void run() {
  // Check whether the server is stopping/stopped; bump the split metric
  long startTime = EnvironmentEdgeManager.currentTime();
  SplitTransaction st = new SplitTransaction(parent, midKey);
  try {
    try {
      // Acquire the table's read lock (a TableLock)
      tableLock.acquire();
    } catch (IOException ex) {
      tableLock = null;
      throw ex;
    }
    // If prepare does not return true, for some reason -- logged inside in
    // the prepare call -- we are not ready to split just now. Just return.
    // st.prepare() is an important step in its own right
    if (!st.prepare()) return;
    try {
      st.execute(this.server, this.server);
      success = true;
    } catch (Exception e) {
      ...
      try {
        // The split failed; roll back
        if (st.rollback(this.server, this.server)) {
          ...
        } else {
          ...
        }
      } catch (RuntimeException ee) {
        ...
        this.server.abort(msg + " -- Cause: " + ee.getMessage());
      }
      return;
    }
  } catch (IOException ex) {
    ...
    server.checkFileSystem();
  } finally {
    // Coprocessor postCompleteSplit
    if (parent.shouldForceSplit()) {
      parent.clearSplit();
    }
    // Release the TableLock
    releaseTableLock();
    // Log the split success
  } // end finally
}

First, a look at prepare(),
which creates the two new HRegionInfos, A and B (a sketch of the regionId helper follows the code):

/**
 * Does checks on split inputs.
 * @return <code>true</code> if the region is splittable else
 * <code>false</code> if it is not (e.g. its already closed, etc.).
 */
public boolean prepare() {
  // If the parent region cannot be split, return false directly
  // mid must not be null
  HRegionInfo hri = this.parent.getRegionInfo();
  parent.prepareToSplit();
  // Check splitrow.
  byte [] startKey = hri.getStartKey();
  byte [] endKey = hri.getEndKey();
  if (Bytes.equals(startKey, splitrow) ||
      !this.parent.getRegionInfo().containsRow(splitrow)) {
    LOG.info("Split row is not inside region key range or is equal to " +
        "startkey: " + Bytes.toStringBinary(this.splitrow));
    return false;
  }
  // Build the regionId; if it is less than the parent's regionId, add 1
  // (this keeps the ordering in the meta table)
  long rid = getDaughterRegionIdTimestamp(hri);
  // Create the two daughter regions, A and B
  this.hri_a = new HRegionInfo(hri.getTable(), startKey, this.splitrow, false, rid);
  this.hri_b = new HRegionInfo(hri.getTable(), this.splitrow, endKey, false, rid);
  this.journal.add(new JournalEntry(JournalEntryType.PREPARED));
  return true;
}
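
The getDaughterRegionIdTimestamp() helper mentioned above looks roughly like this (paraphrased from memory of the 1.x source; treat the details as approximate):

// Approximate sketch of SplitTransaction.getDaughterRegionIdTimestamp(),
// paraphrased from memory; not a verbatim copy of the source.
private static long getDaughterRegionIdTimestamp(final HRegionInfo hri) {
  long rid = EnvironmentEdgeManager.currentTime();
  // The regionId is a timestamp. It must not be less than the parent's
  // regionId, or the daughters would sort before the parent in hbase:meta.
  if (rid < hri.getRegionId()) {
    rid = hri.getRegionId() + 1;
  }
  return rid;
}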

Now let's look at execute():

/**
 * Run the transaction.
 * @param server Hosting server instance.  Can be null when testing
 * @param services Used to online/offline regions.
 * @throws IOException If thrown, transaction failed.
 *          Call {@link #rollback(Server, RegionServerServices)}
 * @return Regions created
 * @see #rollback(Server, RegionServerServices)
 */
public PairOfSameType<HRegion> execute(final Server server,
    final RegionServerServices services) throws IOException {
  useZKForAssignment = server == null ? true :
    ConfigUtil.useZKForAssignment(server.getConfiguration());
  if (useCoordinatedStateManager(server)) {
    // state check
    std =
        ((BaseCoordinatedStateManager) server.getCoordinatedStateManager())
            .getSplitTransactionCoordination().getDefaultDetails();
  }
  PairOfSameType<HRegion> regions = createDaughters(server, services);
  if (this.parent.getCoprocessorHost() != null) {
    this.parent.getCoprocessorHost().preSplitAfterPONR();
  }
  return stepsAfterPONR(server, services, regions);
}

createDaughters() is responsible for taking the parent region offline and bringing the daughter regions online:

/**
 * Prepares the regions and region files.
 * @param services Used to online/offline regions.
 * @return The regions that were created
 */
/* package */ PairOfSameType<HRegion> createDaughters(final Server server,
    final RegionServerServices services) throws IOException {
  LOG.info("Starting split of region " + this.parent);
  if ((server != null && server.isStopped()) ||
      (services != null && services.isStopping())) {
    throw new IOException("Server is stopped or stopping");
  }
  assert !this.parent.lock.writeLock().isHeldByCurrentThread():
    "Unsafe to hold write lock while performing RPCs";
  journal.add(new JournalEntry(JournalEntryType.BEFORE_PRE_SPLIT_HOOK));
  // Coprocessor callback
  if (this.parent.getCoprocessorHost() != null) {
    // TODO: Remove one of these
    this.parent.getCoprocessorHost().preSplit();
    this.parent.getCoprocessorHost().preSplit(this.splitrow);
  }
  journal.add(new JournalEntry(JournalEntryType.AFTER_PRE_SPLIT_HOOK));
  // If true, no cluster to write meta edits to or to update znodes in.
  boolean testing = server == null ? true :
      server.getConfiguration().getBoolean("hbase.testing.nocluster", false);
  this.fileSplitTimeout = testing ? this.fileSplitTimeout :
      server.getConfiguration().getLong("hbase.regionserver.fileSplitTimeout",
        this.fileSplitTimeout);
  PairOfSameType<HRegion> daughterRegions = stepsBeforePONR(server, services, testing);
  List<Mutation> metaEntries = new ArrayList<Mutation>();
  if (this.parent.getCoprocessorHost() != null) {
    if (this.parent.getCoprocessorHost().
        preSplitBeforePONR(this.splitrow, metaEntries)) {
      throw new IOException("Coprocessor bypassing region "
          + this.parent.getRegionNameAsString() + " split.");
    }
    try {
      for (Mutation p : metaEntries) {
        HRegionInfo.parseRegionName(p.getRow());
      }
    } catch (IOException e) {
      LOG.error("Row key of mutation from coprossor is not parsable as region name."
          + "Mutations from coprocessor should only for hbase:meta table.");
      throw e;
    }
  }
  // This is the point of no return.  Adding subsequent edits to .META. as we
  // do below when we do the daughter opens adding each to .META. can fail in
  // various interesting ways the most interesting of which is a timeout
  // BUT the edits all go through (See HBASE-3872).  IF we reach the PONR
  // then subsequent failures need to crash out this regionserver; the
  // server shutdown processing should be able to fix-up the incomplete split.
  // The offlined parent will have the daughters as extra columns.  If
  // we leave the daughter regions in place and do not remove them when we
  // crash out, then they will have their references to the parent in place
  // still and the server shutdown fixup of .META. will point to these
  // regions.
  // We should add PONR JournalEntry before offlineParentInMeta,so even if
  // OfflineParentInMeta timeout,this will cause regionserver exit,and then
  // master ServerShutdownHandler will fix daughter & avoid data loss. (See
  // HBase-4562).
  this.journal.add(new JournalEntry(JournalEntryType.PONR));
  // Edit parent in meta.  Offlines parent region and adds splita and splitb
  // as an atomic update. See HBASE-7721. This update to META makes the region
  // will determine whether the region is split or not in case of failures.
  // If it is successful, master will roll-forward, if not, master will rollback
  // and assign the parent region.
  // ********** not in testing mode **********
  if (!testing && useZKForAssignment) {
    if (metaEntries == null || metaEntries.isEmpty()) {
      MetaTableAccessor.splitRegion(server.getConnection(),
        parent.getRegionInfo(), daughterRegions.getFirst().getRegionInfo(),
        daughterRegions.getSecond().getRegionInfo(), server.getServerName(),
        parent.getTableDesc().getRegionReplication());
    } else {
      // Meta changes: offline the parent and write the new regions' info
      offlineParentInMetaAndputMetaEntries(server.getConnection(),
        parent.getRegionInfo(), daughterRegions.getFirst().getRegionInfo(), daughterRegions
            .getSecond().getRegionInfo(), server.getServerName(), metaEntries,
            parent.getTableDesc().getRegionReplication());
    }
  } else if (services != null && !useZKForAssignment) {
    if (!services.reportRegionStateTransition(TransitionCode.SPLIT_PONR,
        parent.getRegionInfo(), hri_a, hri_b)) {
      // Passed PONR, let SSH clean it up
      throw new IOException("Failed to notify master that split passed PONR: "
        + parent.getRegionInfo().getRegionNameAsString());
    }
  }
  return daughterRegions;
}

At the end of execute(),
stepsAfterPONR(server, services, regions) opens the new regions and updates the information in zookeeper under:

/hbase/region-in-transition
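
The post stops here. For completeness, a heavily paraphrased sketch of what stepsAfterPONR() does in the 1.x-era code (method names and ordering reconstructed from memory; verify against your source tree):

// Approximate sketch of SplitTransaction.stepsAfterPONR(); names and order
// are from memory of the HBase 1.x source, not a verbatim copy.
public PairOfSameType<HRegion> stepsAfterPONR(final Server server,
    final RegionServerServices services, PairOfSameType<HRegion> regions)
    throws IOException {
  // Open daughter regions A and B on this regionserver
  openDaughters(server, services, regions.getFirst(), regions.getSecond());
  // Update the split state under /hbase/region-in-transition so the master
  // can observe the transition and finish its bookkeeping
  transitionZKNode(server, services, regions.getFirst(), regions.getSecond());
  // Coprocessor callback after the split has completed
  if (parent.getCoprocessorHost() != null) {
    parent.getCoprocessorHost().postSplit(regions.getFirst(), regions.getSecond());
  }
  return regions;
}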
