HBase Table Enabling issue


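Sometimes enabling an HBase table fails partway through, leaving the table stuck in an intermediate state in which clients cannot use it. This post analyzes one such case.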

Environment

hbase 0.98.0.2.1.3.7-18-hadoop2
This is a fairly old version.

Issue

The table below is stuck in an intermediate state: it is reported as neither enabled nor disabled.

hbase(main):011:0> is_enabled 'prod_eglesprk_tasks'
false
0 row(s) in 0.0680 seconds

hbase(main):012:0> is_disabled 'prod_eglesprk_tasks'
false
0 row(s) in 0.0510 seconds
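The two answers above can be combined into a single verdict. The following is a minimal sketch in plain Java, assuming the two booleans are copied from the shell output (or obtained from equivalent admin client calls); `TableStateCheck`, `Observed`, and `classify` are hypothetical names for illustration and are not part of HBase.

```java
// Hypothetical helper, not part of HBase: interprets the answers of
// is_enabled / is_disabled for a single table.
class TableStateCheck {
    enum Observed { ENABLED, DISABLED, IN_TRANSITION, INCONSISTENT }

    static Observed classify(boolean isEnabled, boolean isDisabled) {
        if (isEnabled && isDisabled) {
            // Both answers being true would be contradictory.
            return Observed.INCONSISTENT;
        }
        if (isEnabled) {
            return Observed.ENABLED;
        }
        if (isDisabled) {
            return Observed.DISABLED;
        }
        // Neither enabled nor disabled: the table is ENABLING or DISABLING.
        return Observed.IN_TRANSITION;
    }

    public static void main(String[] args) {
        // Values from the shell session above: both commands returned false.
        System.out.println(classify(false, false));
    }
}
```

This check only says that the table is in some transitional state; it cannot tell ENABLING from DISABLING, which is why the state stored in ZooKeeper has to be inspected directly.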

The HBase Master web UI shows the same thing.

(Screenshot not preserved.)

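What state is the table in?

The code below lists the names of all tables currently in the ENABLING state. It is derived from the code HMaster itself uses to query table state.

Running it confirms that this table is in the ENABLING state.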

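import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.zookeeper.ZKTable;
import org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher;

Configuration conf = HBaseConfiguration.create();
// HBase's standard key is hbase.zookeeper.quorum; the key below is kept as in the original post.
conf.set("ha.zookeeper.quorum", "127.0.0.1:2181");
String node = "/hbase/testSecuritySingleSuperuser";
ZooKeeperWatcher watcher = new ZooKeeperWatcher(conf, node, null, false);
// Tables whose state in ZooKeeper is ENABLING.
Set<TableName> enablingTables = ZKTable.getEnablingTables(watcher);
for (TableName tb : enablingTables) {
    System.out.println(tb.getNameAsString());
}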

Core dependencies in pom.xml

    <dependency>
        <groupId>org.apache.zookeeper</groupId>
        <artifactId>zookeeper</artifactId>
        <version>3.4.5</version>
        <exclusions>
            <exclusion>
                <groupId>com.sun.jmx</groupId>
                <artifactId>jmxri</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.sun.jdmk</groupId>
                <artifactId>jmxtools</artifactId>
            </exclusion>
            <exclusion>
                <groupId>javax.jms</groupId>
                <artifactId>jms</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.6.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-common</artifactId>
        <version>0.98.0.2.1.3.0-563-hadoop2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-protocol</artifactId>
        <version>0.98.0.2.1.3.0-563-hadoop2</version>
    </dependency>
    <dependency>
        <groupId>com.google.protobuf</groupId>
        <artifactId>protobuf-java</artifactId>
        <version>2.5.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>0.98.4-hadoop2</version>
    </dependency>

Root Cause

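HBase Master Log
The log shows that the bulk assign phase timed out, so the bulk assign did not complete successfully.
There are two points to note here:

  1. Why the enable did not succeed.
  2. While a table is in the ENABLING state, HMaster no longer accepts new enable requests for it. This prevents the confusion that would result from several clients enabling the same table at once.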
2017-05-22 03:57:14,363 DEBUG [MASTER_TABLE_OPERATIONS-des0002:60000-0] master.GeneralBulkAssigner: bulk assigning total 618 regions to 6 servers, took 367089ms, with 40 regions still in transition
2017-05-22 03:57:14,368 WARN  [MASTER_TABLE_OPERATIONS-des0002:60000-0] handler.EnableTableHandler: Table 'prod_eglesprk_tasks' wasn't successfully enabled. Status: done=false
2017-05-22 03:57:14,395 DEBUG [MASTER_TABLE_OPERATIONS-lvsapdes0002:60000-0] lock.ZKInterProcessLockBase: Released /hbase/table-lock/prod_eglesprk_tasks/write-master:600000000001168
hbase-hbase-master-lvsapdes0002.stratus.lvs.ebay.com.log.2:2017-05-22 03:59:30,349 INFO  [FifoRpcScheduler.handler1-thread-7] master.HMaster: Client=hbase//10.8.146.63 enable prod_eglesprk_tasks
hbase-hbase-master-lvsapdes0002.stratus.lvs.ebay.com.log.2:2017-05-22 03:59:30,369 INFO  [FifoRpcScheduler.handler1-thread-7] handler.EnableTableHandler: Table prod_eglesprk_tasks isn't disabled; skipping enable
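The last log line ("isn't disabled; skipping enable") is the guard from point 2 in action: an enable request is honored only while the table is DISABLED. Below is a simplified model of that guard, assuming an in-memory map in place of the state HBase keeps in ZooKeeper; `EnableGuardSketch` and its methods are hypothetical names, not HBase APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the "only enable a DISABLED table" guard.
class EnableGuardSketch {
    enum State { DISABLED, ENABLING, ENABLED, DISABLING }

    private final Map<String, State> states = new ConcurrentHashMap<>();

    void put(String table, State state) {
        states.put(table, state);
    }

    State get(String table) {
        return states.get(table);
    }

    // Atomically moves the table DISABLED -> ENABLING.
    // Returns false (the request is skipped) for any other current state.
    boolean tryStartEnable(String table) {
        return states.replace(table, State.DISABLED, State.ENABLING);
    }

    public static void main(String[] args) {
        EnableGuardSketch guard = new EnableGuardSketch();
        guard.put("prod_eglesprk_tasks", State.DISABLED);
        // The first request wins and puts the table into ENABLING.
        System.out.println(guard.tryStartEnable("prod_eglesprk_tasks"));
        // A later request finds ENABLING, not DISABLED, and is skipped.
        System.out.println(guard.tryStartEnable("prod_eglesprk_tasks"));
    }
}
```

The same guard that protects against concurrent enables is what makes a failed enable sticky: if the first attempt never reaches ENABLED, every retry is skipped as well.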

Corresponding code

HMaster.java

  @Override
  public long enableTable(final TableName tableName, final long nonceGroup, final long nonce)
      throws IOException {
    checkInitialized();
    if (isMasterProcedureExecutorEnabled()) {
      return MasterProcedureUtil.submitProcedure(
        new MasterProcedureUtil.NonceProcedureRunnable(this, nonceGroup, nonce) {
        @Override
        protected void run() throws IOException {
          getMaster().getMasterCoprocessorHost().preEnableTable(tableName);
          LOG.info(getClientIdAuditPrefix() + " enable " + tableName);

          // Execute the operation asynchronously - client will check the progress of the operation
          // In case the request is from a <1.1 client before returning,
          // we want to make sure that the table is prepared to be
          // enabled (the table is locked and the table state is set).
          // Note: if the procedure throws exception, we will catch it and rethrow.
          final ProcedurePrepareLatch prepareLatch = ProcedurePrepareLatch.createLatch();
          submitProcedure(new EnableTableProcedure(
            procedureExecutor.getEnvironment(), tableName, false, prepareLatch));
          // Before returning to client, we want to make sure that the table is prepared to be
          // enabled (the table is locked and the table state is set).
          //
          // Note: if the procedure throws exception, we will catch it and rethrow.
          prepareLatch.await();

          getMaster().getMasterCoprocessorHost().postEnableTable(tableName);
        }

        @Override
        protected String getDescription() {
          return "EnableTableProcedure";
        }
      });
    } else {
      if (cpHost != null) {
        cpHost.preEnableTable(tableName);
      }
      LOG.info(getClientIdAuditPrefix() + " enable " + tableName);
      this.service.submit(new EnableTableHandler(this, tableName,
        assignmentManager, tableLockManager, false).prepare());
      if (cpHost != null) {
        cpHost.postEnableTable(tableName);
      }
      return -1;
    }
  }

EnableTableHandler.java

  private void handleEnableTable() throws IOException, CoordinatedStateException,
      InterruptedException {
    // ... (the beginning of the method, including the opening "if", is omitted in this excerpt)
      BulkAssigner ba =
          new GeneralBulkAssigner(this.server, bulkPlan, this.assignmentManager, true);
      try {
        if (ba.bulkAssign()) {
          done = true;
        }
      } catch (InterruptedException e) {
        LOG.warn("Enable operation was interrupted when enabling table '"
            + this.tableName + "'");
        // Preserve the interrupt.
        Thread.currentThread().interrupt();
      }
    } else {
      done = true;
      LOG.info("Balancer was unable to find suitable servers for table " + tableName
          + ", leaving unassigned");
    }
    if (done) {
      // Flip the table to enabled.
      this.assignmentManager.getTableStateManager().setTableState(
        this.tableName, ZooKeeperProtos.Table.State.ENABLED);
      LOG.info("Table '" + this.tableName
      + "' was successfully enabled. Status: done=" + done);
    } else {
      LOG.warn("Table '" + this.tableName
      + "' wasn't successfully enabled. Status: done=" + done);
    }
  }

BulkAssigner.java: region assignment here runs under a timeout. If waitUntilDone does not finish within that time, bulkAssign returns false.

  public boolean bulkAssign(boolean sync) throws InterruptedException,
      IOException {
    boolean result = false;
    ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
    builder.setDaemon(true);
    builder.setNameFormat(getThreadNamePrefix() + "-%1$d");
    builder.setUncaughtExceptionHandler(getUncaughtExceptionHandler());
    int threadCount = getThreadCount();
    java.util.concurrent.ExecutorService pool =
      Executors.newFixedThreadPool(threadCount, builder.build());
    try {
      populatePool(pool);
      // How long to wait on empty regions-in-transition.  If we timeout, the
      // RIT monitor should do fixup.
      if (sync) result = waitUntilDone(getTimeoutOnRIT());
    } finally {
      // We're done with the pool.  It'll exit when its done all in queue.
      pool.shutdown();
    }
    return result;
  }
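The pattern in bulkAssign is "submit the work to a pool, then wait a bounded time for it to finish". Here is a self-contained sketch of that pattern using only the JDK, assuming sleeping tasks in place of real region assignments; `BoundedWaitSketch`, `assignAll`, the pool size of 2, and the timings are all hypothetical choices for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of "bulk assign with a timeout"; not HBase code.
class BoundedWaitSketch {
    // Runs `tasks` simulated assignments of `taskMillis` each on a small pool
    // and waits at most `timeoutMillis` for all of them to finish.
    static boolean assignAll(int tasks, long taskMillis, long timeoutMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2, runnable -> {
            Thread thread = new Thread(runnable);
            thread.setDaemon(true); // do not keep the JVM alive after a timeout
            return thread;
        });
        CountDownLatch remaining = new CountDownLatch(tasks);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        Thread.sleep(taskMillis); // stands in for one assignment
                        remaining.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // false means the timeout elapsed with work still outstanding.
            return remaining.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            // Queued tasks still run to completion; only the waiting stops.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(assignAll(4, 10, 2000)); // generous timeout
        System.out.println(assignAll(4, 500, 50));  // timeout far too short
    }
}
```

With these numbers the first call should finish in time and the second should time out. The second case mirrors the log above: the caller gets false, so done stays false and the table state is never flipped to ENABLED, even if the remaining work finishes later.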

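Solution

The JIRA issue below pointed to the solution: restart the HBase Master.
As the following code shows, during master startup every table still in the ENABLING state is run through EnableTableHandler again, which moves it to the ENABLED state.
HBASE-6469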

AssignmentManager.java

  /**
   * Recover the tables that are not fully moved to ENABLED state. These tables
   * are in ENABLING state when the master restarted/switched
   *
   * @throws KeeperException
   * @throws org.apache.hadoop.hbase.TableNotFoundException
   * @throws IOException
   */
  private void recoverTableInEnablingState()
      throws KeeperException, TableNotFoundException, IOException {
    Set<TableName> enablingTables = ZKTable.getEnablingTables(watcher);
    if (enablingTables.size() != 0) {
      for (TableName tableName : enablingTables) {
        // Recover by calling EnableTableHandler
        LOG.info("The table " + tableName
            + " is in ENABLING state.  Hence recovering by moving the table"
            + " to ENABLED state.");
        // enableTable in sync way during master startup,
        // no need to invoke coprocessor
        EnableTableHandler eth = new EnableTableHandler(this.server, tableName,
          catalogTracker, this, tableLockManager, true);
        try {
          eth.prepare();
        } catch (TableNotFoundException e) {
          LOG.warn("Table " + tableName + " not found in hbase:meta to recover.");
          continue;
        }
        eth.process();
      }
    }
  }
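The recovery step can be modeled as a single pass over the table states at startup. This is only a sketch, assuming an in-memory map in place of ZooKeeper and a direct state flip in place of the region reassignment that EnableTableHandler performs; `EnablingRecoverySketch` and `recoverEnablingTables` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "recover ENABLING tables on master startup".
class EnablingRecoverySketch {
    enum State { DISABLED, ENABLING, ENABLED, DISABLING }

    // Moves every ENABLING table to ENABLED and returns the recovered names.
    static List<String> recoverEnablingTables(Map<String, State> states) {
        List<String> recovered = new ArrayList<>();
        for (Map.Entry<String, State> entry : states.entrySet()) {
            if (entry.getValue() == State.ENABLING) {
                // The real handler reassigns the table's regions before this flip.
                entry.setValue(State.ENABLED);
                recovered.add(entry.getKey());
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        Map<String, State> states = new LinkedHashMap<>();
        states.put("prod_eglesprk_tasks", State.ENABLING);
        states.put("some_disabled_table", State.DISABLED);
        System.out.println(recoverEnablingTables(states));
        System.out.println(states);
    }
}
```

Tables in any other state are left untouched, which matches the code above: only the set returned by getEnablingTables is processed.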