Analysis of the Hive Metastore ObjectStore PersistenceManager Auto-Close Bug


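I have recently been testing HCatalog. HCatalog ships as a standalone JAR; it can run a service, but that service is really just the metastore Thrift server. When writing HCatalog-based MapReduce jobs, you only need to add the HCatalog JAR and the corresponding hive-site.xml file to libjars and HADOOP_CLASSPATH. Even so, I ran into a problem during testing: after running for a while, the Hive metastore server throws the following error: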

    2013-06-19 10:35:51,718 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
    javax.jdo.JDOFatalUserException: Persistence Manager has been closed
        at org.datanucleus.jdo.JDOPersistenceManager.assertIsOpen(JDOPersistenceManager.java:2124)
        at org.datanucleus.jdo.JDOPersistenceManager.currentTransaction(JDOPersistenceManager.java:315)
        at org.apache.hadoop.hive.metastore.ObjectStore.openTransaction(ObjectStore.java:294)
        at org.apache.hadoop.hive.metastore.ObjectStore.getTable(ObjectStore.java:732)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
        at com.sun.proxy.$Proxy5.getTable(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:982)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table.getResult(ThriftHiveMetastore.java:5017)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table.getResult(ThriftHiveMetastore.java:5005)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)

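PersistenceManager manages a set of persistent objects, including creating and querying them. It is an instance variable of ObjectStore, and each ObjectStore owns one pm. RawStore is the interface through which the metastore logic layer talks to the underlying metadata database (Derby, for example), and ObjectStore is the default RawStore implementation. When the Hive Metastore Server starts, it is given a TProcessor that wraps an HMSHandler. The HMSHandler holds a ThreadLocal<RawStore> threadLocalMS instance variable, so each thread maintains its own RawStore: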

    private final ThreadLocal<RawStore> threadLocalMS =
        new ThreadLocal<RawStore>() {
          @Override
          protected synchronized RawStore initialValue() {
            return null;
          }
        };

Each request from a Hive metastore client is handled by a WorkerProcess allocated from the thread pool. Every method in HMSHandler obtains a RawStore instance through getMS() to do the actual work:

    public RawStore getMS() throws MetaException {
      RawStore ms = threadLocalMS.get();
      if (ms == null) {
        ms = newRawStore();
        threadLocalMS.set(ms);
        ms = threadLocalMS.get();
      }
      return ms;
    }

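As you can see, the RawStore is lazily initialized; once created, it is bound to the thread-local variable so that later calls on the same thread can reuse it.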

    private RawStore newRawStore() throws MetaException {
      LOG.info(addPrefix("Opening raw store with implemenation class:"
          + rawStoreClassName));
      Configuration conf = getConf();
      return RetryingRawStore.getProxy(hiveConf, conf, rawStoreClassName, threadLocalId.get());
    }
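The lazy, per-thread initialization done by getMS() can be tried out in isolation. The following is a minimal sketch, not Hive code: FakeStore, getStore() and getStoreOnNewThread() are hypothetical names introduced only for illustration, with FakeStore standing in for RawStore.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalStoreDemo {
    /** Hypothetical stand-in for RawStore; not a Hive class. */
    static final class FakeStore {
        private static final AtomicInteger COUNTER = new AtomicInteger();
        final int id = COUNTER.incrementAndGet();
    }

    private static final ThreadLocal<FakeStore> LOCAL = new ThreadLocal<>();

    /** Same shape as getMS(): create on first use, then reuse on this thread. */
    static FakeStore getStore() {
        FakeStore store = LOCAL.get();
        if (store == null) {
            store = new FakeStore();
            LOCAL.set(store);
        }
        return store;
    }

    /** Calls getStore() on a fresh thread and returns what that thread saw. */
    static FakeStore getStoreOnNewThread() {
        FakeStore[] holder = new FakeStore[1];
        Thread t = new Thread(() -> holder[0] = getStore());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return holder[0];
    }

    public static void main(String[] args) {
        FakeStore first = getStore();
        FakeStore second = getStore();
        System.out.println("same thread reuses instance: " + (first == second));
        System.out.println("other thread gets its own instance: "
            + (getStoreOnNewThread() != first));
    }
}
```

Because metastore worker threads come from a pool, whatever is bound to the thread-local outlives a single client connection. That detail matters for the bug analyzed later in this post.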

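The RawStore is accessed through a dynamic proxy: RetryingRawStore implements the InvocationHandler interface and, inside its invoke function, runs the real logic via method.invoke(). The advantage is that you can add your own logic around the method.invoke() call. RetryingRawStore uses this to implement retries by catching the exceptions thrown from the invocation. Because reflection is used, the exception arrives wrapped in an InvocationTargetException. Hive 0.9, surprisingly, rethrows it as soon as it is caught instead of retrying, which is clearly wrong. I modified it to unwrap the target exception, check whether it is an instance of JDOException, and handle it accordingly: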

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
      Object ret = null;
      boolean gotNewConnectUrl = false;
      boolean reloadConf = HiveConf.getBoolVar(hiveConf,
          HiveConf.ConfVars.METASTOREFORCERELOADCONF);
      boolean reloadConfOnJdoException = false;

      if (reloadConf) {
        updateConnectionURL(getConf(), null);
      }

      int retryCount = 0;
      Exception caughtException = null;
      while (true) {
        try {
          if (reloadConf || gotNewConnectUrl || reloadConfOnJdoException) {
            initMS();
          }
          ret = method.invoke(base, args);
          break;
        } catch (javax.jdo.JDOException e) {
          caughtException = (javax.jdo.JDOException) e.getCause();
        } catch (UndeclaredThrowableException e) {
          throw e.getCause();
        } catch (InvocationTargetException e) {
          Throwable t = e.getTargetException();
          if (t instanceof JDOException) {
            caughtException = (JDOException) e.getTargetException();
            reloadConfOnJdoException = true;
            LOG.error("rawstore jdoexception:" + caughtException.toString());
          } else {
            throw e.getCause();
          }
        }

        if (retryCount >= retryLimit) {
          throw caughtException;
        }

        assert (retryInterval >= 0);
        retryCount++;
        LOG.error(
            String.format(
                "JDO datastore error. Retrying metastore command " +
                    "after %d ms (attempt %d of %d)", retryInterval, retryCount, retryLimit));
        Thread.sleep(retryInterval);
        // If we have a connection error, the JDO connection URL hook might
        // provide us with a new URL to access the datastore.
        String lastUrl = getConnectionURL(getConf());
        gotNewConnectUrl = updateConnectionURL(getConf(), lastUrl);
      }
      return ret;
    }
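The unwrap-then-retry idea is easier to see without the Hive-specific configuration handling. Below is a minimal, self-contained sketch of a retrying dynamic proxy. Store, FlakyStore, TransientStoreException, RetryingHandler and wrap() are all hypothetical names made up for this example; TransientStoreException plays the role of JDOException. The sleep between attempts and the connection URL reload are left out on purpose.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class RetryProxyDemo {
    /** Hypothetical interface standing in for RawStore. */
    interface Store {
        String getTable(String name);
    }

    /** Hypothetical retryable failure standing in for JDOException. */
    static final class TransientStoreException extends RuntimeException {
        TransientStoreException(String message) { super(message); }
    }

    /** Fails the first N calls, then succeeds. */
    static final class FlakyStore implements Store {
        private int remainingFailures;
        int calls = 0;

        FlakyStore(int failures) { this.remainingFailures = failures; }

        @Override
        public String getTable(String name) {
            calls++;
            if (remainingFailures > 0) {
                remainingFailures--;
                throw new TransientStoreException("store unavailable");
            }
            return "table:" + name;
        }
    }

    static final class RetryingHandler implements InvocationHandler {
        private final Object base;
        private final int retryLimit;

        RetryingHandler(Object base, int retryLimit) {
            this.base = base;
            this.retryLimit = retryLimit;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            int retryCount = 0;
            while (true) {
                try {
                    return method.invoke(base, args);
                } catch (InvocationTargetException e) {
                    // Reflection wraps the real failure; unwrap before deciding.
                    Throwable target = e.getTargetException();
                    boolean retryable = target instanceof TransientStoreException;
                    if (!retryable || retryCount >= retryLimit) {
                        throw target;
                    }
                    retryCount++;
                }
            }
        }
    }

    static Store wrap(Store base, int retryLimit) {
        return (Store) Proxy.newProxyInstance(
            Store.class.getClassLoader(),
            new Class<?>[] {Store.class},
            new RetryingHandler(base, retryLimit));
    }

    public static void main(String[] args) {
        FlakyStore flaky = new FlakyStore(2);
        Store store = wrap(flaky, 3);
        // Two failed attempts are retried; the third call succeeds.
        System.out.println(store.getTable("t1") + " after " + flaky.calls + " calls");
    }
}
```

Throwing the unwrapped target rather than the InvocationTargetException keeps callers of the proxy unaware that reflection is involved.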

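A RawStore is initialized in one of two ways. The first is in the RetryingRawStore constructor, which calls "this.base = (RawStore) ReflectionUtils.newInstance(rawStoreClass, conf);". Because ObjectStore implements Configurable, newInstance calls its setConf(conf) method, which initializes the RawStore. The second is on retry after an exception has been caught, which also ends up calling base.setConf(getConf()):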

    private void initMS() {
      base.setConf(getConf());
    }

ObjectStore's setConf method first acquires the lock that guards the static PersistenceManagerFactory, closes the pm, sets it to null, and then re-initializes it:

    public void setConf(Configuration conf) {
      // Although an instance of ObjectStore is accessed by one thread, there may
      // be many threads with ObjectStore instances. So the static variables
      // pmf and prop need to be protected with locks.
      pmfPropLock.lock();
      try {
        isInitialized = false;
        hiveConf = conf;
        Properties propsFromConf = getDataSourceProps(conf);
        boolean propsChanged = !propsFromConf.equals(prop);

        if (propsChanged) {
          pmf = null;
          prop = null;
        }

        assert(!isActiveTransaction());
        shutdown();
        // Always want to re-create pm as we don't know if it were created by the
        // most recent instance of the pmf
        pm = null;
        openTrasactionCalls = 0;
        currentTransaction = null;
        transactionStatus = TXN_STATUS.NO_STATE;

        initialize(propsFromConf);

        if (!isInitialized) {
          throw new RuntimeException(
          "Unable to create persistence manager. Check dss.log for details");
        } else {
          LOG.info("Initialized ObjectStore");
        }
      } finally {
        pmfPropLock.unlock();
      }
    }

    private void initialize(Properties dsProps) {
      LOG.info("ObjectStore, initialize called");
      prop = dsProps;
      pm = getPersistenceManager();
      isInitialized = pm != null;
      return;
    }

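Back to the original error: how did the PersistenceManager get closed? After careful investigation, it turned out that HCatalog explicitly calls close() on its HiveMetaStoreClient when it is done with it, whereas Hive itself generally does not call this method internally.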

HiveMetaStoreClient.java

    public void close() {
      isConnected = false;
      try {
        if (null != client) {
          client.shutdown();
        }
      } catch (TException e) {
        LOG.error("Unable to shutdown local metastore client", e);
      }
      // Transport would have got closed via client.shutdown(), so we dont need this, but
      // just in case, we make this call.
      if ((transport != null) && transport.isOpen()) {
        transport.close();
      }
    }

On the server side, this call maps to the shutdown method of HMSHandler:

    @Override
    public void shutdown() {
      logInfo("Shutting down the object store...");
      RawStore ms = threadLocalMS.get();
      if (ms != null) {
        ms.shutdown();
        ms = null;
      }
      logInfo("Metastore shutdown complete.");
    }

The shutdown method of ObjectStore:

    public void shutdown() {
      if (pm != null) {
        pm.close();
      }
    }

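HMSHandler's shutdown method only fetches the current thread's ObjectStore and calls its shutdown method, which closes the pm. The ObjectStore itself is not discarded: it stays in threadLocalMS, because assigning ms = null only clears the local variable. The next time this thread serves another request, getMS() returns the same ObjectStore, and since its pm has already been closed, the call inevitably throws. The correct fix is to call threadLocalMS.remove() or threadLocalMS.set(null) so that the entry is explicitly removed from the ThreadLocalMap.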

The modified shutdown method:

    public void shutdown() {
      logInfo("Shutting down the object store...");
      RawStore ms = threadLocalMS.get();
      if (ms != null) {
        ms.shutdown();
        ms = null;
        threadLocalMS.remove();
      }
      logInfo("Metastore shutdown complete.");
    }
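To make the failure mode concrete, here is a small sketch that reproduces it outside of Hive, under simplifying assumptions. FakeStore and Handler are hypothetical stand-ins for ObjectStore and HMSHandler, and a boolean flag replaces the real PersistenceManager. The removeOnShutdown switch toggles between the original behavior and the fix, and serveTwoSessions() simulates one pooled worker thread serving two client connections in a row.

```java
public class StaleThreadLocalDemo {
    /** Hypothetical stand-in for ObjectStore; the flag models a closed pm. */
    static final class FakeStore {
        private boolean closed = false;

        void shutdown() { closed = true; }

        String getTable(String name) {
            if (closed) {
                throw new IllegalStateException("Persistence Manager has been closed");
            }
            return "table:" + name;
        }
    }

    /** Hypothetical stand-in for HMSHandler. */
    static final class Handler {
        private final ThreadLocal<FakeStore> local = new ThreadLocal<>();
        private final boolean removeOnShutdown;

        Handler(boolean removeOnShutdown) { this.removeOnShutdown = removeOnShutdown; }

        FakeStore getStore() {
            FakeStore store = local.get();
            if (store == null) {
                store = new FakeStore();
                local.set(store);
            }
            return store;
        }

        void shutdown() {
            FakeStore store = local.get();
            if (store != null) {
                store.shutdown();
                if (removeOnShutdown) {
                    // The fix: drop the closed store so the next call creates a new one.
                    local.remove();
                }
            }
        }
    }

    /** One worker thread serving two client sessions back to back. */
    static String serveTwoSessions(Handler handler) {
        handler.getStore().getTable("t1");
        handler.shutdown(); // first client calls close()
        try {
            return handler.getStore().getTable("t2"); // second client, same thread
        } catch (IllegalStateException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove(): " + serveTwoSessions(new Handler(false)));
        System.out.println("with remove():    " + serveTwoSessions(new Handler(true)));
    }
}
```

Without remove(), the second session gets the closed store back and fails; with it, getStore() finds nothing bound to the thread and lazily creates a fresh store.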

After making the change and restarting the metastore server, I never saw the "Persistence Manager has been closed" error again.

Original post: http://blog.csdn.net/lalaguozhe/article/details/9161799. Please credit the source when reposting.