Bugs Encountered in HBase-backed Statistics After Upgrading from Hive 0.9 to Hive 0.12


1. NullPointerException


Problem analysis: at first I suspected that some value was missing from a configuration file, so I used vimdiff to compare the configuration differences between hive-0.9 and hive-0.12. After ruling out a configuration error, I downloaded the source code to dig further.

Problem resolution: the fix essentially follows this JIRA issue: https://issues.apache.org/jira/browse/HIVE-5515

2. Map tasks reading the same HBase data repeatedly

This problem has been around for a long time; as far as I can tell it already existed in hive-0.9, which is pretty painful!



Problem analysis: after modifying the source, building a new jar, and checking the tasklog, I found that every map had exactly the same startRow and endRow, so the same HBase data was being scanned repeatedly. The direct consequence is that the final reduce result is the true value multiplied by the number of maps. My guess was therefore that the bug lay in how the HBase table is split among the map tasks.
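The kind of logging this refers to looks roughly like the sketch below (a sketch, not the exact patch I used; in Hive the splits handed back by getSplits() are wrapped in Hive's own split class, so the standalone helper and its TableSplit[] parameter are illustrative only):

import org.apache.hadoop.hbase.mapred.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: dump each split's scan range so the task/client log shows whether
// the per-map startRow/endRow really are identical.
public class SplitRangeLogger {
  public static void logRanges(TableSplit[] splits) {
    for (TableSplit ts : splits) {
      System.out.println("split on " + ts.getRegionLocation()
          + " startRow=" + Bytes.toString(ts.getStartRow())
          + " endRow=" + Bytes.toString(ts.getEndRow()));
    }
  }
}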

Problem resolution: in HiveHBaseTableInputFormat.java, inside the getRecordReader method, comment out the call below (it appears to rewrite the split's scan range from the pushed-down row-key filter, which is what collapses every map onto the same range):

//      tableSplit = convertFilter(jobConf, scan, tableSplit, iKey,
//        getStorageFormatOfKey(columnsMapping.get(iKey).mappingSpec,
//        jobConf.get(HBaseSerDe.HBASE_TABLE_DEFAULT_STORAGE_TYPE, "string")));
Then add the following to build.xml:

  <target name="jar-hbase-handler" depends="init">
    <subant buildpath="hbase-handler/build.xml" target="jar">
      <property name="is-offline" value="${is-offline}"/>
      <property name="thrift.home" value="${thrift.home}"/>
      <property name="build.dir.hive" location="${build.dir.hive}"/>
    </subant>
  </target>
After that it is enough to rebuild just this one target (something like ant jar-hbase-handler).
Later, after reporting this bug upstream, I received a reply pointing to the fix for this issue: https://issues.apache.org/jira/browse/HIVE-3420

3. Concurrently submitted Hive jobs reading each other's data

Right after the project went live, several Hive-generated MapReduce jobs were triggered and submitted at the same time. When they finished, it turned out that all of these Hive jobs had read the HBase data that only one of them was supposed to scan.

Problem analysis: my guess was that the error came from some global variable or singleton, and since it already showed up at the map stage, I again focused on HiveHBaseTableInputFormat.java.

Add stack-trace and debug printing right before the return in getSplits:

      new Exception("test hive:" + System.identityHashCode(this)).printStackTrace();
      System.out.println("this:" + System.identityHashCode(this) + ", conf:" + System.identityHashCode(jobConf));
      System.out.println("TableSplits:" + Arrays.asList(splits));
Compile and package, replace the jar in production, restart hiveserver, and submit the jobs concurrently.

The result shows that HiveHBaseTableInputFormat is effectively a singleton (the same cached instance is reused across queries, via getInputFormatFromCache), while the jobConf objects are different:

java.lang.Exception: test hive
        at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.<init>(HiveHBaseTableInputFormat.java:86)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:194)
        at org.apache.hadoop.hive.ql.exec.Utilities$3.run(Utilities.java:1940)
        at org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1962)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.setNumberOfReducers(MapRedTask.java:409)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:99)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:55)
java.lang.Exception: test hive:1051896348
        at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:527)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:294)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:303)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:55)
job name:INSERT OVERWRITE TABLE hb_jt_s...subappad_id(Stage-0) startRow:subappad|79868893999999999 endRow:subappad|80308768999999999
job name:INSERT OVERWRITE TABLE hb_jt_stat...mobad_id(Stage-0) startRow:mobad|79868893999999999 endRow:mobad|80308768999999999
job name:INSERT OVERWRITE TABLE hb_jt_statistic...tag(Stage-0) startRow:sub2main|79868893999999999 endRow:sub2main|80308768999999999
this:1620951483, conf:828005119
TableSplits:[[storage2.test.lan:sub2main|79868893999999999\x00,sub2main|80308768999999999\x00]]
this:1620951483, conf:588002473
TableSplits:[[storage2.test.lan:sub2main|79868893999999999\x00,sub2main|80308768999999999\x00]]
this:1620951483, conf:407508952
TableSplits:[[storage2.test.lan:sub2main|79868893999999999\x00,sub2main|80308768999999999\x00]]
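To make the failure mode concrete, here is a small standalone toy (a sketch, not Hive code; all class and method names below are made up) showing why a cached, stateful input format breaks under concurrent getSplits() calls: the scan range lives in instance fields, so whichever query configures the shared instance last wins, and every query ends up with the same splits.

import java.util.concurrent.CountDownLatch;

// Toy illustration: a shared "input format" whose scan range is instance state.
public class SharedStateRace {

  static class StatefulInputFormat {
    private String startRow;   // scan range kept as instance state, shared by all callers
    private String endRow;

    void setScan(String start, String end) {
      startRow = start;
      endRow = end;
    }

    String getSplitRange() {
      return startRow + " -> " + endRow;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // like a cached instance: every query gets the same object
    final StatefulInputFormat cached = new StatefulInputFormat();
    final CountDownLatch bothConfigured = new CountDownLatch(2);

    Runnable queryA = () -> {
      cached.setScan("subappad|798...", "subappad|803...");
      bothConfigured.countDown();
      awaitQuietly(bothConfigured);
      System.out.println("query A splits: " + cached.getSplitRange());
    };
    Runnable queryB = () -> {
      cached.setScan("sub2main|798...", "sub2main|803...");
      bothConfigured.countDown();
      awaitQuietly(bothConfigured);
      System.out.println("query B splits: " + cached.getSplitRange());
    };

    Thread a = new Thread(queryA);
    Thread b = new Thread(queryB);
    a.start();
    b.start();
    a.join();
    b.join();
    // Typically both lines print the same range: whichever setScan() ran last
    // overwrote the other query's range before the splits were computed.
  }

  private static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}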

A simple way to fix this:

Create HiveHBaseTableInputFormatRealExecute.java, copy the entire contents of HiveHBaseTableInputFormat.java into it, and then reduce the original HiveHBaseTableInputFormat.java to an empty shell that only delegates:

public class HiveHBaseTableInputFormat implements InputFormat<ImmutableBytesWritable, Result> {

  @Override
  public RecordReader<ImmutableBytesWritable, Result> getRecordReader(InputSplit split,
      JobConf jobConf, final Reporter reporter) throws IOException {
    HiveHBaseTableInputFormatRealExecute exe = new HiveHBaseTableInputFormatRealExecute();
    return exe.getRecordReader(split, jobConf, reporter);
  }

  @Override
  public InputSplit[] getSplits(JobConf jobConf, int numSplits) throws IOException {
    HiveHBaseTableInputFormatRealExecute exe = new HiveHBaseTableInputFormatRealExecute();
    return exe.getSplits(jobConf, numSplits);
  }
}
Since one static method in HiveHBaseTableInputFormat.java is called from outside the class, a small additional change is needed to keep that working.
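For illustration, a sketch with hypothetical names (someStaticHelper stands in for the real static method, which is not listed here): keep a static method with the original signature on the shell class and forward it to the class that now holds the real code, so the external caller does not need to change.

// Sketch with made-up names only.
public class ShellDelegationSketch {

  // stands in for HiveHBaseTableInputFormat: keep the original static entry point
  public static String someStaticHelper(String mappingSpec) {
    return RealExecuteSketch.someStaticHelper(mappingSpec);
  }
}

class RealExecuteSketch {
  // stands in for HiveHBaseTableInputFormatRealExecute, which now carries the
  // implementation copied from the original class
  static String someStaticHelper(String mappingSpec) {
    return mappingSpec; // placeholder for the copied implementation
  }
}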

PS: I am a Hive beginner, so if anything here is wrong, feel free to contact me.