Bugs encountered when Hive interacts with ES (Elasticsearch)

Source: Internet. Editor: 程序博客网. Time: 2024/05/21 08:22


1. After Hive data is written to ES, querying it from Hive fails (apparently it cannot be queried)
Bad status for request TFetchResultsReq(fetchType=0, operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\x8d#e\x89\x0bhBg\xb9\xdb\xc7L\xe7lb\xb0', guid="X\xee.\x81\xd8'Hy\x983\xb7\x00\xcb\x85\x84\x91")), orientation=4, maxRows=100):
TFetchResultsResp(status=TStatus(errorCode=None,
  errorMessage='java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.hadoop.util.Version',
  sqlState=None,
  infoMessages=[
    '*java.lang.RuntimeException:java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.hadoop.util.Version:19:18',
    'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:83',
    'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
    'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
    'java.security.AccessController:doPrivileged:AccessController.java:-2',
    'javax.security.auth.Subject:doAs:Subject.java:415',
    'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1783',
    'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
    'com.sun.proxy.$Proxy27:fetchResults::-1',
    'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:440',
    'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:686',
    'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553',
    'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538',
    'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
    'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
    'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
    'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286',
    'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
    'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
    'java.lang.Thread:run:Thread.java:745',
    '*java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.hadoop.util.Version:35:16',
    'org.elasticsearch.hadoop.rest.RestService:findPartitions:RestService.java:225',
    'org.elasticsearch.hadoop.mr.EsInputFormat:getSplits:EsInputFormat.java:457',
    'org.elasticsearch.hadoop.hive.EsHiveInputFormat:getSplits:EsHiveInputFormat.java:111',
    'org.elasticsearch.hadoop.hive.EsHiveInputFormat:getSplits:EsHiveInputFormat.java:50',
    'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextSplits:FetchOperator.java:363',
    'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:295',
    'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:446',
    'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:415',
    'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:138',
    'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:1987',
    'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:361',
    'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:277',
    'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:753',
    'sun.reflect.GeneratedMethodAccessor12:invoke::-1',
    'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
    'java.lang.reflect.Method:invoke:Method.java:606',
    'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78'],
  statusCode=3), results=None, hasMoreRows=None)
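At first I assumed this was a version problem.
My current guess is that a table used to write Hive data into ES cannot then be queried, and that only a table mapped onto data that already exists in ES can be queried.
Note, however, that "Could not initialize class" means the static initializer of org.elasticsearch.hadoop.util.Version had already failed in the HiveServer2 JVM, so a classpath problem (for example, more than one es-hadoop jar being loaded) is also worth ruling out.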
From the official documentation: As one can note, currently the reading and writing are treated separately but we're working on unifying the two and automatically translating HiveQL to Elasticsearch queries.
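
A minimal sketch of the read path, assuming the index es_bigtable/bigtable_list already holds data. The table name es_bigtable_read, the columns, the host es-host-1:9200 and the jar path are hypothetical placeholders; the storage handler class and the es.* keys come from es-hadoop.

```sql
-- Assumed jar location; register exactly one es-hadoop jar for the session.
ADD JAR /path/to/elasticsearch-hadoop-2.4.4.jar;

-- Hypothetical read-only mapping onto an index that already exists in ES.
CREATE EXTERNAL TABLE es_bigtable_read (
  id   BIGINT,
  name STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes'    = 'es-host-1:9200',
  'es.resource' = 'es_bigtable/bigtable_list',
  'es.query'    = '?q=*'
);

-- Query the mapped data from Hive.
SELECT id, name FROM es_bigtable_read LIMIT 10;
```

Keeping a separate table definition for reading and for writing matches the quoted statement that the two paths are currently treated separately.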

2. Error when writing data
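The message translates to: cannot detect the ES version. This typically happens when the network or the Elasticsearch cluster is not accessible, or when targeting a WAN/cloud instance without properly setting 'es.nodes.wan.only'.
So was a parameter left unset? As for versions, I am running ES 2.4.4, es-hadoop 2.4.4 (jar) and Hadoop 2.6.0 (CDH). On the official site I could only find which es-hadoop version goes with which ES version; nothing states which Hadoop version goes with which ES version, so presumably there is no such requirement?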
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"_col0":63818992,"_col1":"陶悦","_col2":"18716402326","_col3":"201710260063961961","_ccol29":"F","_col30":null8l}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"_col0":63818992,"_col1":"陶悦","_col2":"18716402326","_col3":"201710260063961961","_col6":"id"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: Unexpected exception: Unexpected exception: org.apache.hadoop.hive.ql.metadata.HiveException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
    ... 9 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: Unexpected exception: org.apache.hadoop.hive.ql.metadata.HiveException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:638)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:670)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:748)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:306)
    ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: org.apache.hadoop.hive.ql.metadata.HiveException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:638)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:670)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:748)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:306)
    ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:525)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:623)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:638)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:670)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:748)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:306)
    ... 23 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:570)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:514)
    ... 31 more
Cause: the following parameters were not specified when the table was created:
'es.nodes' = '192.68.20.10:9201,12.168.00.110:02,192.18.200.12:923',
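'es.index.auto.create' = 'true', -- create the ES index automatically
'es.resource' = 'es_bigtable/bigtable_list', -- index name and type
'es.nodes.wan.only' = 'true', -- whether the connector is used against an Elasticsearch instance in a cloud/restricted environment over the WAN (e.g. Amazon Web Services). In this mode the connector disables discovery and connects only through the declared es.nodes during all operations, including reads and writes. Note that performance is highly affected in this mode.
'es.mapping.names' = '' -- field mapping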
For the parameters, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html
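
For context, here is a sketch of how these properties might sit inside a complete table definition. The table name es_bigtable_write, the columns, the source table source_table and the host names are hypothetical placeholders; only the storage handler class and the es.* keys are taken from es-hadoop.

```sql
-- Hypothetical Hive table backed by ES; replace hosts and columns with real ones.
CREATE EXTERNAL TABLE es_bigtable_write (
  id   BIGINT,
  name STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes'             = 'es-host-1:9200,es-host-2:9200',
  'es.index.auto.create' = 'true',
  'es.resource'          = 'es_bigtable/bigtable_list',
  'es.nodes.wan.only'    = 'true'
);

-- Writing into the table sends the selected rows to ES.
INSERT OVERWRITE TABLE es_bigtable_write
SELECT id, name FROM source_table;
```

If the map tasks cannot reach any address listed in es.nodes, the "Cannot detect ES version" error is still expected whatever the value of es.nodes.wan.only, so it is worth checking connectivity from the worker nodes first.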

3. Hadoop and ES write speeds do not match?
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
    ... 9 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: Unexpected exception: Could not write all entries [94/1047616] (maybe ES was overloaded?). Bailing out...
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:638)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:670)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:748)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:306)
    ... 13 more
The es-hadoop source code that raises this error:
    public void flush() {
        BulkResponse bulk = tryFlush();
        if (!bulk.getLeftovers().isEmpty()) {
            String header = String.format("Could not write all entries [%s/%s] (Maybe ES was overloaded?). Error sample (first [%s] error messages):\n", bulk.getLeftovers().cardinality(), bulk.getTotalWrites(), bulk.getErrorExamples().size());
            StringBuilder message = new StringBuilder(header);
            for (String errors : bulk.getErrorExamples()) {
                message.append("\t").append(errors).append("\n");
            }
            message.append("Bailing out...");
            throw new EsHadoopException(message.toString());
        }
    }

 If you configure your hive query to use a combined input format to lower the number of splits on the job then that would give ES larger and fewer batches of records, and fill up its task queue less frequently.

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
This thread answers the question in more detail: https://discuss.elastic.co/t/pushback-to-hadoop-from-es-on-bulk-load/1535/5
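
Besides combining splits, es-hadoop also exposes bulk-write settings that may help when ES rejects part of a batch. The sketch below is only a starting point: the table name, columns and host are hypothetical, and the numeric values are illustrative guesses rather than tuned recommendations.

```sql
-- Fewer, larger splits means fewer concurrent writers hitting ES.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Hypothetical table; the es.batch.* keys are es-hadoop settings, the values are examples.
CREATE EXTERNAL TABLE es_bigtable_throttled (
  id   BIGINT,
  name STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes'                   = 'es-host-1:9200',
  'es.resource'                = 'es_bigtable/bigtable_list',
  'es.batch.size.entries'      = '500',
  'es.batch.size.bytes'        = '1mb',
  'es.batch.write.retry.count' = '6',
  'es.batch.write.retry.wait'  = '30s'
);
```

Smaller batches with more retries and a longer wait give ES more time to drain its bulk queue, at the cost of a slower job.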

4. Index names must be lowercase
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest:  Found unrecoverable error [192.168.200.100:9201] returned Bad Request(400) - Invalid index name [Lots_scenic], must be lowercase; 
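
A sketch of the fix, assuming the index name comes from the es.resource table property. The table name, columns, host and the type name scenic are hypothetical; the only change that matters is writing the index name in lowercase.

```sql
-- 'Lots_scenic' is rejected by ES; the index part of es.resource must be lowercase.
CREATE EXTERNAL TABLE es_lots_scenic (
  id   BIGINT,
  name STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes'    = 'es-host-1:9200',
  'es.resource' = 'lots_scenic/scenic'
);
```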

