spark-thrift-server: OOM (GC overhead limit exceeded) when executing spark-sql

Source: Internet · Editor: 程序博客网 · Date: 2024/05/22 03:09

Scenario: a Spark SQL query is submitted from a front-end page and executed through spark-thrift-server. The SQL logic is simple: a join between two tables (a large table of about 90 GB and a small table of about 3 GB). The query fails in the front-end UI, and the server throws java.lang.OutOfMemoryError: GC overhead limit exceeded.

Running the same statement directly with ./spark-sql --master yarn --executor-memory 4G --num-executors 19 succeeds, which shows the SQL statement itself is fine.
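Since the same SQL succeeds when spark-sql is launched with explicit memory flags, one thing worth trying (a hedged sketch, not a step the article describes) is to launch the Thrift server with the equivalent flags, plus extra driver memory. The exceptions in the log below are thrown in HiveServer2 handler threads, i.e. inside the driver JVM, so --driver-memory is the knob most likely to matter:

```shell
# Hedged sketch (an assumption, not from the original article): start the
# Spark Thrift server with explicit driver/executor memory, mirroring the
# spark-sql flags that worked. start-thriftserver.sh lives under sbin/ in a
# standard Spark distribution and accepts the same options as spark-submit.
./sbin/start-thriftserver.sh \
  --master yarn \
  --driver-memory 4G \
  --executor-memory 4G \
  --num-executors 19
```

With spark-sql, the query runs in a fresh driver per invocation; the long-lived Thrift server driver accumulates sessions and cached state, so it typically needs more headroom than an ad-hoc CLI run.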

Inspecting the Spark logs on the server side confirms the java.lang.OutOfMemoryError: GC overhead limit exceeded exception. The relevant log output follows:

Exception in thread "HiveServer2-Handler-Pool: Thread-199"
16/10/12 17:52:27 WARN NioEventLoop: Unexpected exception in the selector loop.
16/10/12 17:52:27 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 15
Exception in thread "HiveServer2-Handler-Pool: Thread-167" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
	at java.lang.StringCoding.decode(StringCoding.java:193)
	at java.lang.String.<init>(String.java:416)
	at java.lang.String.<init>(String.java:481)
	at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:381)
	at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:374)
	at org.apache.hive.service.cli.thrift.TGetTablesReq$TGetTablesReqStandardScheme.read(TGetTablesReq.java:697)
	at org.apache.hive.service.cli.thrift.TGetTablesReq$TGetTablesReqStandardScheme.read(TGetTablesReq.java:666)
	at org.apache.hive.service.cli.thrift.TGetTablesReq.read(TGetTablesReq.java:569)
	at org.apache.hive.service.cli.thrift.TCLIService$GetTables_args$GetTables_argsStandardScheme.read(TCLIService.java:7000)
	at org.apache.hive.service.cli.thrift.TCLIService$GetTables_args$GetTables_argsStandardScheme.read(TCLIService.java:6985)
	at org.apache.hive.service.cli.thrift.TCLIService$GetTables_args.read(TCLIService.java:6932)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:25)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Exception in thread "HiveServer2-Handler-Pool: Thread-178"
16/10/12 17:52:57 WARN SingleThreadEventExecutor: Unexpected exception from an event executor: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.apache.thrift.protocol.TBinaryProtocol.readFieldBegin(TBinaryProtocol.java:245)
	at org.apache.hive.service.cli.thrift.THandleIdentifier$THandleIdentifierStandardScheme.read(THandleIdentifier.java:430)
	at org.apache.hive.service.cli.thrift.THandleIdentifier$THandleIdentifierStandardScheme.read(THandleIdentifier.java:423)
	at org.apache.hive.service.cli.thrift.THandleIdentifier.read(THandleIdentifier.java:357)
	at org.apache.hive.service.cli.thrift.TSessionHandle$TSessionHandleStandardScheme.read(TSessionHandle.java:336)
	at org.apache.hive.service.cli.thrift.TSessionHandle$TSessionHandleStandardScheme.read(TSessionHandle.java:321)
	at org.apache.hive.service.cli.thrift.TSessionHandle.read(TSessionHandle.java:264)
	at org.apache.hive.service.cli.thrift.TGetTablesReq$TGetTablesReqStandardScheme.read(TGetTablesReq.java:681)
	at org.apache.hive.service.cli.thrift.TGetTablesReq$TGetTablesReqStandardScheme.read(TGetTablesReq.java:666)
	at org.apache.hive.service.cli.thrift.TGetTablesReq.read(TGetTablesReq.java:569)
	at org.apache.hive.service.cli.thrift.TCLIService$GetTables_args$GetTables_argsStandardScheme.read(TCLIService.java:7000)
	at org.apache.hive.service.cli.thrift.TCLIService$GetTables_args$GetTables_argsStandardScheme.read(TCLIService.java:6985)
	at org.apache.hive.service.cli.thrift.TCLIService$GetTables_args.read(TCLIService.java:6932)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:25)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
16/10/12 17:53:22 INFO HiveMetaStore: 26: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive	ip=unknown-ip-addr	cmd=get_databases: *
... (dozens of similar HiveMetaStore get_databases / get_tables / get_multi_table audit lines omitted) ...
16/10/12 17:53:22 WARN HeartbeatReceiver: Removing executor 15 with no recent heartbeats: 148218 ms exceeds timeout 120000 ms
16/10/12 17:53:22 ERROR YarnScheduler: Lost executor 15 on node18.it.leap.com: Executor heartbeat timed out after 148218 ms
16/10/12 17:53:22 WARN HeartbeatReceiver: Removing executor 16 with no recent heartbeats: 147212 ms exceeds timeout 120000 ms
16/10/12 17:53:22 ERROR YarnScheduler: Lost executor 16 on node18.it.leap.com: Executor heartbeat timed out after 147212 ms
java.lang.OutOfMemoryError: GC overhead limit exceeded
16/10/12 17:53:22 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(15, node18.it.leap.com, 62536)
16/10/12 17:53:22 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(16, node18.it.leap.com, 41766)
16/10/12 17:53:22 INFO HiveMetaStore: 29: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/10/12 17:53:22 INFO DAGScheduler: Host added was in lost list earlier: node18.it.leap.com
Exception in thread "broadcast-exchange-0"
16/10/12 17:53:22 ERROR TThreadPoolServer: ExecutorService threw error: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
16/10/12 17:53:22 ERROR ThriftCLIService: Error starting HiveServer2: could not start ThriftBinaryCLIService
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "HiveServer2-Handler-Pool: Thread-159" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.lang.StringBuilder.toString(StringBuilder.java:405)
	at javax.security.sasl.Sasl.createSaslServer(Sasl.java:499)
	at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:140)
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
	at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Exception in thread "HiveServer2-Handler-Pool: Thread-149" java.lang.OutOfMemoryError: GC overhead limit exceeded
16/10/12 17:53:22 INFO HiveServer2: Shutting down HiveServer2
16/10/12 17:53:22 INFO ThriftCLIService: Thrift server has stopped
16/10/12 17:53:22 INFO AbstractService: Service:ThriftBinaryCLIService is stopped.
16/10/12 17:53:22 INFO AbstractService: Service:OperationManager is stopped.
16/10/12 17:53:22 INFO AbstractService: Service:SessionManager is stopped.
... (repeated "ThriftCLIService: Session disconnected without closing properly, close it now" and Jetty ContextHandler shutdown lines omitted) ...
16/10/12 17:53:23 INFO SparkUI: Stopped Spark web UI at http://10.120.193.4:4040
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003023,BlockManagerId(15, node18.it.leap.com, 62536),428762726)
... (repeated LiveListenerBus dropped-event lines for executors 15 and 16 omitted) ...
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped!
Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(16, node18.it.leap.com, 41766),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))16/10/12 17:53:23 ERROR SparkExecuteStatementOperation: Error executing query, currentState CLOSED, org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:Exchange SinglePartition+- *LocalLimit 1000   +- *HashAggregate(keys=[PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654], functions=[], output=[PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654])      +- Exchange hashpartitioning(PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654, 200)         +- *HashAggregate(keys=[PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654], functions=[], output=[PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654])            +- *Project [CASE WHEN (substring(ZMAT#982, 1, 10) = 0000000000) THEN substring(ZMAT#982, 11, 8) WHEN (substring(ZMAT#982, 1, 9) = 000000000) THEN substring(ZMAT#982, 10, 9) WHEN (substring(ZMAT#982, 1, 5) = 00000) THEN substring(ZMAT#982, 6, 7) ELSE ZMAT#982 END AS PART_NUMBER#649, ZPRODH_D#1050 AS PRODUCT_LOB#650, ZBRAND_D#1040 AS PRODUCT_BRAND#651, ZSERIES_D#1044 AS PRODUCT_FAMILY#652, ZSUBSER_D#1046 AS PRODUCT_SUB_FAMILY#653, ZMAT#982 AS PRODUCT_CD#654]               +- *BroadcastHashJoin [ZMAT#982], [MATERIAL#1063], Inner, BuildLeft                  :- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))                  :  +- *Project [zmat#982, zbrand_d#1040, zseries_d#1044, zsubser_d#1046, zprodh_d#1050]                  :     +- *Filter isnotnull(ZMAT#982)                  :        +- *BatchedScan parquet 
idl_bw.zoh_mds32_idl_p[zmat#982,zbrand_d#1040,zseries_d#1044,zsubser_d#1046,zprodh_d#1050] Format: ParquetFormat, InputPaths: hdfs://node4.it.leap.com:8020/apps/hive/warehouse/idl_bw.db/zoh_mds32_idl_p, PushedFilters: [IsNotNull(zmat)], ReadSchema: struct<zmat:string,zbrand_d:string,zseries_d:string,zsubser_d:string,zprodh_d:string>                  +- *Project [material#1063]                     +- *Filter (SALESORG#1064 IN (AR10,BR10,CA10,CO10,MX10,US10) && isnotnull(MATERIAL#1063))                        +- *BatchedScan parquet idl_bw.zoh_mms05_idl_p[material#1063,salesorg#1064] Format: ParquetFormat, InputPaths: hdfs://node4.it.leap.com:8020/apps/hive/warehouse/idl_bw.db/zoh_mms05_idl_p, PushedFilters: [In(salesorg, [AR10,BR10,CA10,CO10,MX10,US10], IsNotNull(material)], ReadSchema: struct<material:string,salesorg:string>at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50)at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:113)at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:233)at org.apache.spark.sql.execution.BaseLimitExec$class.inputRDDs(limit.scala:63)at org.apache.spark.sql.execution.GlobalLimitExec.inputRDDs(limit.scala:103)at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:36)at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:361)at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:237)at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:142)at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:313)at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)at org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:94)at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:682)at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:213)at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:157)at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)at java.security.AccessController.doPrivileged(Native Method)at javax.security.auth.Subject.doAs(Subject.java:415)at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:167)at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)at java.util.concurrent.FutureTask.run(FutureTask.java:262)at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)at 
java.lang.Thread.run(Thread.java:745)Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:Exchange hashpartitioning(PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654, 200)+- *HashAggregate(keys=[PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654], functions=[], output=[PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654])   +- *Project [CASE WHEN (substring(ZMAT#982, 1, 10) = 0000000000) THEN substring(ZMAT#982, 11, 8) WHEN (substring(ZMAT#982, 1, 9) = 000000000) THEN substring(ZMAT#982, 10, 9) WHEN (substring(ZMAT#982, 1, 5) = 00000) THEN substring(ZMAT#982, 6, 7) ELSE ZMAT#982 END AS PART_NUMBER#649, ZPRODH_D#1050 AS PRODUCT_LOB#650, ZBRAND_D#1040 AS PRODUCT_BRAND#651, ZSERIES_D#1044 AS PRODUCT_FAMILY#652, ZSUBSER_D#1046 AS PRODUCT_SUB_FAMILY#653, ZMAT#982 AS PRODUCT_CD#654]      +- *BroadcastHashJoin [ZMAT#982], [MATERIAL#1063], Inner, BuildLeft         :- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))         :  +- *Project [zmat#982, zbrand_d#1040, zseries_d#1044, zsubser_d#1046, zprodh_d#1050]         :     +- *Filter isnotnull(ZMAT#982)         :        +- *BatchedScan parquet idl_bw.zoh_mds32_idl_p[zmat#982,zbrand_d#1040,zseries_d#1044,zsubser_d#1046,zprodh_d#1050] Format: ParquetFormat, InputPaths: hdfs://node4.it.leap.com:8020/apps/hive/warehouse/idl_bw.db/zoh_mds32_idl_p, PushedFilters: [IsNotNull(zmat)], ReadSchema: struct<zmat:string,zbrand_d:string,zseries_d:string,zsubser_d:string,zprodh_d:string>         +- *Project [material#1063]            +- *Filter (SALESORG#1064 IN (AR10,BR10,CA10,CO10,MX10,US10) && isnotnull(MATERIAL#1063))               +- *BatchedScan parquet idl_bw.zoh_mms05_idl_p[material#1063,salesorg#1064] Format: ParquetFormat, InputPaths: 
hdfs://node4.it.leap.com:8020/apps/hive/warehouse/idl_bw.db/zoh_mms05_idl_p, PushedFilters: [In(salesorg, [AR10,BR10,CA10,CO10,MX10,US10], IsNotNull(material)], ReadSchema: struct<material:string,salesorg:string>at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50)at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:113)at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:233)at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:138)at org.apache.spark.sql.execution.BaseLimitExec$class.inputRDDs(limit.scala:63)at org.apache.spark.sql.execution.LocalLimitExec.inputRDDs(limit.scala:96)at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:361)at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)at org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:86)at 
org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:122)at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:113)at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)... 58 moreCaused by: java.lang.InterruptedExceptionat java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1038)at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)at scala.concurrent.Await$.result(package.scala:190)at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120)at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124)at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30)at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62)at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)at org.apache.spark.sql.execution.FilterExec.consume(basicPhysicalOperators.scala:79)at org.apache.spark.sql.execution.FilterExec.doConsume(basicPhysicalOperators.scala:194)at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)at org.apache.spark.sql.execution.BatchedDataSourceScanExec.consume(ExistingRDD.scala:225)at org.apache.spark.sql.execution.BatchedDataSourceScanExec.doProduce(ExistingRDD.scala:328)at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)at org.apache.spark.sql.execution.BatchedDataSourceScanExec.produce(ExistingRDD.scala:225)at org.apache.spark.sql.execution.FilterExec.doProduce(basicPhysicalOperators.scala:113)at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)at org.apache.spark.sql.execution.FilterExec.produce(basicPhysicalOperators.scala:79)at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40)at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:30)at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doProduce(BroadcastHashJoinExec.scala:77)at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.produce(BroadcastHashJoinExec.scala:38)at 
org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40)at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:30)at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:526)at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:145)at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)at org.apache.spark.sql.execution.aggregate.HashAggregateExec.produce(HashAggregateExec.scala:37)at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:309)at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:347)at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)at org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:86)at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:122)at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:113)at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)... 80 more16/10/12 17:53:23 ERROR SparkExecuteStatementOperation: Error running hive query: org.apache.hive.service.cli.HiveSQLException: Illegal Operation state transition from CLOSED to ERRORat org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:92)at org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:98)at org.apache.hive.service.cli.operation.Operation.setState(Operation.java:126)at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:245)at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:157)at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)at java.security.AccessController.doPrivileged(Native Method)at javax.security.auth.Subject.doAs(Subject.java:415)at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:167)at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)at java.util.concurrent.FutureTask.run(FutureTask.java:262)at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)at java.lang.Thread.run(Thread.java:745)16/10/12 17:53:23 INFO AbstractService: Service:CLIService is stopped.16/10/12 17:53:23 INFO AbstractService: Service:HiveServer2 is stopped.

             

             First, check spark-env.sh and raise the following parameters:

             SPARK_EXECUTOR_CORES="4"

             SPARK_EXECUTOR_MEMORY="4G"

             SPARK_DRIVER_MEMORY="20G"

             Re-running the test SQL still produced the same exception.

             Inspecting spark-thrift-sparkconf.conf showed that executor allocation was dynamic, with the following settings:

            spark.dynamicAllocation.enabled true
            spark.dynamicAllocation.initialExecutors 0
            spark.dynamicAllocation.maxExecutors 200
            spark.dynamicAllocation.minExecutors 0

            To rule out dynamic allocation, switch to a static configuration: remove the dynamic-allocation entries above and add the static entries below.

            spark.executor.memory 10G
            spark.executor.instances 20
            spark.executor.cores 2
            spark.shuffle.service.enabled false
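
The conf edit above can be scripted. A minimal sketch follows; the real conf path and the sample file contents are assumptions, and a temp file is used so the snippet is self-contained:

```shell
# Sketch: switch spark-thrift-sparkconf.conf from dynamic to static allocation.
# In practice CONF would point at the real file; a temp copy is used here.
CONF=$(mktemp)
printf '%s\n' \
  'spark.master yarn' \
  'spark.dynamicAllocation.enabled true' \
  'spark.dynamicAllocation.initialExecutors 0' \
  'spark.dynamicAllocation.maxExecutors 200' \
  'spark.dynamicAllocation.minExecutors 0' > "$CONF"

# 1) drop every dynamic-allocation key (portable alternative to sed -i)
grep -v '^spark\.dynamicAllocation\.' "$CONF" > "$CONF.tmp" && mv "$CONF.tmp" "$CONF"

# 2) append the static executor settings from the article
cat >> "$CONF" <<'EOF'
spark.executor.memory 10G
spark.executor.instances 20
spark.executor.cores 2
spark.shuffle.service.enabled false
EOF
```

Note that spark.shuffle.service.enabled (the external shuffle service) is only required when dynamic allocation is on, which is why it can be switched off together with it here.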

        Then restart the Spark Thrift server: ./sbin/start-thriftserver.sh --properties-file ../conf/spark-thrift-sparkconf.conf

        Connect to the Thrift server with beeline -u "<jdbc connection string>" and run the test SQL; it failed again with the same exception.

    

        Finally, by tracing how the table data flowed through the query, we found that the join generated a large amount of Cartesian-product computation, and that at run time the small table was broadcast to every worker, which caused the OOM / GC overhead exception.

         Solution: add the following setting to spark-thrift-sparkconf.conf (a value of -1 disables automatic broadcast joins entirely):

         spark.sql.autoBroadcastJoinThreshold -1
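
Before (or instead of) changing the server-wide file, the same switch can be tried for a single session. A sketch, where the JDBC host and port are placeholders rather than values from this cluster:

```shell
# Disable automatic broadcast joins for one beeline session only.
# The JDBC URL is a placeholder -- substitute your Thrift-server host/port.
beeline -u "jdbc:hive2://<thrift-server-host>:10000" \
        -e "SET spark.sql.autoBroadcastJoinThreshold=-1;"
```

If the query succeeds with the session-level setting, the server-wide conf change above is the permanent fix.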

        Restart the service; the test now passes.

      

           

   
