Hive Multi Insert causing GC overhead limit exceeded


Reposted from:
http://ju.outofmemory.cn/entry/224490

Suppose you need to compute several different statistics from one Hive table and write each result into its own summary table. The natural choice is Hive's Multi Insert statement, because a Multi Insert lets you scan the same source data only once instead of once per result table. This post records a GC overhead limit exceeded error hit while running such a Multi Insert.
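As a reminder of the pattern, here is a minimal multi-insert sketch with made-up names (src, by_day, by_user and their columns are hypothetical, not the tables from this post): the source table is named once in the FROM clause, and every INSERT ... SELECT branch reuses that single scan.

-- Hypothetical minimal multi insert: one scan of src feeds two result tables.
FROM src
  INSERT OVERWRITE TABLE by_day  SELECT day,      SUM(flow) GROUP BY day
  INSERT OVERWRITE TABLE by_user SELECT day, uid, SUM(flow) GROUP BY day, uid;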

Problem description

The requirement: from a table of per-domain traffic records, aggregate the data along several dimensions and write each aggregation into its own table. Here is the SQL I used:

FROM qbox_bi_gold.domain_info INPUT
    INSERT OVERWRITE TABLE 5min PARTITION (day="20151130")
        SELECT cast(time/300000 as bigint)*300000 AS time, SUM(flow) AS flow, SUM(hits) AS hits
        WHERE day="20151130"
        GROUP BY cast(time/300000 as bigint)*300000
    INSERT OVERWRITE TABLE prov_5min PARTITION (day="20151130")
        SELECT cast(time/300000 as bigint)*300000 AS time, prov, SUM(flow) AS flow, SUM(hits) AS hits
        WHERE day="20151130"
        GROUP BY cast(time/300000 as bigint)*300000, prov
    INSERT OVERWRITE TABLE prov_uid_5min PARTITION (day="20151130")
        SELECT cast(time/300000 as bigint)*300000 AS time, prov, uid, SUM(flow) AS flow, SUM(hits) AS hits
        WHERE day="20151130"
        GROUP BY cast(time/300000 as bigint)*300000, prov, uid
    INSERT OVERWRITE TABLE bucket_prov_uid_5min PARTITION (day="20151130")
        SELECT cast(time/300000 as bigint)*300000 AS time, bucket, prov, uid, SUM(flow) AS flow, SUM(hits) AS hits
        WHERE day="20151130"
        GROUP BY cast(time/300000 as bigint)*300000, bucket, prov, uid
    INSERT OVERWRITE TABLE bucket_domain_prov_uid_5min PARTITION (day="20151130")
        SELECT cast(time/300000 as bigint)*300000 AS time, bucket, domain, prov, uid, SUM(flow) AS flow, SUM(hits) AS hits
        WHERE day="20151130"
        GROUP BY cast(time/300000 as bigint)*300000, bucket, domain, prov, uid
    INSERT OVERWRITE TABLE bucket_city_domain_prov_uid_5min PARTITION (day="20151130")
        SELECT cast(time/300000 as bigint)*300000 AS time, bucket, city, domain, prov, uid, SUM(flow) AS flow, SUM(hits) AS hits
        WHERE day="20151130"
        GROUP BY cast(time/300000 as bigint)*300000, bucket, city, domain, prov, uid

The statement above produces 6 MapReduce jobs. To see how Hive plans the execution, prepend EXPLAIN to the HQL.
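For reference, the invocation is simply the statement above with EXPLAIN prepended in the Hive CLI (a sketch; the angle-bracket placeholder stands for the full multi-insert statement and is not literal syntax):

EXPLAIN
<the multi-insert statement shown above>;

The plan Hive prints looks like this: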

STAGE DEPENDENCIES:
  Stage-6 is a root stage
  Stage-0 depends on stages: Stage-6
  Stage-7 depends on stages: Stage-0
  Stage-8 depends on stages: Stage-6
  Stage-1 depends on stages: Stage-8
  Stage-9 depends on stages: Stage-1
  Stage-10 depends on stages: Stage-6
  Stage-2 depends on stages: Stage-10
  Stage-11 depends on stages: Stage-2
  Stage-12 depends on stages: Stage-6
  Stage-3 depends on stages: Stage-12
  Stage-13 depends on stages: Stage-3
  Stage-14 depends on stages: Stage-6
  Stage-4 depends on stages: Stage-14
  Stage-15 depends on stages: Stage-4
  Stage-16 depends on stages: Stage-6
  Stage-5 depends on stages: Stage-16
  Stage-17 depends on stages: Stage-5
STAGE PLANS:
  Stage: Stage-6
    Map Reduce
      Map Operator Tree:
          TableScan  alias: input  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
            Select Operator  expressions: time (type: bigint), flow (type: bigint), hits (type: bigint)  outputColumnNames: time, flow, hits  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              Group By Operator  aggregations: sum(flow), sum(hits)  keys: (UDFToLong((time / 300000)) * 300000) (type: bigint)  mode: hash  outputColumnNames: _col0, _col1, _col2  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator  key expressions: _col0 (type: bigint)  sort order: +  Map-reduce partition columns: _col0 (type: bigint)  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE  value expressions: _col1 (type: bigint), _col2 (type: bigint)
            Select Operator  expressions: time (type: bigint), prov (type: string), flow (type: bigint), hits (type: bigint)  outputColumnNames: time, prov, flow, hits  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              Group By Operator  aggregations: sum(flow), sum(hits)  keys: (UDFToLong((time / 300000)) * 300000) (type: bigint), prov (type: string)  mode: hash  outputColumnNames: _col0, _col1, _col2, _col3  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                File Output Operator  compressed: false  table:  input format: org.apache.hadoop.mapred.SequenceFileInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
            Select Operator  expressions: time (type: bigint), prov (type: string), uid (type: int), flow (type: bigint), hits (type: bigint)  outputColumnNames: time, prov, uid, flow, hits  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              Group By Operator  aggregations: sum(flow), sum(hits)  keys: (UDFToLong((time / 300000)) * 300000) (type: bigint), prov (type: string), uid (type: int)  mode: hash  outputColumnNames: _col0, _col1, _col2, _col3, _col4  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                File Output Operator  compressed: false  table:  input format: org.apache.hadoop.mapred.SequenceFileInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
            Select Operator  expressions: time (type: bigint), bucket (type: string), prov (type: string), uid (type: int), flow (type: bigint), hits (type: bigint)  outputColumnNames: time, bucket, prov, uid, flow, hits  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              Group By Operator  aggregations: sum(flow), sum(hits)  keys: (UDFToLong((time / 300000)) * 300000) (type: bigint), bucket (type: string), prov (type: string), uid (type: int)  mode: hash  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                File Output Operator  compressed: false  table:  input format: org.apache.hadoop.mapred.SequenceFileInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
            Select Operator  expressions: time (type: bigint), bucket (type: string), domain (type: string), prov (type: string), uid (type: int), flow (type: bigint), hits (type: bigint)  outputColumnNames: time, bucket, domain, prov, uid, flow, hits  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              Group By Operator  aggregations: sum(flow), sum(hits)  keys: (UDFToLong((time / 300000)) * 300000) (type: bigint), bucket (type: string), domain (type: string), prov (type: string), uid (type: int)  mode: hash  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                File Output Operator  compressed: false  table:  input format: org.apache.hadoop.mapred.SequenceFileInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
            Select Operator  expressions: time (type: bigint), bucket (type: string), city (type: string), domain (type: string), prov (type: string), uid (type: int), flow (type: bigint), hits (type: bigint)  outputColumnNames: time, bucket, city, domain, prov, uid, flow, hits  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              Group By Operator  aggregations: sum(flow), sum(hits)  keys: (UDFToLong((time / 300000)) * 300000) (type: bigint), bucket (type: string), city (type: string), domain (type: string), prov (type: string), uid (type: int)  mode: hash  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                File Output Operator  compressed: false  table:  input format: org.apache.hadoop.mapred.SequenceFileInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Reduce Operator Tree:
        Group By Operator  aggregations: sum(VALUE._col0), sum(VALUE._col1)  keys: KEY._col0 (type: bigint)  mode: mergepartial  outputColumnNames: _col0, _col1, _col2  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
          Select Operator  expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint)  outputColumnNames: _col0, _col1, _col2  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
            File Output Operator  compressed: false  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE  table:  input format: org.apache.hadoop.mapred.TextInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  name: domain_areav1.5min
  Stage: Stage-0
    Move Operator
      tables:
          partition:  day 20151130
          replace: true
          table:  input format: org.apache.hadoop.mapred.TextInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  name: domain_areav1.5min
  Stage: Stage-7
    Stats-Aggr Operator
  Stage: Stage-8
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator  key expressions: _col0 (type: bigint), _col1 (type: string)  sort order: ++  Map-reduce partition columns: _col0 (type: bigint), _col1 (type: string)  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE  value expressions: _col2 (type: bigint), _col3 (type: bigint)
      Reduce Operator Tree:
        Group By Operator  aggregations: sum(VALUE._col0), sum(VALUE._col1)  keys: KEY._col0 (type: bigint), KEY._col1 (type: string)  mode: mergepartial  outputColumnNames: _col0, _col1, _col2, _col3  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
          Select Operator  expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: bigint), _col3 (type: bigint)  outputColumnNames: _col0, _col1, _col2, _col3  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
            File Output Operator  compressed: false  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE  table:  input format: org.apache.hadoop.mapred.TextInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  name: domain_areav1.prov_5min
  Stage: Stage-1
    Move Operator
      tables:
          partition:  day 20151130
          replace: true
          table:  input format: org.apache.hadoop.mapred.TextInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  name: domain_areav1.prov_5min
  Stage: Stage-9
    Stats-Aggr Operator
  Stage: Stage-10
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator  key expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: int)  sort order: +++  Map-reduce partition columns: _col0 (type: bigint), _col1 (type: string), _col2 (type: int)  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE  value expressions: _col3 (type: bigint), _col4 (type: bigint)
      Reduce Operator Tree:
        Group By Operator  aggregations: sum(VALUE._col0), sum(VALUE._col1)  keys: KEY._col0 (type: bigint), KEY._col1 (type: string), KEY._col2 (type: int)  mode: mergepartial  outputColumnNames: _col0, _col1, _col2, _col3, _col4  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
          Select Operator  expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: int), _col3 (type: bigint), _col4 (type: bigint)  outputColumnNames: _col0, _col1, _col2, _col3, _col4  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
            File Output Operator  compressed: false  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE  table:  input format: org.apache.hadoop.mapred.TextInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  name: domain_areav1.prov_uid_5min
  Stage: Stage-2
    Move Operator
      tables:
          partition:  day 20151130
          replace: true
          table:  input format: org.apache.hadoop.mapred.TextInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  name: domain_areav1.prov_uid_5min
  Stage: Stage-11
    Stats-Aggr Operator
  Stage: Stage-12
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator  key expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: int)  sort order: ++++  Map-reduce partition columns: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: int)  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE  value expressions: _col4 (type: bigint), _col5 (type: bigint)
      Reduce Operator Tree:
        Group By Operator  aggregations: sum(VALUE._col0), sum(VALUE._col1)  keys: KEY._col0 (type: bigint), KEY._col1 (type: string), KEY._col2 (type: string), KEY._col3 (type: int)  mode: mergepartial  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
          Select Operator  expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: int), _col4 (type: bigint), _col5 (type: bigint)  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
            File Output Operator  compressed: false  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE  table:  input format: org.apache.hadoop.mapred.TextInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  name: domain_areav1.bucket_prov_uid_5min
  Stage: Stage-3
    Move Operator
      tables:
          partition:  day 20151130
          replace: true
          table:  input format: org.apache.hadoop.mapred.TextInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  name: domain_areav1.bucket_prov_uid_5min
  Stage: Stage-13
    Stats-Aggr Operator
  Stage: Stage-14
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator  key expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int)  sort order: +++++  Map-reduce partition columns: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int)  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE  value expressions: _col5 (type: bigint), _col6 (type: bigint)
      Reduce Operator Tree:
        Group By Operator  aggregations: sum(VALUE._col0), sum(VALUE._col1)  keys: KEY._col0 (type: bigint), KEY._col1 (type: string), KEY._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: int)  mode: mergepartial  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
          Select Operator  expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: bigint), _col6 (type: bigint)  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
            File Output Operator  compressed: false  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE  table:  input format: org.apache.hadoop.mapred.TextInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  name: domain_areav1.bucket_domain_prov_uid_5min
  Stage: Stage-4
    Move Operator
      tables:
          partition:  day 20151130
          replace: true
          table:  input format: org.apache.hadoop.mapred.TextInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  name: domain_areav1.bucket_domain_prov_uid_5min
  Stage: Stage-15
    Stats-Aggr Operator
  Stage: Stage-16
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator  key expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int)  sort order: ++++++  Map-reduce partition columns: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int)  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE  value expressions: _col6 (type: bigint), _col7 (type: bigint)
      Reduce Operator Tree:
        Group By Operator  aggregations: sum(VALUE._col0), sum(VALUE._col1)  keys: KEY._col0 (type: bigint), KEY._col1 (type: string), KEY._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: string), KEY._col5 (type: int)  mode: mergepartial  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
          Select Operator  expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int), _col6 (type: bigint), _col7 (type: bigint)  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
            File Output Operator  compressed: false  Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE  table:  input format: org.apache.hadoop.mapred.TextInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  name: domain_areav1.bucket_city_domain_prov_uid_5min
  Stage: Stage-5
    Move Operator
      tables:
          partition:  day 20151130
          replace: true
          table:  input format: org.apache.hadoop.mapred.TextInputFormat  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  name: domain_areav1.bucket_city_domain_prov_uid_5min
  Stage: Stage-17
    Stats-Aggr Operator

From the dependencies above you can see that Stage-6 is a root stage: it is the first job that has to complete, and that is exactly where the problem appeared. GC overhead limit exceeded !!!
Note also, from the Stage-6 plan, that its single map phase carries all six map-side aggregations: one Group By Operator in mode: hash per INSERT branch, all fed by the same TableScan, so each map task JVM holds six aggregation hash tables in memory at once.
The job history of the failed run shows that the failure happened in the map phase.

...map = 99%,  reduce = 33%, Cumulative CPU 9676.12 sec
map = 100%,  reduce = 100%, Cumulative CPU 9686.12 sec

So the failure did occur during the map phase. Let's look at the error stack first:

-12-01 18:21:02,424 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
    at sun.nio.cs.StreamDecoder.<init>(StreamDecoder.java:250)
    at sun.nio.cs.StreamDecoder.<init>(StreamDecoder.java:230)
    at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:69)
    at java.io.InputStreamReader.<init>(InputStreamReader.java:74)
    at java.io.FileReader.<init>(FileReader.java:72)
    at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:381)
    at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.updateProcessTree(ProcfsBasedProcessTree.java:162)
    at org.apache.hadoop.mapred.Task.updateResourceCounters(Task.java:839)
    at org.apache.hadoop.mapred.Task.updateCounters(Task.java:978)
    at org.apache.hadoop.mapred.Task.access$500(Task.java:77)
    at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:727)
    at java.lang.Thread.run(Thread.java:745)

An OutOfMemoryError: GC overhead limit exceeded in the map phase.

Problem analysis

Everyone knows the generic cure for an OOM: just add more memory! Well, we are not made of money, and with an OOM simply adding memory does not always fix anything anyway, so it is better to look for the underlying cause. How should we approach an OOM, then? First, be clear about what can cause one: 1. the program genuinely needs more memory than it is given; 2. the program leaks memory or uses it inefficiently. Anyone who aspires to be a serious programmer should start from the second possibility. So let's analyze:

The Hive job's runtime environment:

46 machines running Ubuntu 12.04, each with 8 cores and 32 GB of memory. Hadoop 2.2.0, Hive 0.12. The input is 100 GB+ of text data. The queue we submit to is capped at roughly 40% of the cluster. The Hive job above launches around 380 map tasks and around 120 reduce tasks. By those numbers the job should not be particularly large, yet it really did OOM. Since this is a Hive-generated job rather than hand-written code, a memory leak in our own code is unlikely, so the suspicion falls on the Hive SQL itself, and the first thing that comes to mind is the cost of Multi Insert. Test: run the INSERT statements separately, i.e. strip the Multi Insert down to single inserts.
The test code looks like this:

FROM qbox_bi_gold.domain_info INPUT
    INSERT OVERWRITE TABLE 5min PARTITION (day="20151130")
        SELECT cast(time/300000 as bigint)*300000 AS time, SUM(flow) AS flow, SUM(hits) AS hits
        WHERE day="20151130"
        GROUP BY cast(time/300000 as bigint)*300000;

FROM qbox_bi_gold.domain_info INPUT
    INSERT OVERWRITE TABLE prov_5min PARTITION (day="20151130")
        SELECT cast(time/300000 as bigint)*300000 AS time, prov, SUM(flow) AS flow, SUM(hits) AS hits
        WHERE day="20151130"
        GROUP BY cast(time/300000 as bigint)*300000, prov;

FROM qbox_bi_gold.domain_info INPUT
    INSERT OVERWRITE TABLE prov_uid_5min PARTITION (day="20151130")
        SELECT cast(time/300000 as bigint)*300000 AS time, prov, uid, SUM(flow) AS flow, SUM(hits) AS hits
        WHERE day="20151130"
        GROUP BY cast(time/300000 as bigint)*300000, prov, uid;

FROM qbox_bi_gold.domain_info INPUT
    INSERT OVERWRITE TABLE bucket_prov_uid_5min PARTITION (day="20151130")
        SELECT cast(time/300000 as bigint)*300000 AS time, bucket, prov, uid, SUM(flow) AS flow, SUM(hits) AS hits
        WHERE day="20151130"
        GROUP BY cast(time/300000 as bigint)*300000, bucket, prov, uid;

FROM qbox_bi_gold.domain_info INPUT
    INSERT OVERWRITE TABLE bucket_domain_prov_uid_5min PARTITION (day="20151130")
        SELECT cast(time/300000 as bigint)*300000 AS time, bucket, domain, prov, uid, SUM(flow) AS flow, SUM(hits) AS hits
        WHERE day="20151130"
        GROUP BY cast(time/300000 as bigint)*300000, bucket, domain, prov, uid;

FROM qbox_bi_gold.domain_info INPUT
    INSERT OVERWRITE TABLE bucket_city_domain_prov_uid_5min PARTITION (day="20151130")
        SELECT cast(time/300000 as bigint)*300000 AS time, bucket, city, domain, prov, uid, SUM(flow) AS flow, SUM(hits) AS hits
        WHERE day="20151130"
        GROUP BY cast(time/300000 as bigint)*300000, bucket, city, domain, prov, uid;

Every one of them ran to completion. In other words, the SQL itself is not the problem; it is the Multi Insert form that is memory-hungry enough to cause the OOM. So the most likely cause is that we simply gave the MapReduce tasks too little memory. Let's check what is actually configured.
Run the following in the Hive CLI:

hive> set mapreduce.map.java.opts;
mapreduce.map.java.opts=-Xmx1500m
hive> set mapreduce.reduce.java.opts;
mapreduce.reduce.java.opts=-Xmx2048m
hive> set mapreduce.map.memory.mb;
mapreduce.map.memory.mb=2048
hive> set mapreduce.reduce.memory.mb;
mapreduce.reduce.memory.mb=3072

Our problem is an OOM in the map phase, so the map heap must be set too small (mapreduce.map.java.opts=-Xmx1500m, i.e. about 1.5 GB). So we raise it, but it must not exceed the maximum allowed for a map task's container, mapreduce.map.memory.mb (2048 MB here).
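For example, a sketch of what that session-level adjustment could look like in the Hive CLI (illustrative values; the one hard rule is keeping -Xmx below mapreduce.map.memory.mb, since that is the container size YARN enforces):

-- Container ceiling for a map task (MB), left unchanged here.
set mapreduce.map.memory.mb=2048;
-- Map JVM heap, raised from -Xmx1500m but kept below the 2048 MB container.
set mapreduce.map.java.opts=-Xmx1800m;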

Summary:

For an OOM caused by memory pressure we need to look at two things:

Does the program have a memory leak?
Is the memory really configured too small?

For the first point, start by ruling out problems in the program itself. In the case above, the Multi Insert left the JVM with so little free memory that GC could no longer keep up. At this point you may ask: why GC overhead limit exceeded rather than Java heap space?

What GC overhead limit exceeded means:
1. The error: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
2. Explanation: an error type added in JDK 6. It is thrown when the JVM spends a large amount of time in GC while reclaiming only a tiny amount of space, generally because the heap is too small; the root cause is simply not enough memory.
3. Remedies: (1) check whether the code holds on to large amounts of memory or loops forever; (2) add the JVM startup flag -XX:-UseGCOverheadLimit to disable this particular check.

So for this case my fix is as follows:

set mapreduce.map.java.opts=-Xmx1800m -XX:-UseGCOverheadLimit
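If you want to confirm the override took effect before rerunning the query, the same kind of inspection used earlier works (just a sanity check; the echoed value is what the session passes to each map task JVM):

hive> set mapreduce.map.java.opts;
mapreduce.map.java.opts=-Xmx1800m -XX:-UseGCOverheadLimit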