Hive Runtime Error while processing row


I recently hit the following error while running a Hive job:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"00.26.37.E3.07.D3","reducesinkkey1":"2014-07-07 12:51:46"},"value":{"_col2":515,"_col3":"515999000056662_00.26.37.E3.07.D3","_col5":"00.26.37.E3.07.D3","_col6":"2014-07-07 12:51:46"},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:274)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:469)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"00.26.3

After some investigation, I found that the following statement fails even when it is run on its own in Hive:

select colx,coly,col_par
from
(
select colx,coly,col_par,
row_number() over(partition by col_par order by create_time) rn
from test.table_name
) t
where rn =1;

The table is small, only a few million rows in total.
Suspecting a problem with the data itself, I ran:

select * from
(
select col_par,count(1) rn
from test.table_name group by col_par
) t
where rn>1000;

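It turned out that col_par is the empty string ('') in more than one million rows, which clearly causes severe data skew.
The execution log shows that the job fails in the reduce stage; all 3 attempts ended in failure.
This statement also generates only 1 reducer.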

Solutions:

1. As it happens, the business logic allows rows whose col_par is the empty string to be excluded, so:
select colx,coly,col_par
from
(
select colx,coly,col_par,
row_number() over(partition by col_par order by create_time) rn
from test.table_name
where col_par <> ''
) t
where rn =1;
This both resolves the data skew and keeps the reduce shuffle phase from failing.

2. Set parameters to increase the number of reducers.
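
A minimal sketch of the second option, assuming the old-style mapred.* property names that match the Hadoop version in the stack trace above. The values are illustrative assumptions, not settings from the original post. Use one option or the other: when mapred.reduce.tasks is set explicitly, Hive does not estimate the reducer count.

```sql
-- Option A: fix the number of reducers for this session (illustrative value)
set mapred.reduce.tasks=20;

-- Option B: leave mapred.reduce.tasks at its default (-1) and let Hive
-- estimate more reducers by lowering the input bytes handled per reducer
-- (illustrative value: 64 MB)
set hive.exec.reducers.bytes.per.reducer=67108864;
-- upper bound on the estimated reducer count (illustrative value)
set hive.exec.reducers.max=100;
```

Rows that share one partition by value are still processed by a single reducer, so a larger reducer count may not by itself relieve a single hot key such as the empty string.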

Reposted from: http://blog.csdn.net/lxpbs8851/article/details/39025501
