Hive usage notes

Source: Internet. Editor: 程序博客网. Time: 2024/06/08 10:40

1. What to do when a job runs out of memory

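Usually it is the reduce side that runs out of memory. Raise the reduce container size and the JVM heap together, keeping the heap (-Xmx) somewhat below the container size so that non-heap JVM overhead still fits. The number of reducers is set with mapreduce.job.reduces:

         set mapreduce.reduce.memory.mb=10000;
         set mapreduce.reduce.java.opts=-Xmx8000M;
         set mapreduce.job.reduces=64;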

2. Abnormal data such as NULL causes a NullPointerException (NPE) and the job exits

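The coalesce function returns its first non-NULL argument, so it can supply a default value whenever a column is NULL. It only handles real NULLs; placeholder strings such as 'None' have to be handled separately.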

coalesce(col, 'invalid')
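As a minimal sketch of the coalesce call above inside a query, the following compares plain NULL handling with a version that also maps placeholder strings to the default. The table name t and the list of placeholder strings are assumptions made for illustration; they do not come from the original post.

```sql
-- Hypothetical table t with a string column col, used only for illustration.
SELECT
  -- Replaces real NULLs only.
  COALESCE(col, 'invalid') AS col_null_only,
  -- Also maps empty or placeholder strings to the default value.
  CASE
    WHEN col IS NULL OR col IN ('', 'None', 'null') THEN 'invalid'
    ELSE col
  END AS col_cleaned
FROM t;
```

The CASE form is worth the extra lines when upstream systems write the text 'None' or 'null' instead of a real NULL, since coalesce passes those strings through unchanged.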

3. mapjoin: when two tables are joined and one of them is very small, a map join can be used as an optimization

set hive.auto.convert.join = true;
select /*+ MAPJOIN(b) */
       a.*, b.province as mob_province, b.city as mob_city
from userinfo a
left outer join mobile_location b
  on (b.mob_prefix = substr(a.reg_mobile, 1, 7));
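To check whether the join above is really executed as a map join, one option is to look at the query plan. The sketch below reuses the tables from the original query; the threshold value shown is the commonly documented default for hive.mapjoin.smalltable.filesize and is given here as an assumption, since the effective limit depends on the Hive version and on related settings.

```sql
-- Let Hive convert joins to map joins automatically when one side is small.
set hive.auto.convert.join=true;
-- Size in bytes below which a table counts as small (assumed default, about 25 MB).
set hive.mapjoin.smalltable.filesize=25000000;

-- Inspect the plan: a map join appears as a "Map Join Operator" in the output.
EXPLAIN
SELECT a.*, b.province AS mob_province, b.city AS mob_city
FROM userinfo a
LEFT OUTER JOIN mobile_location b
  ON (b.mob_prefix = substr(a.reg_mobile, 1, 7));
```

If the plan still shows a common (shuffle) join, the small table is probably larger than the configured threshold.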

4. Using UDFs

Writing the UDF itself is fairly simple; adding the jar and the file is the slightly fiddly part.

add file /export/tmp/qingwu.fu/userinfo/NIandLocal.txt;
add jar /export/tmp/qingwu.fu/userinfo/hive-udf-1.0-SNAPSHOT.jar;
create temporary function jg as 'JiGuanUDF';

If a "file not found" error is reported, running the create temporary function statement again resolves it.
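When the "file not found" error shows up, it can also help to check what the current session has actually registered. The following is a small diagnostic sketch using Hive's resource listing commands. It is not part of the original workflow, and the output depends on the session.

```sql
-- Show the jars and files added to the current session.
list jars;
list files;

-- Confirm that the temporary function jg is registered.
describe function jg;
```

Resources added with add file and add jar, as well as temporary functions, are scoped to the session, so all three statements need to be run again after reconnecting.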

0 0