A Simple Hive UDAF Example

Source: Internet | Published by: 大数据建模工程师 | Editor: 程序博客网 | Date: 2024/05/01 15:50

Reposted from http://blog.csdn.net/wisgood/article/details/26167367

An earlier post demonstrated a small example that used a generic UDTF to compute total scores. This post does the same job with a UDAF.

1. Write the UDAF.

package com.wz.udf;  
  
import org.apache.hadoop.hive.ql.exec.UDAF;  
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;  
import java.util.HashMap;
import java.util.Map;

public class helloUDAF extends UDAF {  
    public static class Evaluator implements UDAFEvaluator  
    {  
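       // Running total score for each student. Kept per evaluator instance
       // (not static) so that separate aggregations do not share state.
       private Map<String,Integer> ret;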
  
       public Evaluator()
       {
           super();
           init();
       }

       // Reset the aggregation state.
       public void init()
       {
           ret = new HashMap<String,Integer>();
       }
  
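       // Map phase: called once for each input row.
       public boolean iterate(String strStudent,int nScore)
       {
           if(ret.containsKey(strStudent))
           {
               // Student already seen: add this score to the running total.
               int nValue = ret.get(strStudent);
               nValue +=nScore;
               ret.put(strStudent,nValue);
           }
           else
           {
               // First score for this student.
               ret.put(strStudent,nScore);
           }
           return true;
       }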
      
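       // Return the final result.
       public Map<String,Integer> terminate()
       {
           return ret;
       }

       // Return the partial aggregation accumulated so far (for example at the
       // end of the map phase); Hive passes this value to merge().
       public Map<String,Integer> terminatePartial()
       {
           return ret;
       }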
  
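       // Reduce phase: fold another partial result into this one.
       public boolean merge(Map<String,Integer> other)
       {
           if (other == null) {
               return true;
           }
           for (Map.Entry<String, Integer> e : other.entrySet()) {
               // Add to any existing total rather than overwriting it.
               Integer nValue = ret.get(e.getKey());
               ret.put(e.getKey(), nValue == null ? e.getValue() : nValue + e.getValue());
           }
           return true;
       }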
    }     
}  
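
The following is a minimal sketch, in plain Java with no Hive classes, of how partial results could flow through iterate, terminatePartial, merge and terminate when the input is split across two map tasks. The class name PartialSumDemo, the run() helper and the sample rows are hypothetical and exist only for illustration. The point it tries to show is that merge needs to add an incoming partial total to any total already held for the same student; otherwise one side's contribution is lost whenever a student appears in both partial maps.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for a UDAF evaluator; no Hive classes are involved.
class PartialSumDemo {

    // Hypothetical miniature evaluator: one private map per instance.
    static class Evaluator {
        private final Map<String, Integer> totals = new HashMap<>();

        // Called once per input row on the map side.
        void iterate(String student, int score) {
            totals.merge(student, score, Integer::sum);
        }

        // Hands one task's partial totals to the next stage.
        Map<String, Integer> terminatePartial() {
            return totals;
        }

        // Folds another task's partial totals into this one by adding.
        void merge(Map<String, Integer> other) {
            if (other == null) {
                return;
            }
            for (Map.Entry<String, Integer> e : other.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }

        // Returns the final totals.
        Map<String, Integer> terminate() {
            return totals;
        }
    }

    // Simulates two map tasks feeding one reduce task (made-up rows).
    static Map<String, Integer> run() {
        Evaluator mapper1 = new Evaluator();
        mapper1.iterate("A", 90);
        mapper1.iterate("B", 80);

        Evaluator mapper2 = new Evaluator();
        mapper2.iterate("A", 70);
        mapper2.iterate("B", 60);
        mapper2.iterate("A", 50);

        Evaluator reducer = new Evaluator();
        reducer.merge(mapper1.terminatePartial());
        reducer.merge(mapper2.terminatePartial());
        return reducer.terminate();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With these made-up rows, student A should total 90 + 70 + 50 = 210 and student B should total 80 + 60 = 140. A merge that overwrote instead of adding would keep only the second mapper's partial totals.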

2. Compile the class and package it into a jar.

javac -d . -classpath /home/wangzhun/hadoop/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/wangzhun/hive/hive-0.8.1/lib/hive-exec-0.8.1.jar helloUDAF.java

jar cvf helloUDAF.jar com/wz/udf/helloUDAF*.class

3. In Hive, add the jar, create a temporary function, and run the query to get the result.

hive> add jar /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar;                  
Added /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar to class path  
Added resource: /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar  
hive> create temporary function helloudaf as 'com.wz.udf.helloUDAF';             
OK  
Time taken: 0.02 seconds  
hive> select helloudaf(studentScore.name,studentScore.score) from studentScore;  
Total MapReduce jobs = 1  
Launching Job 1 out of 1  
Number of reduce tasks determined at compile time: 1  
In order to change the average load for a reducer (in bytes):  
  set hive.exec.reducers.bytes.per.reducer=<number>  
In order to limit the maximum number of reducers:  
  set hive.exec.reducers.max=<number>  
In order to set a constant number of reducers:  
  set mapred.reduce.tasks=<number>  
Starting Job = job_201311282251_0009, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201311282251_0009  
Kill Command = /home/wangzhun/hadoop/hadoop-0.20.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201311282251_0009  
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1  
2013-11-29 00:34:01,290 Stage-1 map = 0%,  reduce = 0%  
2013-11-29 00:34:04,316 Stage-1 map = 100%,  reduce = 0%  
2013-11-29 00:34:13,403 Stage-1 map = 100%,  reduce = 100%  
Ended Job = job_201311282251_0009  
MapReduce Jobs Launched:   
Job 0: Map: 1  Reduce: 1   HDFS Read: 40 HDFS Write: 12 SUCESS  
Total MapReduce CPU Time Spent: 0 msec  
OK  
{"A":290,"B":325}  
Time taken: 32.275 seconds  

0 0