Hive UDAF Programming: Computing the Geometric Mean


(1) Create a Map/Reduce project in Eclipse and name it GeoMeanPro. Before creating the project, copy the jar files from the hive/lib directory into the hadoop/lib directory;

(2) Add a class to the project: create the package com.hive.geomean.udaf and, inside it, create GeoMean.java;

(3) The code for GeoMean.java is:

package com.hive.geomean.udaf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.io.IntWritable;

public class GeoMean extends UDAF {

    public static class GeoMeanUDAFEval implements UDAFEvaluator {

        // Intermediate state: the running product and the element count.
        public static class PartialResult {
            double sum;
            long count;
        }

        private PartialResult pResult;

        // Called before aggregation starts; resets the state.
        @Override
        public void init() {
            pResult = null;
        }

        // Entry point for each input value (map side).
        public boolean iterate(IntWritable value) {
            if (value == null) {
                return true;
            }
            if (pResult == null) {
                pResult = new PartialResult();
                pResult.sum = 1;
                pResult.count = 0;
            }
            pResult.sum *= value.get();
            pResult.count++;
            return true;
        }

        // Returns the partial state to be shipped to the reducer.
        public PartialResult terminatePartial() {
            return pResult;
        }

        // Combines a partial result from another task into this one.
        public boolean merge(PartialResult other) {
            if (other == null) {
                return true;
            }
            if (pResult == null) {
                pResult = new PartialResult();
                pResult.sum = 1;
                pResult.count = 0;
            }
            pResult.sum *= other.sum;
            pResult.count += other.count;
            return true;
        }

        // Final result: the n-th root of the product.
        public Double terminate() {
            if (pResult == null) {
                return null;
            }
            return Double.valueOf(Math.pow(pResult.sum, 1.0 / pResult.count));
        }
    }
}
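The evaluator follows the contract of Hive's old UDAF bridge: init() resets state, iterate() folds in each row on the map side, terminatePartial() emits the intermediate state, merge() combines partials on the reduce side, and terminate() produces the final value. The following stand-alone sketch simulates that flow in plain Java (a hypothetical harness, not Hive itself; IntWritable is replaced by int so it compiles without the Hadoop jars):

```java
// Stand-alone simulation of the UDAF lifecycle (hypothetical harness, not Hive).
public class GeoMeanFlow {
    static class PartialResult { double sum = 1; long count = 0; }

    // iterate(): fold one value into the partial state (map side).
    static void iterate(PartialResult p, int value) {
        p.sum *= value;
        p.count++;
    }

    // merge(): combine another task's partial state into this one (reduce side).
    static void merge(PartialResult into, PartialResult other) {
        into.sum *= other.sum;
        into.count += other.count;
    }

    // terminate(): n-th root of the running product.
    static double terminate(PartialResult p) {
        return Math.pow(p.sum, 1.0 / p.count);
    }

    public static void main(String[] args) {
        // Simulate two mappers, each seeing part of the grade column.
        PartialResult m1 = new PartialResult();
        iterate(m1, 90);
        iterate(m1, 80);
        PartialResult m2 = new PartialResult();
        iterate(m2, 70);
        // The reducer merges the partials and produces the final value.
        merge(m1, m2);
        System.out.println(terminate(m1)); // ≈ 79.5811
    }
}
```

Note that merging (m1, m2) and a single pass over all three values give the same answer, which is exactly the property a UDAF's merge() must guarantee.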

(4) Export the project as a jar named geomean.jar and upload it to the /home/hadoop/class directory:

(5) The UDAF is then used in Hive as follows:

hive> add jar /home/hadoop/class/geomean.jar;
Added /home/hadoop/class/geomean.jar to class path
Added resource: /home/hadoop/class/geomean.jar
hive> create temporary function geomean as 'com.hive.geomean.udaf.GeoMean';
OK
Time taken: 0.038 seconds

hive> select * from grade;
OK
1       90
2       80
3       70
Time taken: 0.112 seconds

hive> select geomean(grade) from grade;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201503221120_0057, Tracking URL = http://Masterpc.hadoop:50030/jobdetails.jsp?jobid=job_201503221120_0057
Kill Command = /usr/hadoop/libexec/../bin/hadoop job  -kill job_201503221120_0057
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-03-23 22:36:57,988 Stage-1 map = 0%,  reduce = 0%
2015-03-23 22:37:04,042 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.39 sec
2015-03-23 22:37:05,063 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.39 sec
......
2015-03-23 22:37:22,264 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.87 sec
MapReduce Total cumulative CPU time: 3 seconds 870 msec
Ended Job = job_201503221120_0057
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 3.87 sec   HDFS Read: 228 HDFS Write: 18 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 870 msec
OK
79.58114415792782
Time taken: 44.677 seconds
hive> 
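One caveat with the product-based accumulator: multiplying many values into a single double can overflow to Infinity (or underflow to 0) on long columns, at which point Math.pow() no longer yields a meaningful result. A common workaround is to accumulate the sum of logarithms and exponentiate once at the end. A minimal sketch in plain Java (illustrative only, not a drop-in replacement for the evaluator above):

```java
// Log-domain geometric mean: exp(mean(ln x)) == (x1*...*xn)^(1/n),
// but the accumulator stays small even for millions of rows.
public class GeoMeanLog {
    static double geoMean(int[] values) {
        double logSum = 0;
        for (int v : values) {
            logSum += Math.log(v); // assumes strictly positive inputs
        }
        return Math.exp(logSum / values.length);
    }

    public static void main(String[] args) {
        System.out.println(geoMean(new int[]{90, 80, 70})); // ≈ 79.5811
    }
}
```

Applied to the UDAF, this means storing logSum instead of the product in PartialResult, adding log sums in merge(), and calling Math.exp() in terminate().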
