Developing Hive User-Defined Functions

Source: Internet. Editor: 程序博客网. Time: 2024/06/03 07:40

Recommended article: "Setting up Hive with hive2.1.1 + hadoop2.8.0 + windows7 (without cygwin)"

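1. Develop the custom function

Write a custom class that extends the UDF class provided by Hive.

UDF: user-defined function.

Define an evaluate method. It can be overloaded several times, and Hive automatically decides which overload to call based on the argument types.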
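To make the overload selection more concrete, the sketch below looks up an evaluate method by its parameter types using plain Java reflection. It is only an illustration: OverloadDemo, Sample, and call are hypothetical names, the class does not extend Hive's UDF, and Hive's own resolver is more involved (for example, it also considers type conversions).

```java
import java.lang.reflect.Method;

public class OverloadDemo {
    // Hypothetical stand-in for a UDF class; it does NOT extend Hive's UDF.
    public static class Sample {
        public String evaluate(String s) { return "string:" + s; }
        public int evaluate(int a, int b) { return a + b; }
    }

    // Finds the evaluate overload whose parameter types match exactly, then invokes it.
    public static Object call(Object target, Class<?>[] types, Object... args) throws Exception {
        Method m = target.getClass().getMethod("evaluate", types);
        return m.invoke(target, args);
    }

    public static void main(String[] args) throws Exception {
        Sample s = new Sample();
        // One string argument selects evaluate(String).
        System.out.println(call(s, new Class<?>[]{String.class}, "abc"));
        // Two int arguments select evaluate(int, int).
        System.out.println(call(s, new Class<?>[]{int.class, int.class}, 1, 2));
    }
}
```

The same idea explains why one function name can later be called with either one string column or two int columns.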

package com.cn.test.hive;

import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;

import java.util.HashMap;
import java.util.Map;

public class MyTestUDF extends UDF {

    // Sample mapping from the first three digits of a phone number to an area.
    private static final Map<String, String> areaMap = new HashMap<String, String>();

    static {
        areaMap.put("135", "福建"); // Fujian
        areaMap.put("136", "浙江"); // Zhejiang
        areaMap.put("137", "广东"); // Guangdong
        areaMap.put("138", "北京"); // Beijing
        areaMap.put("139", "上海"); // Shanghai
    }

    /**
     * Returns the area a phone number belongs to, based on its first three digits.
     */
    public String evaluate(String phoneNum) {
        if (StringUtils.isBlank(phoneNum) || phoneNum.length() < 3) {
            return "";
        }
        String area = areaMap.get(phoneNum.substring(0, 3));
        return area == null ? "未知" : area; // "未知" means "unknown"
    }

    /**
     * Returns the sum of upstream and downstream traffic.
     */
    public int evaluate(int upFlow, int downFlow) {
        return upFlow + downFlow;
    }
}
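A Java int cannot represent SQL NULL, so the two-argument evaluate has no way to express a missing flow value. One possible variation, sketched here under the assumption that NULL columns reach wrapper-typed parameters as null references, uses Integer instead of int. NullSafeSum is a hypothetical class name and is written without the Hive dependency so that it can be run on its own.

```java
public class NullSafeSum {
    // Hypothetical variant of evaluate(int, int): wrapper types allow a missing value to arrive as null.
    public Integer evaluate(Integer upFlow, Integer downFlow) {
        if (upFlow == null || downFlow == null) {
            return null; // propagate the missing value instead of failing
        }
        return upFlow + downFlow;
    }

    public static void main(String[] args) {
        NullSafeSum f = new NullSafeSum();
        System.out.println(f.evaluate(100, 210));  // both values present
        System.out.println(f.evaluate(null, 210)); // one value missing
    }
}
```

Whether this matters depends on the data: the sample file used in this walkthrough has no missing values.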

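2. Package and upload

Copy the jar containing the custom function into the HIVE_HOME\lib folder (recommended on Windows; the Hive client must be restarted afterwards).

Alternatively, run add jar /home/myUDF.jar; in the Hive client (recommended on Linux).

3. Create the Hive function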

create temporary function areasum as 'com.cn.test.hive.MyTestUDF';

4. Create the table

create table t_user_udf(phone string, up_flow int, down_flow int)
row format delimited
fields terminated by ',';

5. Load the data

Create data.txt in a local folder with the following content:

13350467821,100,2330
13450467821,120,2002
13550467821,100,210
13650467821,130,200123
13750467821,122,200
13850467821,100,210213

Run the following in the Hive client:

load data local inpath 'e:/data.txt' into table t_user_udf;

6. Use the custom function

select phone, areasum(phone), areasum(up_flow, down_flow) from t_user_udf;
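To get a feel for what this query returns without a running Hive instance, the sketch below replays the same two computations in plain Java. It assumes data.txt holds one phone,up_flow,down_flow record per line; QueryPreview, area, and preview are hypothetical names, and the map is a copy of only part of the prefix table from MyTestUDF.

```java
import java.util.HashMap;
import java.util.Map;

public class QueryPreview {
    // Copy of part of the prefix table from MyTestUDF, for illustration only.
    private static final Map<String, String> AREA = new HashMap<>();
    static {
        AREA.put("135", "福建");
        AREA.put("136", "浙江");
        AREA.put("137", "广东");
        AREA.put("139", "上海");
    }

    // Mirrors evaluate(String): the first three digits decide the area.
    public static String area(String phone) {
        if (phone == null || phone.trim().isEmpty() || phone.length() < 3) {
            return "";
        }
        String a = AREA.get(phone.substring(0, 3));
        return a == null ? "未知" : a;
    }

    // Mirrors the three selected columns for one comma-delimited record.
    public static String preview(String line) {
        String[] f = line.split(",");
        int up = Integer.parseInt(f[1].trim());
        int down = Integer.parseInt(f[2].trim());
        return f[0] + "\t" + area(f[0]) + "\t" + (up + down);
    }

    public static void main(String[] args) {
        System.out.println(preview("13550467821,100,210"));  // prefix 135 is in the table
        System.out.println(preview("13350467821,100,2330")); // prefix 133 is not in the table
    }
}
```

A phone number whose prefix is missing from the table falls through to "未知" (unknown), and the third column is simply up_flow + down_flow.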
