hive系列(2)---自定义函数UDF

来源:互联网 发布:dd linux sync 编辑:程序博客网 时间:2024/05/22 00:10

一、自定义UDF

在hql中可以满足大部分的需求,但是可能还有部分特定的业务需要自己自定义函数来实现,下面来模拟实现:
1、表的建立:
t_p

0: jdbc:hive2://mini01:10000> desc t_p;+-----------+------------+----------+--+| col_name  | data_type  | comment  |+-----------+------------+----------+--+| id        | string     |          || name      | string     |          |+-----------+------------+----------+--+

t_flow

+-----------+------------+----------+--+| col_name  | data_type  | comment  |+-----------+------------+----------+--+| phone     | int        |          || flow      | int        |          |+-----------+------------+----------+--+

hql表的建立就省略了,看看数据:

0: jdbc:hive2://mini01:10000> select * from t_p;+---------+-----------+--+| t_p.id  | t_p.name  |+---------+-----------+--+| 14      | TIANJUN   || 1       | ni        || 2       | hao       || 3       | wo        || 4       | shi       || 5       | zhong     || 6       | guo       || 7       | ren       || 8       | ha        || 9       | wa        || 10      | ni        || 12      | wo        || 13      | ta        |+---------+-----------+--+
0: jdbc:hive2://mini01:10000> select * from t_flow;+---------------+--------------+--+| t_flow.phone  | t_flow.flow  |+---------------+--------------+--+| 136214        | 10           || 137123        | 14           || 137123        | 14           || 136123        | 14           || 137123        | 14           || 138123        | 14           || 137123        | 14           || 136127        | 14           |+---------------+--------------+--+

2、自定义函数的实现:
可以通过重载实现函数的多样性,函数必须是public

package hive.lower;import org.apache.hadoop.hive.ql.exec.UDF;import java.util.HashMap;import java.util.Map;/** * hive自定义函数 * Created by tianjun on 2017/3/21. */public class ToLowerCase extends UDF {    public Map<String,String> province = new HashMap<>();    //hive使用,必须是public    public String evaluate(String field){        String result = field.toLowerCase();        return result;    }    public String evaluate(int number){        province.put("136","beigjing");        province.put("137","shanghai");        province.put("138","wuhan");        String pnb = String.valueOf(number);        return province.get(pnb.substring(0,3))==null?"unknow":province.get(pnb.substring(0,3));    }    public static void main(String[] args) {        System.out.println(new ToLowerCase().evaluate(137214));    }}

二、步骤

1、打成jar包上传到服务器
2、将jar包添加到hive的classpath

hive>add JAR /root/original-mr.demo-1.0-SNAPSHOT.jar;

3、创建临时函数与开发好的java class关联

Hive>create temporary function province as 'hive.lower.ToLowerCase';

t_p的查询:

0: jdbc:hive2://mini01:10000> select id, province(name) lowerName from t_p;+-----+------------+--+| id  | lowername  |+-----+------------+--+| 14  | tianjun    || 1   | ni         || 2   | hao        || 3   | wo         || 4   | shi        || 5   | zhong      || 6   | guo        || 7   | ren        || 8   | ha         || 9   | wa         || 10  | ni         || 12  | wo         || 13  | ta         |+-----+------------+--+

t_flow的查询

0: jdbc:hive2://mini01:10000> select phone,province(phone) province ,flow from t_flow;+---------+-----------+-------+--+|  phone  | province  | flow  |+---------+-----------+-------+--+| 136214  | beigjing  | 10    || 137123  | shanghai  | 14    || 137123  | shanghai  | 14    || 136123  | beigjing  | 14    || 137123  | shanghai  | 14    || 138123  | wuhan     | 14    || 137123  | shanghai  | 14    || 136127  | beigjing  | 14    |+---------+-----------+-------+--+
0 0
原创粉丝点击