Writing Hive UDF Functions
I recently got a feel for how powerful Hive's UDF mechanism is: not only can you use the many existing UDFs, you can also define your own to fit your business logic. Below is a walkthrough of how to write UDF/UDAF/UDTF functions, intended as an introduction.
First, you need to create a new class that extends UDF, with one or more methods named evaluate.
package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
  public Text evaluate(final Text s) {
    if (s == null) { return null; }
    return new Text(s.toString().toLowerCase());
  }
}
After compiling your code into a jar, add it to Hive's classpath. Once Hive has started with your jar on the classpath, the final step is to register your function:
create temporary function my_lower as 'com.example.hive.udf.Lower';
The above describes the process of implementing a UDF: first implement the UDF class, then compile it into a jar and add it to Hive's classpath, and finally register a temporary function name so it can be called from Hive.
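Putting the three steps together, a full Hive session might look like the following sketch (the jar path, table name, and column name are illustrative assumptions, not from the original article):

```sql
-- Illustrative: /path/to/my_udf.jar, table titles, and column title are assumptions.
ADD JAR /path/to/my_udf.jar;
CREATE TEMPORARY FUNCTION my_lower AS 'com.example.hive.udf.Lower';
SELECT my_lower(title) FROM titles;
```

Note that a temporary function only exists for the current session; it must be re-registered in each new session.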
The table below shows the differences between UDF/UDAF/UDTF more clearly (the original table was lost in extraction; this is the standard distinction):

  UDF   one row in  → one value out   (e.g. lower, length)
  UDAF  many rows in → one value out  (e.g. sum, count)
  UDTF  one row in  → many rows out   (e.g. explode)
A few examples:
1) UDF (reference: http://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udf/)
package org.apache.hadoop.hive.contrib.udf.example;

import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * UDFExampleAdd.
 */
public class UDFExampleAdd extends UDF {

  public Integer evaluate(Integer... a) {
    int total = 0;
    for (Integer element : a) {
      if (element != null) {
        total += element;
      }
    }
    return total;
  }

  public Double evaluate(Double... a) {
    double total = 0;
    for (Double element : a) {
      if (element != null) {
        total += element;
      }
    }
    return total;
  }

}
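The evaluate logic above can be exercised without a Hive cluster, since it is plain Java. The following is a Hive-independent sketch (the class name `UDFExampleAddSketch` and the `main` method are illustrative; the real UDF must extend `org.apache.hadoop.hive.ql.exec.UDF` and be deployed on Hive's classpath):

```java
// Hive-independent sketch of UDFExampleAdd's Integer overload.
public class UDFExampleAddSketch {

  // Mirrors the varargs overload above: null arguments are skipped, not summed.
  public Integer evaluate(Integer... a) {
    int total = 0;
    for (Integer element : a) {
      if (element != null) {
        total += element;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    UDFExampleAddSketch udf = new UDFExampleAddSketch();
    System.out.println(udf.evaluate(1, 2, null, 3)); // nulls are ignored
  }
}
```

This also illustrates how Hive resolves overloads: the `Integer...` and `Double...` versions let the same function name work for both int and double columns.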
2) UDAF (reference: http://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udaf/)
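The contrib UDAF examples follow an evaluator lifecycle: init, iterate (once per input row on the map side), terminatePartial (ship partial state), merge (combine partials), and terminate (final result). The following is a plain-Java sketch of that lifecycle for an average, with no Hive dependencies (all names here are illustrative; a real UDAF extends `org.apache.hadoop.hive.ql.exec.UDAF` with an inner class implementing `UDAFEvaluator`):

```java
// Plain-Java sketch of the UDAF evaluator lifecycle for avg().
public class AvgEvaluatorSketch {

  // Partial aggregation state shipped between map and reduce tasks.
  public static class State {
    double sum;
    long count;
  }

  private final State state = new State();

  // Called once before aggregation starts.
  public void init() {
    state.sum = 0;
    state.count = 0;
  }

  // Called once per input row; null rows are skipped.
  public boolean iterate(Double value) {
    if (value != null) {
      state.sum += value;
      state.count++;
    }
    return true;
  }

  // Returns the partial state for another task to merge.
  public State terminatePartial() {
    return state;
  }

  // Merges a partial state produced by another task.
  public boolean merge(State other) {
    if (other != null) {
      state.sum += other.sum;
      state.count += other.count;
    }
    return true;
  }

  // Produces the final result; null if no rows were seen.
  public Double terminate() {
    return state.count == 0 ? null : state.sum / state.count;
  }

  public static void main(String[] args) {
    AvgEvaluatorSketch e = new AvgEvaluatorSketch();
    e.init();
    e.iterate(1.0);
    e.iterate(3.0);
    System.out.println(e.terminate());
  }
}
```

The split into terminatePartial/merge is what lets Hive run the aggregation distributed: map tasks each build a partial state, and reducers merge them.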
3) UDTF (reference: http://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/)
package org.apache.hadoop.hive.contrib.udtf.example;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;

/**
 * GenericUDTFExplode2.
 */
@Description(name = "explode2",
    value = "_FUNC_(a) - like explode, but outputs two identical columns (for testing purposes)")
public class GenericUDTFExplode2 extends GenericUDTF {

  ListObjectInspector listOI = null;

  @Override
  public void close() throws HiveException {
  }

  @Override
  public StructObjectInspector initialize(ObjectInspector[] args)
      throws UDFArgumentException {

    if (args.length != 1) {
      throw new UDFArgumentException("explode() takes only one argument");
    }

    if (args[0].getCategory() != ObjectInspector.Category.LIST) {
      throw new UDFArgumentException("explode() takes an array as a parameter");
    }
    listOI = (ListObjectInspector) args[0];

    ArrayList<String> fieldNames = new ArrayList<String>();
    ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
    fieldNames.add("col1");
    fieldNames.add("col2");
    fieldOIs.add(listOI.getListElementObjectInspector());
    fieldOIs.add(listOI.getListElementObjectInspector());
    return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,
        fieldOIs);
  }

  Object[] forwardObj = new Object[2];

  @Override
  public void process(Object[] o) throws HiveException {
    List<?> list = listOI.getList(o[0]);
    for (Object r : list) {
      forwardObj[0] = r;
      forwardObj[1] = r;
      forward(forwardObj);
    }
  }

  @Override
  public String toString() {
    return "explode";
  }
}
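A UDTF like this is typically invoked with LATERAL VIEW, which joins each input row to the rows the UDTF emits. A usage sketch, assuming it has been registered as `explode2` (the table `t` and array column `arr` are illustrative assumptions):

```sql
-- Illustrative: table t with an array column arr is an assumption.
SELECT t.id, ex.col1, ex.col2
FROM t
LATERAL VIEW explode2(t.arr) ex AS col1, col2;
```

For each element of `arr`, `process` calls `forward` once, producing one output row with the element duplicated into `col1` and `col2`, exactly as `initialize` declared via the returned StructObjectInspector.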