Implementing a string-decoding function as a Hive UDF


A Hive UDF is actually fairly easy to implement: extend the UDF class and implement its evaluate() method. The code is as follows.

import java.net.URLDecoder;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

@Description(name = "decoder_url",
    value = "_FUNC_(url [,code][,count]) - decode a URL string count times using code as the encoding scheme",
    extended = "If count is not given, the URL is decoded 2 times; "
        + "if code is not given, GBK is used.")
public class UDFDecoderUrl extends UDF {
  private static final int DEFAULT_TIMES = 2;
  private static final String DEFAULT_CODE = "GBK";

  public UDFDecoderUrl() {
  }

  // Hive reuses one UDF instance across rows, so per-call values live in
  // locals instead of instance fields; otherwise an encoding or count passed
  // for one row would leak into later rows.
  public String evaluate(String urlStr, String srcCode, int count) {
    if (urlStr == null) {
      return null;
    }
    if (count <= 0) {
      return urlStr;
    }
    String code = (srcCode != null) ? srcCode : DEFAULT_CODE;
    String url = urlStr;
    for (int i = 0; i < count && url != null; i++) {
      url = decoder(url, code);
    }
    return url;
  }

  public String evaluate(String urlStr, String srcCode) {
    return evaluate(urlStr, srcCode, DEFAULT_TIMES);
  }

  public String evaluate(String urlStr, int count) {
    return evaluate(urlStr, DEFAULT_CODE, count);
  }

  public String evaluate(String urlStr) {
    return evaluate(urlStr, DEFAULT_CODE, DEFAULT_TIMES);
  }

  // Returns null if the input is malformed or the charset is not supported.
  private String decoder(String urlStr, String code) {
    if (urlStr == null || code == null) {
      return null;
    }
    try {
      return URLDecoder.decode(urlStr, code);
    } catch (Exception e) {
      return null;
    }
  }
}
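The repeated-decoding loop can be tried outside Hive. The following is only a minimal sketch that depends on nothing but the JDK; the class name DecodeTwiceDemo and the method decodeTimes are made up for illustration and are not part of the UDF. It shows why more than one pass may be needed when a URL has been percent-encoded twice.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical stand-alone demo; not part of Hive or of the UDF above.
public class DecodeTwiceDemo {

  // Applies URLDecoder.decode `times` times; returns null on any decoding error,
  // mirroring the null-on-failure behaviour of the UDF's decoder() helper.
  static String decodeTimes(String s, String charset, int times) {
    String out = s;
    try {
      for (int i = 0; i < times && out != null; i++) {
        out = URLDecoder.decode(out, charset);
      }
    } catch (Exception e) {
      return null;
    }
    return out;
  }

  public static void main(String[] args) throws Exception {
    String original = "a b&c";
    // Encode twice to simulate a double-encoded URL parameter.
    String encodedOnce = URLEncoder.encode(original, "GBK");
    String encodedTwice = URLEncoder.encode(encodedOnce, "GBK");

    System.out.println("encoded twice : " + encodedTwice);
    System.out.println("decoded once  : " + decodeTimes(encodedTwice, "GBK", 1));
    System.out.println("decoded twice : " + decodeTimes(encodedTwice, "GBK", 2));
  }
}
```

After one pass the string is still percent-encoded; only the second pass recovers the original text, which is presumably why the UDF defaults to two passes.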

Then add the following line to the class org.apache.hadoop.hive.ql.exec.FunctionRegistry:

    registerUDF("decoder_url", UDFDecoderUrl.class, false);

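Then recompile Hive. Alternatively, have Hive read its function registrations from a configuration file, so that functions added later only need an entry in that file and no further rebuild.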
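The configuration-file idea is not spelled out here, so the following is only a rough sketch of one way it might look, under stated assumptions: a properties-style list of "functionName=fully.qualified.ClassName" entries, and a hypothetical helper class UdfConfigLoader that resolves each class by name. None of this is an existing Hive API; in a patched FunctionRegistry, each resolved entry would be handed to registerUDF.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper; the file format and class name are assumptions.
public class UdfConfigLoader {

  // Parses "functionName=fully.qualified.ClassName" lines and loads each class.
  static Map<String, Class<?>> load(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new IllegalArgumentException("unreadable function list", e);
    }
    Map<String, Class<?>> functions = new LinkedHashMap<>();
    for (String name : props.stringPropertyNames()) {
      String className = props.getProperty(name).trim();
      try {
        functions.put(name, Class.forName(className));
      } catch (ClassNotFoundException e) {
        throw new IllegalArgumentException("class not on classpath: " + className, e);
      }
    }
    return functions;
  }

  public static void main(String[] args) {
    // java.lang.String stands in for a UDF class so the sketch runs without Hive.
    Map<String, Class<?>> functions = load("decoder_url=java.lang.String\n");
    for (Map.Entry<String, Class<?>> entry : functions.entrySet()) {
      // A patched FunctionRegistry would call registerUDF(name, clazz, false) here.
      System.out.println(entry.getKey() + " -> " + entry.getValue().getName());
    }
  }
}
```

With something along these lines, adding a function would mean adding one line to the list and shipping the jar, rather than editing FunctionRegistry again.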

 

The UDFDecoderUrl class above has to be packaged into a jar and loaded into Hive. To load the jar, add the following to hive-site.xml:

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/hive/sohu/hive-udf-0.0.1.jar</value>
  <description>This JAR file is available to all users for all jobs</description>
</property>


 
