Hive UDF development and permanent registration of UDF functions

Source: Internet  Editor: 程序博客网  Date: 2024/06/04 18:51

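Most material online describes roughly two ways to register a Hive UDF:

Method 1: create a temporary function. For example, run the following commands in the Hive CLI: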

hive> add jar helloudf.jar;
hive> create temporary function helloworld as 'com.hrj.hive.udf.helloUDF';
hive> select helloworld(t.col1) from t limit 10;
hive> drop temporary function helloworld;


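Method 2: modify the Hive source code, recompile, and register the function there. This keeps the UDF permanently, but it is very risky. I started out with this approach and it ended badly: HiveServer went down, and in the end I had to install the HiveServer service on a different node. Plenty of online material covers how to implement this method.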


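Comparing the two, the first method falls short of real development needs, because production work needs a stable set of shared UDFs rather than functions that must be recreated in every session. The second method fills that gap, but the risk is too high and it can easily break Hive.

So I checked the official Hive documentation, which has the following passage on UDF functions: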

Permanent Functions

In Hive 0.13 or later, functions can be registered to the metastore, so they can be referenced in a query without having to create a temporary function each session.

Create Function

Version information

As of Hive 0.13.0 (HIVE-6047).

CREATE FUNCTION [db_name.]function_name AS class_name
  [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];

This statement lets you create a function that is implemented by the class_name. Jars, files, or archives which need to be added to the environment can be specified with the USING clause; when the function is referenced for the first time by a Hive session, these resources will be added to the environment as if ADD JAR/FILE had been issued. If Hive is not in local mode, then the resource location must be a non-local URI such as an HDFS location.

The function will be added to the database specified, or to the current database at the time that the function was created. The function can be referenced by fully qualifying the function name (db_name.function_name), or can be referenced without qualification if the function is in the current database.

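In other words, Hive 0.13 and later (0.13 included) supports registering permanent functions and provides the syntax for doing so. When creating the function, pay attention to the execution mode: unless Hive is running in local mode, the resource must sit at a non-local URI. I used a jar stored on HDFS, so my statement looks like this: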

create function default.url_decode as 'com.richinfo.udf.DecodeURL' using jar 'hdfs://192.168.220.111:21900/user/zjf/udf/URLDecodeHiveUDF.jar';

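After creating it, I noticed that referencing url_decode from another database requires the database name as a prefix, because the function created above belongs to default. I also found that once url_decode had been created in default, I could not create a UDF with the same name in another database.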
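The scoping behaviour described above can be sketched in HiveQL roughly as follows. This is only an illustration, assuming Hive 0.13 or later and that default.url_decode was created as shown earlier. The database name other_db is a hypothetical placeholder, and the table t and column col1 are borrowed from the temporary-function example and assumed to exist wherever they are queried.

```sql
-- Assumes default.url_decode was registered as shown above (Hive 0.13+).
-- other_db is a hypothetical database; t and col1 are placeholder names.

-- Inspect the registered permanent function.
DESCRIBE FUNCTION default.url_decode;

-- Inside the database that owns the function, the bare name is enough.
USE default;
SELECT url_decode(t.col1) FROM t LIMIT 10;

-- From any other database, qualify the function with its owning database.
USE other_db;
SELECT default.url_decode(t.col1) FROM t LIMIT 10;

-- Remove the permanent function once it is no longer needed.
DROP FUNCTION IF EXISTS default.url_decode;
```

Unlike a temporary function, the registration is stored in the metastore, so none of these statements needs a preceding add jar in a new session.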


Official documentation on creating functions: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateFunction

