Hive 学习笔记(三)

来源:互联网 发布:js写的网站怎么做seo 编辑:程序博客网 时间:2024/05/17 01:50

自定义函数

当写hive UDF时,有两个选择:一是继承 UDF类,二是继承抽象类GenericUDF。这两种实现不同之处是:GenericUDF 可以处理复杂类型参数,并且继承GenericUDF更加有效率,因为UDF class 需要HIve使用反射的方式去实现。

UDF

一个UDF 必须满足两个条件:
1. 必须继承 org.apache.hadoop.hive.ql.exec.UDF类
2. 至少实现了一个 evaluate() 方法

/** * A User-defined function (UDF) for the use with Hive. * * New UDF classes need to inherit from this UDF class. * * Required for all UDF classes: 1. Implement one or more methods named * "evaluate" which will be called by Hive. The following are some examples: * public int evaluate(); public int evaluate(int a); public double evaluate(int * a, double b); public String evaluate(String a, int b, String c); * * "evaluate" should never be a void method. However it can return "null" if * needed. */@UDFType(deterministic = true)public class UDF {  /**   * The resolver to use for method resolution.   */  private UDFMethodResolver rslv;  /**   * The constructor.   */  public UDF() {    rslv = new DefaultUDFMethodResolver(this.getClass());  }  /**   * The constructor with user-provided UDFMethodResolver.   */  protected UDF(UDFMethodResolver rslv) {    this.rslv = rslv;  }  /**   * Sets the resolver.   *   * @param rslv   *          The method resolver to use for method resolution.   */  public void setResolver(UDFMethodResolver rslv) {    this.rslv = rslv;  }  /**   * Get the method resolver.   */  public UDFMethodResolver getResolver() {    return rslv;  }  /**   * These can be overriden to provide the same functionality as the   * correspondingly named methods in GenericUDF.   */  public String[] getRequiredJars() {    return null;  }  public String[] getRequiredFiles() {    return null;  }}

UDF 函数只能处理简单的java标量类型,如int,double,String等类型

GenericUDF

使用Geoip 根据IP定位国家,实现GenericUDF。

public class GeoLocUDF extends GenericUDF {    private LookupService service;    //用于将输入类型转换为目标类型    private ObjectInspectorConverters.Converter[] converters;    /**     * 初始化     * @param arguments     * @return     * @throws UDFArgumentException     */    @Override    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {        if (arguments.length !=2){            throw new UDFArgumentLengthException(                    "The function COUNTRY(ip, geolocfile) takes exactly 2 arguments.");        }        converters = new ObjectInspectorConverters.Converter[arguments.length];        for (int i = 0; i < arguments.length; i++) {            //创建一个Converter,可以在evaluate方法中使用它 将参数的原生数据类型转存未Java Strings。            converters[i] = ObjectInspectorConverters.getConverter(arguments[i],                    PrimitiveObjectInspectorFactory.javaStringObjectInspector);        }        // 指定UDF (evaluate函数)返回类型为String        return PrimitiveObjectInspectorFactory                .getPrimitiveJavaObjectInspector(PrimitiveObjectInspector.PrimitiveCategory.STRING);    }    @Override    public Object evaluate(DeferredObject[] arguments) throws HiveException {        if (arguments[0].get() == null || arguments[1].get() == null) {            return null;        }        String ip = (String) converters[0].convert(arguments[0].get());        String filename = (String) converters[1].convert(arguments[1].get());        return lookup(ip, filename);    }    //在出现异常时,会调用这个方法,输出调用信息    @Override    public String getDisplayString(String[] children) {        return "country(" + children[0] + ", " + children[1] + ")";    }    protected String lookup(String ip, String filename) throws HiveException {        try {            if (service == null) {                URL u = getClass().getClassLoader().getResource(filename);                if (u == null) {                    throw new HiveException("Couldn't find geolocation file '" + filename + "'");                }                service =                        new LookupService(u.getFile(), LookupService.GEOIP_MEMORY_CACHE);            }            String countryCode = service.getCountry(ip).getCode();            if ("--".equals(countryCode)) {                return null;            }            return countryCode;        } catch (IOException e) {            throw new HiveException("Caught IO exception", e);        }    }}

在Hive中调用UDF

ADD JAR /path/hive-test.jar;CREATE temporary FUNCTION geoIp AS 'com.hadoop2.hive.GeoLocUDF';

temporary 关键字表示UDF只是为这个Hive会话过程定义的。

深入学习:https://cwiki.apache.org/confluence/display/Hive/GenericUDAFCaseStudy

0 0
原创粉丝点击