Write Phoenix UDF


How to write a custom UDF

You can follow these simple steps to write your UDF:


  • create a new class derived from org.apache.phoenix.expression.function.ScalarFunction
  • implement the getDataType method, which determines the return type of the function
  • implement the evaluate method, which is called to calculate the result for each row. The method is passed an org.apache.phoenix.schema.tuple.Tuple that holds the current state of the row and an org.apache.hadoop.hbase.io.ImmutableBytesWritable that needs to be filled in to point to the result of the function execution. The method returns false if not enough information was available to calculate the result (usually because one of its arguments is unknown) and true otherwise.
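
The per-row logic behind the steps above can be sketched with a hypothetical upper-casing UDF. The class and method names here are illustrative, not part of the Phoenix API, and the Phoenix-specific wiring is left as comments so the core byte-level computation stays self-contained:

```java
// Sketch of the per-row computation a hypothetical UPPER_ASCII UDF would run.
// The real class would extend org.apache.phoenix.expression.function.ScalarFunction,
// return the appropriate PDataType from getDataType(), and call a helper like this
// from evaluate(), pointing the ImmutableBytesWritable at the result bytes.
public class UpperAsciiSketch {

    // Upper-cases ASCII letters in a copy of the input bytes, the way evaluate()
    // would transform the argument bytes before setting the output pointer.
    public static byte[] upperAscii(byte[] input) {
        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            byte b = input[i];
            out[i] = (b >= 'a' && b <= 'z') ? (byte) (b - 32) : b;
        }
        return out;
    }
}
```

In the real evaluate implementation you would first evaluate the child expression against the Tuple, return false if its bytes are not yet available, and otherwise fill in the writable and return true.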


Below are additional steps for optimization:


  • in order to have the possibility of contributing to the start/stop key of a scan, custom functions need to override the following two methods from ScalarFunction:

    /**
     * Determines whether or not a function may be used to form
     * the start/stop key of a scan
     * @return the zero-based position of the argument to traverse
     *  into to look for a primary key column reference, or
     *  {@value #NO_TRAVERSAL} if the function cannot be used to
     *  form the scan key.
     */
    public int getKeyFormationTraversalIndex() {
        return NO_TRAVERSAL;
    }

    /**
     * Manufactures a KeyPart used to construct the KeyRange given
     * a constant and a comparison operator.
     * @param childPart the KeyPart formulated for the child expression
     *  at the {@link #getKeyFormationTraversalIndex()} position.
     * @return the KeyPart for constructing the KeyRange for this
     *  function.
     */
    public KeyPart newKeyPart(KeyPart childPart) {
        return null;
    }

  • Additionally, to enable an ORDER BY to be optimized out or a GROUP BY to be done in-place, override the following method:

    /**
     * Determines whether or not the result of the function invocation
     * will be ordered in the same way as the input to the function.
     * Returning YES enables an optimization to occur when a
     * GROUP BY contains function invocations using the leading PK
     * column(s).
     * @return YES if the function invocation will always preserve order for
     * the inputs versus the outputs, YES_IF_LAST if the
     * function preserves order, but any further column reference would not
     * continue to preserve order, and NO if the function does not preserve
     * order.
     */
    public OrderPreserving preservesOrder() {
        return OrderPreserving.NO;
    }
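
The contract behind preservesOrder can be illustrated with a self-contained check: a function may safely report OrderPreserving.YES only if it is monotonic over its input, so that row-key-ordered inputs map to outputs in the same order. The monotonic function below is an arbitrary example, not Phoenix code:

```java
import java.util.Arrays;

// Illustrates why only monotonic functions may report OrderPreserving.YES:
// applying such a function to inputs already sorted by the leading PK leaves
// the outputs sorted too, so Phoenix can skip the ORDER BY or group in place.
public class OrderPreservingSketch {

    // An example monotonic function; any strictly increasing mapping qualifies.
    public static long monotonic(long x) {
        return x * 2 + 10;
    }

    // Returns true if applying the function keeps already-sorted inputs sorted.
    public static boolean preservesOrder(long[] sortedInputs) {
        long[] outputs = new long[sortedInputs.length];
        for (int i = 0; i < sortedInputs.length; i++) {
            outputs[i] = monotonic(sortedInputs[i]);
        }
        long[] sortedOutputs = outputs.clone();
        Arrays.sort(sortedOutputs);
        return Arrays.equals(outputs, sortedOutputs);
    }
}
```

A non-monotonic function (for example, one taking absolute values of signed inputs) would fail this check and must return OrderPreserving.NO.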

Limitations

  • The jar containing the UDFs must be added to or deleted from HDFS manually; SQL statements for adding/removing jars are tracked in PHOENIX-1890
  • The dynamic class loader copies the UDF jars to {hbase.local.dir}/jars on the Phoenix client/region server when a UDF is used in queries. The jars must be deleted manually once a function is deleted
  • functional indexes need to be rebuilt manually if the function implementation changes (PHOENIX-1907)
  • once loaded, a jar will not be unloaded, so you'll need to put modified implementations into a different jar to prevent having to bounce your cluster (PHOENIX-1907)
  • to list the functions, you need to query the SYSTEM."FUNCTION" table (PHOENIX-1921)