Presto-[14]-Function

Source: Internet  Published by: 捍卫者软件  Editor: 程序博客网  Time: 2024/05/15 00:05

Functions

Original article

https://prestodb.io/docs/current/develop/functions.html

Plugin Implementation

To implement a new function, you must write a plugin that returns one or more function classes from getFunctions():

public class ExampleFunctionsPlugin
        implements Plugin
{
    @Override
    public Set<Class<?>> getFunctions()
    {
        return ImmutableSet.<Class<?>>builder()
                .add(ExampleNullFunction.class)
                .add(IsNullFunction.class)
                .add(IsEqualOrNullFunction.class)
                .add(ExampleStringFunction.class)
                .add(ExampleAverageFunction.class)
                .build();
    }
}

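ImmutableSet is a utility class from Guava. For complete examples in the Presto source tree, see the presto-ml module, which contains the machine learning functions, and the presto-teradata-functions module, which contains the Teradata-compatible functions.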


Scalar Function Implementation

The function framework uses annotations to declare information about a function, including its name, description, return type and parameter types. Below is an implementation of the is_null function:

public class ExampleNullFunction
{
    @ScalarFunction("is_null")
    @Description("Returns TRUE if the argument is NULL")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isNull(@SqlNullable @SqlType(StandardTypes.VARCHAR) Slice string)
    {
        return (string == null);
    }
}

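is_null takes a single VARCHAR argument and returns a BOOLEAN indicating whether the argument was NULL. Note that the argument is of type Slice: VARCHAR uses Slice, which is essentially a wrapper around byte[], as its native container type.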

  • @SqlType:

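    The @SqlType annotation declares the return type and the argument types. The return type and the arguments of the Java method must match the native container types of the corresponding annotations.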

  • @SqlNullable:

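    The @SqlNullable annotation indicates that the argument may be NULL. Without this annotation, the framework assumes that the function returns NULL whenever any of its arguments is NULL. When working with a Type that has a primitive native container type, such as BigintType, use the object wrapper for the native container type when using @SqlNullable. The method itself must be annotated with @SqlNullable if it can return NULL when its arguments are non-null.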
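The NULL-handling rule described above can be pictured with a small plain-Java sketch. It does not use the Presto SPI: the callDefault helper and the class name are hypothetical and only mimic the stated convention that a function without the nullable marker is skipped for NULL input, while a nullable-aware function receives the null and decides the result itself.

```java
import java.util.Locale;
import java.util.function.Function;

final class NullConventionSketch {
    // Hypothetical stand-in for the default convention:
    // a NULL argument means the function is not called and the result is NULL.
    static <A, R> R callDefault(Function<A, R> fn, A arg) {
        return arg == null ? null : fn.apply(arg);
    }

    // Mimics a function whose argument is marked nullable: it sees the null itself.
    static boolean isNull(String value) {
        return value == null;
    }

    // Mimics a function without the nullable marker; it never has to check for null.
    static String lower(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Function<String, String> lowerFn = NullConventionSketch::lower;
        System.out.println(callDefault(lowerFn, null));  // lower() is never invoked here
        System.out.println(callDefault(lowerFn, "ABC"));
        System.out.println(isNull(null));
    }
}
```

In this picture, the object wrapper requirement for primitive container types follows naturally: a primitive long cannot carry null, so a nullable argument has to be declared as Long.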

Parametric Scalar Functions

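Scalar functions that have type parameters are somewhat more complex to implement. To make the example above work with arguments of any type, we need the following: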

@ScalarFunction(name = "is_null")
@Description("Returns TRUE if the argument is NULL")
public final class IsNullFunction
{
    @TypeParameter("T")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isNullSlice(@SqlNullable @SqlType("T") Slice value)
    {
        return (value == null);
    }

    @TypeParameter("T")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isNullLong(@SqlNullable @SqlType("T") Long value)
    {
        return (value == null);
    }

    @TypeParameter("T")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isNullDouble(@SqlNullable @SqlType("T") Double value)
    {
        return (value == null);
    }

    // ...and so on for each native container type
}

  • @TypeParameter:

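     The @TypeParameter annotation declares a type parameter, which can then be used in the @SqlType annotations of the arguments or in the return type of the function. It can also be used to annotate a parameter of type Type. At runtime, the engine binds the concrete type to this parameter.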

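  • @OperatorDependency:

     @OperatorDependency may be used to declare that an additional function for operating on the given type parameter is needed. For example, the following function will only bind to types that have an equals function defined: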

 
@ScalarFunction(name = "is_equal_or_null")
@Description("Returns TRUE if arguments are equal or both NULL")
public final class IsEqualOrNullFunction
{
    @TypeParameter("T")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isEqualOrNullSlice(
            @OperatorDependency(operator = OperatorType.EQUAL, returnType = StandardTypes.BOOLEAN, argumentTypes = {"T", "T"}) MethodHandle equals,
            @SqlNullable @SqlType("T") Slice value1,
            @SqlNullable @SqlType("T") Slice value2)
            throws Throwable // MethodHandle.invokeExact is declared to throw Throwable
    {
        if (value1 == null && value2 == null) {
            return true;
        }
        if (value1 == null || value2 == null) {
            return false;
        }
        return (boolean) equals.invokeExact(value1, value2);
    }

    // ...and so on for each native container type
}
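The MethodHandle call in isEqualOrNullSlice can be tried out with the JDK alone. The following is a minimal sketch, not Presto code: the class name, the longEquals stand-in for the EQUAL operator and the check helper are all hypothetical, and the handle is looked up by hand here, whereas in Presto the engine resolves and injects it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class OperatorDependencySketch {
    // Hypothetical stand-in for a type's EQUAL operator.
    static boolean longEquals(long left, long right) {
        return left == right;
    }

    // Same null handling as isEqualOrNullSlice, written for boxed Long arguments.
    static boolean isEqualOrNull(MethodHandle equals, Long value1, Long value2) throws Throwable {
        if (value1 == null && value2 == null) {
            return true;
        }
        if (value1 == null || value2 == null) {
            return false;
        }
        // invokeExact requires the call site to match the handle type (long, long) -> boolean exactly,
        // hence the explicit unboxing casts and the cast on the result.
        return (boolean) equals.invokeExact((long) value1, (long) value2);
    }

    // Convenience wrapper that looks up the handle and hides the checked exceptions.
    static boolean check(Long value1, Long value2) {
        try {
            MethodHandle equals = MethodHandles.lookup().findStatic(
                    OperatorDependencySketch.class,
                    "longEquals",
                    MethodType.methodType(boolean.class, long.class, long.class));
            return isEqualOrNull(equals, value1, value2);
        }
        catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(check(1L, 1L));
        System.out.println(check(1L, null));
        System.out.println(check(null, null));
    }
}
```

Because invokeExact does no argument conversion, each native container type needs its own method with matching argument types, which is consistent with the "and so on for each native container type" comment in the example.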

Another Scalar Function Example

The lowercaser function takes a single VARCHAR argument and returns a VARCHAR, which is the argument converted to lower case:

public class ExampleStringFunction
{
    @ScalarFunction("lowercaser")
    @Description("Converts the string to lower case")
    @SqlType(StandardTypes.VARCHAR)
    public static Slice lowercaser(@SqlType(StandardTypes.VARCHAR) Slice slice)
    {
        // Decode to a Java String, lowercase it, and re-encode as UTF-8
        String argument = slice.toStringUtf8();
        return Slices.utf8Slice(argument.toLowerCase());
    }
}

Note that for most common string functions, including converting a string to lower case, the Slice library also provides implementations that work directly on the underlying byte[], which have much better performance. This function has no @SqlNullable annotations, meaning that if the argument is NULL, the result will automatically be NULL (the function will not be called).
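To illustrate why working on the underlying bytes can be cheaper than decoding to a String, here is a minimal plain-Java sketch. It is not the Slice library's implementation; the class and method names are hypothetical, and it only handles ASCII letters, leaving every other byte (including multi-byte UTF-8 sequences) unchanged.

```java
import java.nio.charset.StandardCharsets;

final class ByteLowercaseSketch {
    // Lowercases ASCII letters in a copy of the UTF-8 bytes without building a String.
    // Bytes of multi-byte UTF-8 sequences are negative in Java and never fall in 'A'..'Z'.
    static byte[] lowerAscii(byte[] utf8) {
        byte[] out = utf8.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] >= 'A' && out[i] <= 'Z') {
                out[i] += 'a' - 'A';
            }
        }
        return out;
    }

    // Helper for trying the sketch out with ordinary strings.
    static String lowerAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(lowerAscii(bytes), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(lowerAscii("Hello WORLD 123"));
    }
}
```

Unlike String.toLowerCase(), this sketch leaves non-ASCII letters such as É untouched, so it is only a fair substitute when the input is known to be ASCII or when that behavior is acceptable.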

Aggregation Function Implementation

Aggregation functions are somewhat more complex.

  • AccumulatorState:

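    All aggregation functions accumulate input rows into a state object, which must implement AccumulatorState. For simple aggregations, just extend AccumulatorState into a new interface with the getters and setters you want, and the framework will generate the implementations and serializers for you. If you need a more complex state object, you must implement AccumulatorStateFactory and AccumulatorStateSerializer and provide them via the AccumulatorStateMetadata annotation.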

The following code implements the aggregation function avg_double which computes the average of a DOUBLE column:

@AggregationFunction("avg_double")
public class AverageAggregation
{
    @InputFunction
    public static void input(LongAndDoubleState state, @SqlType(StandardTypes.DOUBLE) double value)
    {
        state.setLong(state.getLong() + 1);
        state.setDouble(state.getDouble() + value);
    }

    @CombineFunction
    public static void combine(LongAndDoubleState state, LongAndDoubleState otherState)
    {
        state.setLong(state.getLong() + otherState.getLong());
        state.setDouble(state.getDouble() + otherState.getDouble());
    }

    @OutputFunction(StandardTypes.DOUBLE)
    public static void output(LongAndDoubleState state, BlockBuilder out)
    {
        long count = state.getLong();
        if (count == 0) {
            out.appendNull();
        }
        else {
            double value = state.getDouble();
            DOUBLE.writeDouble(out, value / count);
        }
    }
}

The average has two parts: the sum of the DOUBLE in each row of the column and the LONG count of the number of rows seen. LongAndDoubleState is an interface which extends AccumulatorState:

public interface LongAndDoubleState
        extends AccumulatorState
{
    long getLong();

    void setLong(long value);

    double getDouble();

    void setDouble(double value);
}
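One way to picture "the framework implements the interface for you" is a JDK dynamic proxy that backs each getter/setter pair with a stored field. This is a swapped-in technique for illustration only and makes no claim about how Presto generates its state classes; the CountAndSum interface and the newState helper are hypothetical and have no Presto dependency.

```java
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

final class GeneratedStateSketch {
    // Hypothetical interface with the same getters and setters as LongAndDoubleState.
    interface CountAndSum {
        long getLong();

        void setLong(long value);

        double getDouble();

        void setDouble(double value);
    }

    // Builds an implementation at runtime: setX stores a field, getX reads it back.
    static CountAndSum newState() {
        Map<String, Object> fields = new HashMap<>();
        fields.put("Long", 0L);     // fields start at zero
        fields.put("Double", 0.0);
        return (CountAndSum) Proxy.newProxyInstance(
                CountAndSum.class.getClassLoader(),
                new Class<?>[] {CountAndSum.class},
                (proxy, method, args) -> {
                    String name = method.getName();
                    if (name.startsWith("set")) {
                        fields.put(name.substring(3), args[0]);
                        return null;
                    }
                    if (name.startsWith("get")) {
                        return fields.get(name.substring(3));
                    }
                    throw new UnsupportedOperationException(name);
                });
    }

    public static void main(String[] args) {
        CountAndSum state = newState();
        state.setLong(state.getLong() + 1);
        state.setDouble(state.getDouble() + 2.5);
        System.out.println(state.getLong() + " rows, sum " + state.getDouble());
    }
}
```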

The annotations used above are explained in more detail below:

  • @InputFunction:

    The @InputFunction annotation declares the function which accepts input rows and stores them in the AccumulatorState. Similar to scalar functions you must annotate the arguments with @SqlType. Note that, unlike in the above scalar example where Slice is used to hold VARCHAR, the primitive double type is used for the argument to input. In this example, the input function simply keeps track of the running count of rows (via setLong()) and the running sum (via setDouble()).

  • @CombineFunction:

    The @CombineFunction annotation declares the function used to combine two state objects. This function is used to merge all the partial aggregation states. It takes two state objects, and merges the results into the first one (in the above example, just by adding them together).

  • @OutputFunction:

    The @OutputFunction is the last function called when computing an aggregation. It takes the final state object (the result of merging all partial states) and writes the result to a BlockBuilder.

  • Where does serialization happen, and what is GroupedAccumulatorState?

    The @InputFunction is usually run on a different worker from the @CombineFunction, so the state objects are serialized and transported between these workers by the aggregation framework. GroupedAccumulatorState is used when performing a GROUP BY aggregation, and an implementation will be automatically generated for you if you don't specify an AccumulatorStateFactory.
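The input, combine and output steps described above can be walked through in plain Java. The sketch below is an assumption-laden simulation, not Presto code: the State class stands in for LongAndDoubleState, an OptionalDouble stands in for writing to a BlockBuilder (empty meaning NULL), the two arrays play the role of rows seen by two different workers, and serialization between workers is omitted.

```java
import java.util.OptionalDouble;

final class AvgLifecycleSketch {
    // Plain-Java stand-in for LongAndDoubleState: row count plus running sum.
    static final class State {
        long count;
        double sum;
    }

    // Input step: fold one row into a partial state.
    static void input(State state, double value) {
        state.count++;
        state.sum += value;
    }

    // Combine step: merge another partial state into the first one.
    static void combine(State state, State other) {
        state.count += other.count;
        state.sum += other.sum;
    }

    // Output step: produce the final value, or empty (NULL) when no rows were seen.
    static OptionalDouble output(State state) {
        return state.count == 0
                ? OptionalDouble.empty()
                : OptionalDouble.of(state.sum / state.count);
    }

    // Simulates two workers building partial states that are then merged.
    static OptionalDouble average(double[] partition1, double[] partition2) {
        State first = new State();
        for (double value : partition1) {
            input(first, value);
        }
        State second = new State();
        for (double value : partition2) {
            input(second, value);
        }
        combine(first, second);
        return output(first);
    }

    public static void main(String[] args) {
        System.out.println(average(new double[] {1, 2, 3}, new double[] {4, 5}));
        System.out.println(average(new double[0], new double[0]));
    }
}
```

Keeping the count and the sum separate, rather than a running average, is what makes the combine step a simple addition of the two partial states.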

