Presto-[14]-Function

Source: Internet | Publisher: 捍卫者软件 | Editor: 程序博客网 | Date: 2024/05/15 00:05

Functions

Original article

https://prestodb.io/docs/current/develop/functions.html

Plugin Implementation

To implement new functions, you must write a plugin that returns one or more function classes from getFunctions():

public class ExampleFunctionsPlugin
        implements Plugin
{
    @Override
    public Set<Class<?>> getFunctions()
    {
        return ImmutableSet.<Class<?>>builder()
                .add(ExampleNullFunction.class)
                .add(IsNullFunction.class)
                .add(IsEqualOrNullFunction.class)
                .add(ExampleStringFunction.class)
                .add(ExampleAverageFunction.class)
                .build();
    }
}

ImmutableSet is a utility class from Guava. For complete examples in the Presto source tree, see the presto-ml module (machine learning functions) and the presto-teradata-functions module (Teradata-compatible functions).

Scalar Function Implementation

The function framework uses annotations to declare a function's metadata, including its name, description, return type, and parameter types. Below is an implementation of the is_null function:

public class ExampleNullFunction
{
    @ScalarFunction("is_null")
    @Description("Returns TRUE if the argument is NULL")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isNull(@SqlNullable @SqlType(StandardTypes.VARCHAR) Slice string)
    {
        return (string == null);
    }
}

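is_null takes a single VARCHAR argument and returns a BOOLEAN. Note that the Java type of the argument is Slice: VARCHAR uses Slice as its native container type, and a Slice is essentially a wrapper around a byte[].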

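@SqlType:
The @SqlType annotation declares the return type and the argument types. The return type and the arguments of the Java method must match the native container types of the corresponding annotations.
@SqlNullable:
The @SqlNullable annotation indicates that an argument may be NULL. Without it, the framework assumes the function returns NULL whenever any argument is NULL, and the method is not called. When working with a type that has a primitive native container type, such as BigintType, use the object wrapper of that native container type together with @SqlNullable. The method itself must also be annotated with @SqlNullable if it can return NULL when its arguments are non-null.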
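The two NULL-handling conventions described above can be sketched in plain Java. This is a simplified model and not Presto code: NullConventionSketch, callDefault, and callNullable are hypothetical names that stand in for what the framework does around the annotated method, and String stands in for Slice.

```java
import java.util.function.Function;

// Hypothetical model of the calling conventions; none of this is Presto API.
final class NullConventionSketch {
    // Default convention (no @SqlNullable): a NULL argument short-circuits
    // to a NULL result and the function body is never invoked.
    static Boolean callDefault(Function<String, Boolean> body, String argument) {
        if (argument == null) {
            return null;
        }
        return body.apply(argument);
    }

    // @SqlNullable-style convention: the body receives the null reference
    // and decides what to return.
    static Boolean callNullable(Function<String, Boolean> body, String argument) {
        return body.apply(argument);
    }

    // Same logic as the isNull method above.
    static Boolean isNull(String value) {
        return value == null;
    }

    public static void main(String[] args) {
        System.out.println(callDefault(NullConventionSketch::isNull, null));
        System.out.println(callNullable(NullConventionSketch::isNull, null));
    }
}
```

Under the default convention is_null could never return TRUE, because the body would not run for a NULL input. That is why the argument in the example above carries @SqlNullable.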

Parametric Scalar Functions

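Scalar functions that have type parameters are somewhat more complex to implement. To make the previous example work with arguments of any type, we need the following: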

@ScalarFunction(name = "is_null")
@Description("Returns TRUE if the argument is NULL")
public final class IsNullFunction
{
    @TypeParameter("T")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isNullSlice(@SqlNullable @SqlType("T") Slice value)
    {
        return (value == null);
    }

    @TypeParameter("T")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isNullLong(@SqlNullable @SqlType("T") Long value)
    {
        return (value == null);
    }

    @TypeParameter("T")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isNullDouble(@SqlNullable @SqlType("T") Double value)
    {
        return (value == null);
    }

    // ...and so on for each native container type
}

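@TypeParameter:
The @TypeParameter annotation declares a type parameter that can be used in the @SqlType annotations of the arguments or in the return type of the function. It can also be used to annotate a parameter of type Type. At runtime, the engine binds the concrete type to this parameter.
@OperatorDependency may be used to declare that an additional function for operating on the given type parameter is needed. For example, the following function only binds to types that have an equals function defined: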

@ScalarFunction(name = "is_equal_or_null")
@Description("Returns TRUE if arguments are equal or both NULL")
public final class IsEqualOrNullFunction
{
    @TypeParameter("T")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isEqualOrNullSlice(
            @OperatorDependency(operator = OperatorType.EQUAL, returnType = StandardTypes.BOOLEAN, argumentTypes = {"T", "T"}) MethodHandle equals,
            @SqlNullable @SqlType("T") Slice value1,
            @SqlNullable @SqlType("T") Slice value2)
            throws Throwable // MethodHandle.invokeExact is declared to throw Throwable
    {
        if (value1 == null && value2 == null) {
            return true;
        }
        if (value1 == null || value2 == null) {
            return false;
        }
        return (boolean) equals.invokeExact(value1, value2);
    }

    // ...and so on for each native container type
}
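To illustrate how an injected MethodHandle is invoked, here is a minimal sketch that uses only the JDK. It assumes String in place of Slice, and the names OperatorDependencySketch, stringEquals, lookupEquals, and check are hypothetical. In Presto the engine resolves and supplies the handle; the explicit lookup below only stands in for that step.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical stand-in for an operator dependency; not Presto API.
final class OperatorDependencySketch {
    // Plays the role of the EQUAL operator for the bound type.
    static boolean stringEquals(String left, String right) {
        return left.equals(right);
    }

    // Resolves a handle of type (String, String) -> boolean.
    static MethodHandle lookupEquals() throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(
                OperatorDependencySketch.class,
                "stringEquals",
                MethodType.methodType(boolean.class, String.class, String.class));
    }

    // Same null handling as isEqualOrNullSlice above.
    static boolean isEqualOrNull(MethodHandle equals, String value1, String value2)
            throws Throwable {
        if (value1 == null && value2 == null) {
            return true;
        }
        if (value1 == null || value2 == null) {
            return false;
        }
        // invokeExact requires the static argument types and the cast return
        // type to match the handle's MethodType exactly.
        return (boolean) equals.invokeExact(value1, value2);
    }

    // Convenience wrapper that hides the checked exceptions.
    static boolean check(String value1, String value2) {
        try {
            return isEqualOrNull(lookupEquals(), value1, value2);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(check("a", "a"));
    }
}
```

Because invokeExact performs no conversions, a separate Java method is needed for each native container type, which is why the Presto example repeats the method per container type.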

Another Scalar Function Example

The lowercaser function takes a single VARCHAR argument and returns a VARCHAR, which is the argument converted to lower case:

public class ExampleStringFunction
{
    @ScalarFunction("lowercaser")
    @Description("Converts the string to lower case")
    @SqlType(StandardTypes.VARCHAR)
    public static Slice lowercaser(@SqlType(StandardTypes.VARCHAR) Slice slice)
    {
        // Decode the UTF-8 bytes, lowercase the String, and re-encode
        String argument = slice.toStringUtf8();
        return Slices.utf8Slice(argument.toLowerCase());
    }
}

Note that for most common string functions, including converting a string to lower case, the Slice library also provides implementations that work directly on the underlying byte[], which have much better performance. This function has no @SqlNullable annotations, meaning that if the argument is NULL, the result will automatically be NULL (the function will not be called).
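As a rough illustration of what working directly on the underlying byte[] means, the sketch below lowercases a UTF-8 byte array without decoding it to a String. It is not the Slice library's implementation: ByteLowercaseSketch and lowerAscii are hypothetical names, and the sketch assumes that only ASCII letters need converting, leaving all other bytes untouched.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical byte-level lowercasing; handles ASCII letters only.
final class ByteLowercaseSketch {
    static byte[] lowerAscii(byte[] utf8) {
        byte[] result = new byte[utf8.length];
        for (int i = 0; i < utf8.length; i++) {
            byte b = utf8[i];
            // Bytes of multi-byte UTF-8 sequences are negative as Java bytes,
            // so this range check only ever matches the ASCII letters A-Z.
            result[i] = (b >= 'A' && b <= 'Z') ? (byte) (b + ('a' - 'A')) : b;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] input = "Hello, PRESTO".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(lowerAscii(input), StandardCharsets.UTF_8));
    }
}
```

This avoids allocating an intermediate String and decoding and re-encoding the text, which is where the performance difference mentioned above comes from.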

Aggregation Function Implementation

Aggregation functions are somewhat more complex.

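AccumulatorState:
All aggregation functions accumulate input rows into a state object, and this object must implement AccumulatorState. For simple aggregations, just extend AccumulatorState into a new interface with the getters and setters you want, and the framework will generate the implementations and serializers for you. If you need a more complex state object, you must implement AccumulatorStateFactory and AccumulatorStateSerializer and provide them via the AccumulatorStateMetadata annotation.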

The following code implements the aggregation function avg_double which computes the average of a DOUBLE column:

@AggregationFunction("avg_double")
public class AverageAggregation
{
    @InputFunction
    public static void input(LongAndDoubleState state, @SqlType(StandardTypes.DOUBLE) double value)
    {
        state.setLong(state.getLong() + 1);
        state.setDouble(state.getDouble() + value);
    }

    @CombineFunction
    public static void combine(LongAndDoubleState state, LongAndDoubleState otherState)
    {
        state.setLong(state.getLong() + otherState.getLong());
        state.setDouble(state.getDouble() + otherState.getDouble());
    }

    @OutputFunction(StandardTypes.DOUBLE)
    public static void output(LongAndDoubleState state, BlockBuilder out)
    {
        long count = state.getLong();
        if (count == 0) {
            out.appendNull();
        }
        else {
            double value = state.getDouble();
            DOUBLE.writeDouble(out, value / count);
        }
    }
}

The average has two parts: the sum of the DOUBLE in each row of the column and the LONG count of the number of rows seen. LongAndDoubleState is an interface which extends AccumulatorState:

public interface LongAndDoubleState
        extends AccumulatorState
{
    long getLong();

    void setLong(long value);

    double getDouble();

    void setDouble(double value);
}

A closer look at the annotations used above:

@InputFunction:
The @InputFunction annotation declares the function which accepts input rows and stores them in the AccumulatorState. Similar to scalar functions you must annotate the arguments with @SqlType. Note that, unlike in the above scalar example where Slice is used to hold VARCHAR, the primitive double type is used for the argument to input. In this example, the input function simply keeps track of the running count of rows (via setLong()) and the running sum (via setDouble()).
@CombineFunction:
The @CombineFunction annotation declares the function used to combine two state objects. This function is used to merge all the partial aggregation states. It takes two state objects, and merges the results into the first one (in the above example, just by adding them together).
@OutputFunction:
The @OutputFunction is the last function called when computing an aggregation. It takes the final state object (the result of merging all partial states) and writes the result to a BlockBuilder.
Where does serialization happen, and what is GroupedAccumulatorState?
The @InputFunction is usually run on a different worker from the @CombineFunction, so the state objects are serialized and transported between these workers by the aggregation framework. GroupedAccumulatorState is used when performing a GROUP BY aggregation, and an implementation will be automatically generated for you if you don't specify an AccumulatorStateFactory.
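The input, combine, and output phases described above can be simulated in plain Java. This sketch is an assumption-laden model, not Presto code: AvgLifecycleSketch and its State class are hypothetical stand-ins for AverageAggregation and the generated LongAndDoubleState implementation, the two arrays play the role of rows processed on two different workers, and serialization between workers is omitted.

```java
// Hypothetical simulation of partial aggregation and final merge.
final class AvgLifecycleSketch {
    // Stand-in for the generated LongAndDoubleState implementation.
    static final class State {
        long count;
        double sum;
    }

    // Mirrors @InputFunction: fold one row into the state.
    static void input(State state, double value) {
        state.count += 1;
        state.sum += value;
    }

    // Mirrors @CombineFunction: merge otherState into state.
    static void combine(State state, State otherState) {
        state.count += otherState.count;
        state.sum += otherState.sum;
    }

    // Mirrors @OutputFunction: null here corresponds to out.appendNull().
    static Double output(State state) {
        if (state.count == 0) {
            return null;
        }
        return state.sum / state.count;
    }

    // Each partition is accumulated independently, then the partial states
    // are merged and the final result is produced.
    static Double average(double[] partition1, double[] partition2) {
        State first = new State();
        for (double value : partition1) {
            input(first, value);
        }
        State second = new State();
        for (double value : partition2) {
            input(second, value);
        }
        combine(first, second);
        return output(first);
    }

    public static void main(String[] args) {
        System.out.println(average(new double[] {1.0, 2.0}, new double[] {3.0, 4.0}));
    }
}
```

Keeping the count and the sum in the state, rather than a running average, is what makes the combine step a simple addition of the two partial states.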
