Flink 1.3 Table and SQL Beta Java API Summary


https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table_api.html#register-an-external-table-using-a-tablesource

Registering Tables

Register an external Table using a TableSource

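Currently, reading a Table through a TableSource does not support assigning watermarks or declaring time attribute fields such as rowtime.rowtime. If you need them, you have to take the awkward route: first convert the Table to a DataStream, then assign watermarks, and finally register the stream to get a Table again.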

CsvTableSource

Notes: 1. builder() returns a Builder; 2. Builder.build() returns the CsvTableSource; 3. streamTableEnvironment.ingest is gone; streaming now uses scan, the same as batchTableEnvironment.

    Builder csvTableBuilder = CsvTableSource
        .builder()
        .path("/path/to/your/file.csv")
        .field("name", Types.STRING())
        .field("id", Types.INT())
        .field("score", Types.DOUBLE())
        .field("comments", Types.STRING())
        .fieldDelimiter("#")
        .lineDelimiter("$")
        .ignoreFirstLine()
        .ignoreParseErrors()
        .commentPrefix("%");

    tableEnv.registerTableSource("mycsv", csvTableBuilder.build());
    Table streamTable = tableEnv.scan("mycsv");
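To make the builder options above more concrete, here is a rough plain-Java sketch of what they mean for a file's contents. It is not Flink's parser and does not use any Flink classes. The class name CsvOptionsSketch, the Record type, and the sample data are all hypothetical, and the exact edge-case behaviour of the real CsvTableSource may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CsvOptionsSketch {

    /** One parsed record with the four fields declared in the builder (hypothetical type). */
    static final class Record {
        final String name;
        final int id;
        final double score;
        final String comments;

        Record(String name, int id, double score, String comments) {
            this.name = name;
            this.id = id;
            this.score = score;
            this.comments = comments;
        }
    }

    /** Interprets the content the way the options above are assumed to behave. */
    static List<Record> parse(String content) {
        List<Record> records = new ArrayList<>();
        // lineDelimiter("$"): records are separated by '$' rather than a newline.
        String[] lines = content.split(Pattern.quote("$"));
        for (int i = 0; i < lines.length; i++) {
            if (i == 0) {
                continue; // ignoreFirstLine(): skip the header record
            }
            String line = lines[i];
            if (line.startsWith("%")) {
                continue; // commentPrefix("%"): skip comment records
            }
            // fieldDelimiter("#"): fields are separated by '#'.
            String[] f = line.split(Pattern.quote("#"), -1);
            try {
                if (f.length != 4) {
                    throw new IllegalArgumentException("expected 4 fields");
                }
                records.add(new Record(
                        f[0], Integer.parseInt(f[1]), Double.parseDouble(f[2]), f[3]));
            } catch (IllegalArgumentException e) {
                // ignoreParseErrors(): silently drop records that do not parse.
                // NumberFormatException is a subclass of IllegalArgumentException.
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample: a header, a comment, one bad record (non-numeric id), two good ones.
        String content = "name#id#score#comments$%a comment$alice#1#9.5#ok"
                + "$bob#oops#1.0#bad id$carol#3#7.25#fine";
        for (Record r : parse(content)) {
            System.out.println(r.name + " " + r.id + " " + r.score + " " + r.comments);
        }
    }
}
```

In this sketch only the alice and carol records survive: the header, the comment, and the record with the non-numeric id are all skipped.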

SQL

Group Windows

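Flink 1.3 introduces a new feature: windows can now be written directly in SQL. The official documentation has problems here, though, so this section walks through using TUMBLE with event time (rowtime).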

  • Set the time characteristic to event time:
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
  • Assign timestamps and watermarks:
        DataStream<Tuple4<String, String, Double, Long>> messageStream = ...
        DataStream<Tuple4<String, String, Double, Long>> addWatermarksStream =
            messageStream.assignTimestampsAndWatermarks(
                new AscendingTimestampExtractor<Tuple4<String, String, Double, Long>>() {
                    private static final long serialVersionUID = 604378562837574295L;

                    @Override
                    public long extractAscendingTimestamp(Tuple4<String, String, Double, Long> element) {
                        // f3 is the Long field of the tuple, used as the event timestamp
                        return element.f3;
                    }
                });
  • Register the table:
        tableEnv.registerDataStream("message", addWatermarksStream, "key,target,sarvalue,eventtime,rowtime.rowtime");

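Note: one extra field must be appended at the end of the field list here: rowtime.rowtime, which Flink SQL reserves internally as the event-time attribute rather than reading it from a column of the stream.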

  • Write the SQL:
        Table tableResult = tableEnv.sql(
            "select key, TUMBLE_START(rowtime, INTERVAL '5' SECOND), target, "
            + "sum(sarvalue), max(sarvalue), count(sarvalue) "
            + "from message "
            + "GROUP BY key, target, TUMBLE(rowtime, INTERVAL '5' SECOND)");

Note: rowtime is treated as a field here, not as the rowtime() function shown on the official site.
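To illustrate what the TUMBLE grouping in the query computes, here is a minimal plain-Java sketch that buckets rows into 5-second tumbling windows and keeps sum, max, and count per (key, target, window). It only mimics the semantics on an in-memory array: it uses no Flink classes and ignores watermarks and late data. The class name TumbleSketch, its helper methods, and the sample rows are hypothetical. Timestamps are assumed to be non-negative epoch milliseconds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TumbleSketch {

    // Corresponds to INTERVAL '5' SECOND in the query.
    static final long WINDOW_MILLIS = 5_000L;

    /** Start of the tumbling window containing the given timestamp (assumed >= 0). */
    static long tumbleStart(long timestampMillis) {
        return timestampMillis - (timestampMillis % WINDOW_MILLIS);
    }

    /** Running aggregates for one (key, target, window) group. */
    static final class Agg {
        double sum;
        double max = Double.NEGATIVE_INFINITY;
        long count;

        void add(double value) {
            sum += value;
            max = Math.max(max, value);
            count++;
        }
    }

    /** Each row is {key, target, sarvalue, eventtime}, mirroring the Tuple4 in the text. */
    static Map<String, Agg> aggregate(Object[][] rows) {
        Map<String, Agg> groups = new LinkedHashMap<>();
        for (Object[] row : rows) {
            long windowStart = tumbleStart((Long) row[3]);
            // GROUP BY key, target, TUMBLE(rowtime, INTERVAL '5' SECOND)
            String groupKey = row[0] + "|" + row[1] + "|" + windowStart;
            groups.computeIfAbsent(groupKey, k -> new Agg()).add((Double) row[2]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up sample rows; the last column is the event timestamp in milliseconds.
        Object[][] rows = {
            {"k1", "t1", 1.0, 1_000L},
            {"k1", "t1", 3.0, 4_999L},
            {"k1", "t1", 2.0, 5_000L},
            {"k2", "t1", 7.0, 6_500L},
        };
        aggregate(rows).forEach((group, agg) -> System.out.println(
                group + " sum=" + agg.sum + " max=" + agg.max + " count=" + agg.count));
    }
}
```

The first two sample rows fall into the window starting at 0, while the row stamped 5000 opens the next window, because each window covers its start inclusively and its end exclusively.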
