Custom timestamps and watermarks on a source stream in Flink
Source: Internet  Editor: 程序博客网  Time: 2024/05/29 17:36
To work with Event Time, streaming programs need to set the time characteristic accordingly.
First, set the time characteristic to Event Time:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
Assigning Timestamps
Next, we need to define how the event time is obtained and how watermarks are generated.
One way is to hard-code both in the source:
@Override
public void run(SourceContext<MyType> ctx) throws Exception {
    while (/* condition */) {
        MyType next = getNext();
        ctx.collectWithTimestamp(next, next.getEventTimestamp());

        if (next.hasWatermarkTime()) {
            ctx.emitWatermark(new Watermark(next.getWatermarkTime()));
        }
    }
}
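This approach is fairly crude and inconvenient, and timestamps and watermarks assigned this way are overwritten if a TimestampAssigner is used later.
So let's look at the second approach.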
Timestamp Assigners / Watermark Generators
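Usually a few map or filter operations follow the source to do some initialization or formatting.
Then, before any operation that needs event time, such as a window, assign the timestamps and watermarks.
Here is an example: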
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<MyEvent> stream = env.addSource(new FlinkKafkaConsumer09<MyEvent>(topic, schema, props));

DataStream<MyEvent> withTimestampsAndWatermarks = stream
        .filter( event -> event.severity() == WARNING )
        .assignTimestampsAndWatermarks(new MyTimestampsAndWatermarks());

withTimestampsAndWatermarks
        .keyBy( (event) -> event.getGroup() )
        .timeWindow(Time.seconds(10))
        .reduce( (a, b) -> a.add(b) )
        .addSink(...);
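To see why the timestamps and watermarks have to be in place before the window, it helps to sketch how a 10-second event-time window relates to them. The following is a plain-Java sketch with no Flink dependency. The class and method names are hypothetical, and it assumes non-negative timestamps and that a window fires once the watermark reaches the window's last millisecond.

```java
/** Plain-Java sketch (no Flink dependency); class and method names are hypothetical. */
final class TumblingWindowSketch {

    /** Start of the tumbling window that contains the timestamp (non-negative timestamps assumed). */
    static long windowStart(long timestamp, long sizeMillis) {
        return timestamp - (timestamp % sizeMillis);
    }

    /**
     * The window [windowStart, windowStart + sizeMillis) is considered complete
     * once the watermark has reached its last millisecond (an assumption of this sketch).
     */
    static boolean canFire(long windowStart, long sizeMillis, long watermark) {
        return watermark >= windowStart + sizeMillis - 1;
    }
}
```

With a 10000 ms window, an element stamped 12345 falls into the window starting at 10000, and that window is only evaluated once the watermark passes 19999. Without watermarks the window would never be considered complete.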
So how is a timestamp assigner implemented, for example the MyTimestampsAndWatermarks used in the example above?
There are three kinds.
With Ascending Timestamps
DataStream<MyEvent> stream = ...

DataStream<MyEvent> withTimestampsAndWatermarks =
    stream.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<MyEvent>() {

        @Override
        public long extractAscendingTimestamp(MyEvent element) {
            return element.getCreationTime();
        }
});
Probably hardly anyone uses this one; you might as well use processing time directly.
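To make the ascending case concrete, here is a minimal plain-Java sketch with no Flink dependency. It assumes the watermark simply trails the latest timestamp seen by one millisecond and that an element going backwards in time is only counted as a violation. The class and method names (AscendingSketch, onElement, currentWatermark) are hypothetical and are not part of Flink.

```java
/** Plain-Java sketch (no Flink dependency); all names here are hypothetical. */
final class AscendingSketch {
    private long currentTimestamp = Long.MIN_VALUE;
    private int violations = 0;

    /** Records the element's timestamp; an element older than the latest one counts as a violation. */
    long onElement(long timestamp) {
        if (timestamp >= currentTimestamp) {
            currentTimestamp = timestamp;
        } else {
            violations++;
        }
        return timestamp;
    }

    /** Watermark trails the latest timestamp by 1 ms (assumed behaviour for this sketch). */
    long currentWatermark() {
        return currentTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : currentTimestamp - 1;
    }

    int violations() {
        return violations;
    }
}
```

Because the watermark follows the newest element so closely, any element that arrives out of order is already behind the watermark, which is why this variant only suits sources whose timestamps really are ascending.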
With periodic watermarks, the watermark is emitted at a regular interval; you can set that interval via ExecutionConfig.setAutoWatermarkInterval(...).
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

/**
 * This generator generates watermarks assuming that elements come out of order to a certain degree only.
 * The latest elements for a certain timestamp t will arrive at most n milliseconds after the earliest
 * elements for timestamp t.
 */
public class BoundedOutOfOrdernessGenerator implements AssignerWithPeriodicWatermarks<MyEvent> {

    private final long maxOutOfOrderness = 3500; // 3.5 seconds

    private long currentMaxTimestamp;

    @Override
    public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
        long timestamp = element.getCreationTime();
        currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp);
        return timestamp;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // return the watermark as current highest timestamp minus the out-of-orderness bound
        return new Watermark(currentMaxTimestamp - maxOutOfOrderness);
    }
}

/**
 * This generator generates watermarks that are lagging behind processing time by a certain amount.
 * It assumes that elements arrive in Flink after at most a certain time.
 */
public class TimeLagWatermarkGenerator implements AssignerWithPeriodicWatermarks<MyEvent> {

    private final long maxTimeLag = 5000; // 5 seconds

    @Override
    public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
        return element.getCreationTime();
    }

    @Override
    public Watermark getCurrentWatermark() {
        // return the watermark as current time minus the maximum time lag
        return new Watermark(System.currentTimeMillis() - maxTimeLag);
    }
}
The two cases above differ in that the first derives the watermark from the maximum event time seen so far,
while the second derives it from the current processing time.
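The difference between the two strategies is easier to see with concrete numbers. The following plain-Java sketch has no Flink dependency and only reproduces the arithmetic; the class and method names are hypothetical, and the wall-clock time is passed in as a parameter so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch (no Flink dependency); class and method names are hypothetical. */
final class PeriodicWatermarkSketch {

    /** Watermark after each element when it trails the maximum event time by a fixed bound. */
    static List<Long> boundedOutOfOrderness(long[] eventTimes, long maxOutOfOrderness) {
        List<Long> watermarks = new ArrayList<>();
        // Start high enough that subtracting the bound cannot underflow.
        long currentMax = Long.MIN_VALUE + maxOutOfOrderness;
        for (long t : eventTimes) {
            currentMax = Math.max(currentMax, t);
            watermarks.add(currentMax - maxOutOfOrderness);
        }
        return watermarks;
    }

    /** Watermark derived from the wall clock only; event times play no role. */
    static long timeLag(long nowMillis, long maxTimeLag) {
        return nowMillis - maxTimeLag;
    }
}
```

For event times 10000, 12000, 11000, 15000 and a bound of 3500 ms, the first strategy yields watermarks 6500, 8500, 8500, 11500: the out-of-order element at 11000 does not move the watermark and is still ahead of it, so it is not late. The second strategy ignores the event times entirely, so timeLag(20000, 5000) gives 15000 regardless of which elements have arrived.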
With Punctuated Watermarks
To generate Watermarks whenever a certain event indicates that a new watermark can be generated, use the AssignerWithPunctuatedWatermarks. For this class, Flink will first call the extractTimestamp(...) method to assign the element a timestamp, and then immediately call for that element the checkAndGetNextWatermark(...) method.
The checkAndGetNextWatermark(...) method gets the timestamp that was assigned in the extractTimestamp(...) method, and can decide whether it wants to generate a Watermark. Whenever the checkAndGetNextWatermark(...) method returns a non-null Watermark, and that Watermark is larger than the latest previous Watermark, that new Watermark will be emitted.
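In other words, the watermark is not triggered by time but by specific events: a watermark is emitted only when certain special events or messages are seen.
That is why the method is called checkAndGetNextWatermark:
it has to check first.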
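import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

public class PunctuatedAssigner implements AssignerWithPunctuatedWatermarks<MyEvent> {

    @Override
    public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
        return element.getCreationTime();
    }

    @Override
    public Watermark checkAndGetNextWatermark(MyEvent lastElement, long extractedTimestamp) {
        // emit a watermark only for elements that carry a watermark marker
        return lastElement.hasWatermarkMarker() ? new Watermark(extractedTimestamp) : null;
    }
}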
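The emission rule described above (a non-null watermark that is larger than the previous one) can be sketched in plain Java without any Flink dependency. The names here are hypothetical, and the boolean flag stands in for whatever marker the real event type carries.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch (no Flink dependency); all names here are hypothetical. */
final class PunctuatedSketch {
    private long lastEmitted = Long.MIN_VALUE;
    private final List<Long> emitted = new ArrayList<>();

    /**
     * Called once per element. A watermark is emitted only if the element carries
     * a marker and the candidate is larger than the last emitted watermark.
     */
    void onElement(long extractedTimestamp, boolean hasWatermarkMarker) {
        if (hasWatermarkMarker && extractedTimestamp > lastEmitted) {
            lastEmitted = extractedTimestamp;
            emitted.add(extractedTimestamp);
        }
    }

    List<Long> emittedWatermarks() {
        return emitted;
    }
}
```

Feeding it the elements (1000, no marker), (2000, marker), (1500, marker), (3000, marker), (4000, no marker) emits the watermarks 2000 and 3000: the marker at 1500 is dropped because it would move the watermark backwards, and unmarked elements never trigger one.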