Understanding Flink's Lightweight Asynchronous Snapshot Mechanism: Flink 1.2 Source Code
Source: Internet. Editor: 程序博客网. Time: 2024/05/01 23:59
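As the previous article showed, the ABS (Asynchronous Barrier Snapshotting) algorithm works mainly by blocking and releasing input channels according to the checkpoint barriers they deliver.
This article focuses on how ABS is implemented in the Flink 1.2 source code.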
1. CheckpointBarrierHandler
This interface lives in org.apache.flink.streaming.runtime.io and manages the barriers received from the input channels. It declares the following methods:
public interface CheckpointBarrierHandler {

    BufferOrEvent getNextNonBlocked() throws Exception;

    void registerCheckpointEventHandler(StatefulTask task);

    void cleanup() throws IOException;

    boolean isEmpty();

    long getAlignmentDurationNanos();
}
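The blocking and releasing of channels on barriers is implemented mainly in getNextNonBlocked().
Depending on the CheckpointingMode, Flink offers two checkpointing modes:
1. Exactly once
2. At least once
The default mode is EXACTLY_ONCE.
Flink provides one implementation class for each mode:
1. BarrierBuffer (for exactly once)
2. BarrierTracker (for at least once)
The paper puts its emphasis on blocking input channels, which is what the exactly-once implementation does, so this article likewise concentrates on the BarrierBuffer class.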
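To make the contrast between the two modes concrete before diving into BarrierBuffer, here is a minimal sketch of the at-least-once idea: barriers are only counted, and no channel is ever held back. `ToyBarrierTracker` and its methods are hypothetical names invented for illustration; this is not Flink's BarrierTracker, which also has to deal with cancellation and with several checkpoints being in flight.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy at-least-once tracker (hypothetical, not Flink API): counts barriers, never blocks. */
class ToyBarrierTracker {
    private final int totalChannels;
    // number of barriers seen so far, per checkpoint id
    private final Map<Long, Integer> barrierCounts = new HashMap<>();

    ToyBarrierTracker(int totalChannels) {
        this.totalChannels = totalChannels;
    }

    /** Returns true when this barrier is the last one missing for its checkpoint. */
    boolean onBarrier(long checkpointId) {
        int count = barrierCounts.merge(checkpointId, 1, Integer::sum);
        if (count == totalChannels) {
            barrierCounts.remove(checkpointId);
            return true; // all channels delivered the barrier: the checkpoint can be triggered
        }
        return false;
    }

    /** Records are always forwarded immediately, even from channels whose barrier already arrived. */
    boolean isBlocked(int channelIndex) {
        return false;
    }
}
```

Because records that follow a barrier on a fast channel are processed before the snapshot is taken, they are already reflected in the state and will be replayed again after a recovery. That is why this style of tracking gives at-least-once rather than exactly-once guarantees.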
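2. The BarrierBuffer class
Recall the pseudocode for this algorithm from the paper covered in the previous article.
Its core idea: as soon as an input channel delivers a barrier, that channel is blocked. The task then checks whether barriers have arrived from all input channels. If they have, it broadcasts the barrier downstream, triggers the checkpoint for this task, and unblocks the blocked channels.
In practice, to avoid back-pressuring the input streams, BarrierBuffer does not really block the stream. Instead, it uses a BufferSpiller to buffer the data that follows the barrier on that channel. Once the channel is unblocked, the buffered data is read back and processing continues.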
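The buffer-instead-of-block idea can be sketched in a few lines. The following is a self-contained toy, not Flink code: `ToyAligner`, its methods, and the `CHECKPOINT` marker string are hypothetical names, records are plain strings, and the buffering happens in memory rather than through a BufferSpiller.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy exactly-once aligner (hypothetical): buffers records of blocked channels and replays them later. */
class ToyAligner {
    private final boolean[] blocked;                               // one flag per input channel
    private final ArrayDeque<String> buffered = new ArrayDeque<>(); // records set aside during alignment
    private final List<String> output = new ArrayList<>();          // what the operator actually processes
    private int barriersReceived;

    ToyAligner(int numChannels) {
        this.blocked = new boolean[numChannels];
    }

    void onRecord(int channel, String record) {
        if (blocked[channel]) {
            buffered.add(record); // channel already delivered its barrier: hold the record back
        } else {
            output.add(record);   // channel not blocked: process right away
        }
    }

    void onBarrier(int channel) {
        if (blocked[channel]) {
            return; // toy simplification: ignore a repeated barrier on a blocked channel
        }
        blocked[channel] = true;
        barriersReceived++;
        if (barriersReceived == blocked.length) {
            output.add("CHECKPOINT");     // all barriers arrived: take the snapshot
            Arrays.fill(blocked, false);  // release every channel
            barriersReceived = 0;
            while (!buffered.isEmpty()) { // replay the held-back records in arrival order
                output.add(buffered.poll());
            }
        }
    }

    List<String> output() {
        return output;
    }
}
```

With two channels, feeding `a1`, a barrier on channel 0, `a2`, `b1`, a barrier on channel 1 and then `b2` yields the processing order `a1, b1, CHECKPOINT, a2, b2`: the record `a2`, which arrived behind the barrier on the faster channel, is only processed after the snapshot.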
Now let's look at how the class is actually implemented:
public class BarrierBuffer implements CheckpointBarrierHandler {

    private static final Logger LOG = LoggerFactory.getLogger(BarrierBuffer.class);

    /** The gate that the buffer draws its input from */
    private final InputGate inputGate; // one InputGate per task; it represents the task's input (possibly from several input channels)

    /** Flags that indicate whether a channel is currently blocked/buffered */
    private final boolean[] blockedChannels; // marks, per input channel, whether it is blocked (i.e. being buffered)

    /** The total number of channels that this buffer handles data from */
    private final int totalNumberOfInputChannels; // number of input channels, obtained from the InputGate

    /** To utility to write blocked data to a file channel */
    private final BufferSpiller bufferSpiller; // spills the data of blocked input channels

    /** The pending blocked buffer/event sequences. Must be consumed before requesting
     * further data from the input gate. */
    private final ArrayDeque<BufferSpiller.SpilledBufferOrEventSequence> queuedBuffered; // previously buffered sequences that still have to be consumed

    /** The maximum number of bytes that may be buffered before an alignment is broken. -1 means unlimited */
    private final long maxBufferedBytes; // upper bound on buffered bytes; -1 means no limit

    /** The sequence of buffers/events that has been unblocked and must now be consumed
     * before requesting further data from the input gate */
    private BufferSpiller.SpilledBufferOrEventSequence currentBuffered; // the buffered sequence currently being replayed

    /** Handler that receives the checkpoint notifications */
    private StatefulTask toNotifyOnCheckpoint; // the task to notify when a checkpoint should be taken

    /** The ID of the checkpoint for which we expect barriers */
    private long currentCheckpointId = -1L; // ID of the current checkpoint

    /** The number of received barriers (= number of blocked/buffered channels)
     * IMPORTANT: A canceled checkpoint must always have 0 barriers */
    private int numBarriersReceived; // equals the number of buffered channels; 0 once a checkpoint is canceled

    /** The number of already closed channels */
    private int numClosedChannels; // number of channels that are already closed

    /** The number of bytes in the queued spilled sequences */
    private long numQueuedBytes; // bytes held in the queued spilled sequences

    /** The timestamp as in {@link System#nanoTime()} at which the last alignment started */
    private long startOfAlignmentTimestamp; // when the last alignment started

    /** The time (in nanoseconds) that the latest alignment took */
    private long latestAlignmentDurationNanos; // how long the latest alignment took

    /** Flag to indicate whether we have drawn all available input */
    private boolean endOfStream; // true once all available input has been drawn

    /**
     * Creates a new checkpoint stream aligner.
     *
     * <p>There is no limit to how much data may be buffered during an alignment.
     *
     * @param inputGate The input gate to draw the buffers and events from.
     * @param ioManager The I/O manager that gives access to the temp directories.
     *
     * @throws IOException Thrown, when the spilling to temp files cannot be initialized.
     */
    public BarrierBuffer(InputGate inputGate, IOManager ioManager) throws IOException {
        this (inputGate, ioManager, -1);
    }

    /**
     * Creates a new checkpoint stream aligner.
     *
     * <p>The aligner will allow only alignments that buffer up to the given number of bytes.
     * When that number is exceeded, it will stop the alignment and notify the task that the
     * checkpoint has been cancelled.
     *
     * @param inputGate The input gate to draw the buffers and events from.
     * @param ioManager The I/O manager that gives access to the temp directories.
     * @param maxBufferedBytes The maximum bytes to be buffered before the checkpoint aborts.
     *
     * @throws IOException Thrown, when the spilling to temp files cannot be initialized.
     */
    public BarrierBuffer(InputGate inputGate, IOManager ioManager, long maxBufferedBytes) throws IOException {
        checkArgument(maxBufferedBytes == -1 || maxBufferedBytes > 0);

        this.inputGate = inputGate;
        this.maxBufferedBytes = maxBufferedBytes;
        this.totalNumberOfInputChannels = inputGate.getNumberOfInputChannels();
        this.blockedChannels = new boolean[this.totalNumberOfInputChannels];

        this.bufferSpiller = new BufferSpiller(ioManager, inputGate.getPageSize());
        this.queuedBuffered = new ArrayDeque<BufferSpiller.SpilledBufferOrEventSequence>();
    }

    // the remaining methods are discussed below
}
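The constructor takes an InputGate. Every task has an InputGate whose job is to handle all input flowing into that task; this input may come from multiple partitions.
Next, the most important method of BarrierBuffer: getNextNonBlocked.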
getNextNonBlocked
// ------------------------------------------------------------------------
//  Buffer and barrier handling
// ------------------------------------------------------------------------

@Override
public BufferOrEvent getNextNonBlocked() throws Exception {
    while (true) {
        // process buffered BufferOrEvents before grabbing new ones
        BufferOrEvent next; // a buffer carries data; an event is a control message, e.g. a barrier
        if (currentBuffered == null) {
            // nothing is being replayed: read the next BufferOrEvent straight from the input gate
            next = inputGate.getNextBufferOrEvent();
        }
        else {
            // otherwise take the next BufferOrEvent from the sequence being replayed
            next = currentBuffered.getNext();
            if (next == null) {
                // the replayed sequence is exhausted
                completeBufferedSequence(); // clear currentBuffered, then move on to the data in queuedBuffered
                // recurse: if currentBuffered is now null, queuedBuffered was empty as well;
                // otherwise there is still queued data to replay
                return getNextNonBlocked();
            }
        }

        if (next != null) {
            if (isBlocked(next.getChannelIndex())) {
                // if the channel is blocked we, we just store the BufferOrEvent
                bufferSpiller.add(next);
                checkSizeLimit();
            }
            else if (next.isBuffer()) {
                // the channel is not blocked and this is data: hand it to the caller
                return next;
            }
            else if (next.getEvent().getClass() == CheckpointBarrier.class) {
                // a barrier arrived on this channel
                if (!endOfStream) {
                    // process barriers only if there is a chance of the checkpoint completing
                    processBarrier((CheckpointBarrier) next.getEvent(), next.getChannelIndex());
                }
            }
            else if (next.getEvent().getClass() == CancelCheckpointMarker.class) {
                // a cancellation marker arrived: handle it in processCancellationBarrier
                processCancellationBarrier((CancelCheckpointMarker) next.getEvent());
            }
            else {
                if (next.getEvent().getClass() == EndOfPartitionEvent.class) {
                    // this partition has been fully consumed
                    processEndOfPartition(); // increments numClosedChannels and unblocks
                }
                return next;
            }
        }
        else if (!endOfStream) {
            // end of input stream. stream continues with the buffered data
            // (mark the end, release all blocked channels and reset the barrier state)
            endOfStream = true;
            releaseBlocksAndResetBarriers();
            return getNextNonBlocked();
        }
        else {
            // final end of both input and buffered data
            return null;
        }
    }
}
In this method, a received barrier is handed to processBarrier() immediately; that method is the heart of the alignment logic.
processBarrier
private void processBarrier(CheckpointBarrier receivedBarrier, int channelIndex) throws Exception {
    final long barrierId = receivedBarrier.getId();

    // fast path for single channel cases
    if (totalNumberOfInputChannels == 1) { // the operator has exactly one input channel
        if (barrierId > currentCheckpointId) { // a barrier ID above the current checkpoint ID means a new barrier
            // new checkpoint
            currentCheckpointId = barrierId; // remember it as the current checkpoint ID
            notifyCheckpoint(receivedBarrier); // trigger the checkpoint
        }
        return;
    }

    // -- general code path for multiple input channels --

    if (numBarriersReceived > 0) { // at least one barrier has already been received
        // this is only true if some alignment is already progress and was not canceled

        if (barrierId == currentCheckpointId) { // the barrier belongs to the current checkpoint
            // regular case
            onBarrier(channelIndex); // block this channel
        }
        else if (barrierId > currentCheckpointId) { // the current checkpoint is outdated: skip it
            // we did not complete the current checkpoint, another started before
            LOG.warn("Received checkpoint barrier for checkpoint {} before completing current checkpoint {}. " +
                    "Skipping current checkpoint.", barrierId, currentCheckpointId);

            // let the task know we are not completing this
            notifyAbort(currentCheckpointId, new CheckpointDeclineSubsumedException(barrierId));

            // abort the current checkpoint
            releaseBlocksAndResetBarriers(); // release all blocked channels

            // begin a the new checkpoint
            beginNewAlignment(barrierId, channelIndex); // start aligning for barrierId
        }
        else {
            // ignore trailing barrier from an earlier checkpoint (obsolete now)
            return;
        }
    }
    else if (barrierId > currentCheckpointId) { // first barrier received, and it is newer than the current checkpoint
        // first barrier of a new checkpoint
        beginNewAlignment(barrierId, channelIndex); // start aligning for barrierId
    }
    else {
        // either the current checkpoint was canceled (numBarriers == 0) or
        // this barrier is from an old subsumed checkpoint
        return;
    }

    // check if we have all barriers - since canceled checkpoints always have zero barriers
    // this can only happen on a non canceled checkpoint
    if (numBarriersReceived + numClosedChannels == totalNumberOfInputChannels) { // every open channel has delivered its barrier
        // actually trigger checkpoint
        if (LOG.isDebugEnabled()) {
            LOG.debug("Received all barriers, triggering checkpoint {} at {}",
                    receivedBarrier.getId(), receivedBarrier.getTimestamp());
        }

        releaseBlocksAndResetBarriers(); // release all blocked channels
        notifyCheckpoint(receivedBarrier); // trigger the checkpoint
    }
}
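One change in Flink 1.2 is a fast path: if the operator has only one input channel and receives a barrier newer than the current checkpoint, there is nothing to align, so the checkpoint is triggered directly through notifyCheckpoint.
Otherwise, with multiple input channels (totalNumberOfInputChannels is obtained from the InputGate), the checkpoint is triggered through notifyCheckpoint only after the current barrier has arrived from every input channel, with closed channels counted as arrived. Until then, each channel that has delivered its barrier is blocked, which in practice means its subsequent data is buffered.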
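The multi-channel branch of processBarrier is easier to follow when reduced to a pure decision function. The sketch below mirrors only the comparisons visible in the method; `BarrierDecision`, `Action` and `decide` are hypothetical names introduced for illustration and do not exist in Flink.

```java
/** Toy decision table (hypothetical) for the multi-channel path of processBarrier. */
class BarrierDecision {

    enum Action {
        BLOCK_CHANNEL,               // barrier of the checkpoint currently being aligned
        ABORT_CURRENT_AND_START_NEW, // a newer checkpoint overtook the unfinished one
        START_NEW_ALIGNMENT,         // first barrier of a new checkpoint
        IGNORE                       // stale barrier of an old or canceled checkpoint
    }

    static Action decide(long barrierId, long currentCheckpointId, int numBarriersReceived) {
        if (numBarriersReceived > 0) {
            // an alignment is in progress
            if (barrierId == currentCheckpointId) {
                return Action.BLOCK_CHANNEL;
            }
            if (barrierId > currentCheckpointId) {
                return Action.ABORT_CURRENT_AND_START_NEW;
            }
            return Action.IGNORE;
        }
        // no alignment in progress
        return barrierId > currentCheckpointId ? Action.START_NEW_ALIGNMENT : Action.IGNORE;
    }
}
```

For example, `decide(6, 5, 1)` returns `ABORT_CURRENT_AND_START_NEW`: checkpoint 5 is still being aligned when a barrier for checkpoint 6 shows up, so checkpoint 5 is given up. With no alignment in progress, `decide(5, 5, 0)` returns `IGNORE`, which covers both a canceled checkpoint and a late barrier from a subsumed one.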
notifyCheckpoint
private void notifyCheckpoint(CheckpointBarrier checkpointBarrier) throws Exception {
    if (toNotifyOnCheckpoint != null) {
        CheckpointMetaData checkpointMetaData =
                new CheckpointMetaData(checkpointBarrier.getId(), checkpointBarrier.getTimestamp());

        long bytesBuffered = currentBuffered != null ? currentBuffered.size() : 0L;

        checkpointMetaData
                .setBytesBufferedInAlignment(bytesBuffered)
                .setAlignmentDurationNanos(latestAlignmentDurationNanos);

        toNotifyOnCheckpoint.triggerCheckpointOnBarrier(checkpointMetaData);
    }
}
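toNotifyOnCheckpoint is a StatefulTask, the interface through which a task receives checkpoint notifications. The call to its triggerCheckpointOnBarrier method, shown in the code above, is where the task's checkpoint is actually carried out.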
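The notification step is a plain callback: the handler holds a reference to the task and passes it a small metadata object. The following toy sketch illustrates that shape only. `ToyCheckpointMetaData`, `ToyCheckpointListener` and `ToyNotifier` are hypothetical stand-ins for CheckpointMetaData, StatefulTask and the barrier handler, and the rule that a second registration is rejected is an assumption of this sketch.

```java
/** Toy metadata object (hypothetical): what the handler reports alongside a checkpoint. */
class ToyCheckpointMetaData {
    final long checkpointId;
    final long timestamp;
    final long bytesBufferedInAlignment;
    final long alignmentDurationNanos;

    ToyCheckpointMetaData(long checkpointId, long timestamp,
                          long bytesBufferedInAlignment, long alignmentDurationNanos) {
        this.checkpointId = checkpointId;
        this.timestamp = timestamp;
        this.bytesBufferedInAlignment = bytesBufferedInAlignment;
        this.alignmentDurationNanos = alignmentDurationNanos;
    }
}

/** Toy counterpart of the task-side callback (hypothetical). */
interface ToyCheckpointListener {
    void triggerCheckpointOnBarrier(ToyCheckpointMetaData metaData);
}

/** Toy handler side (hypothetical): stores one listener and notifies it. */
class ToyNotifier {
    private ToyCheckpointListener listener;

    void register(ToyCheckpointListener newListener) {
        // assumption of this sketch: only one listener may ever be registered
        if (listener != null) {
            throw new IllegalStateException("listener already registered");
        }
        listener = newListener;
    }

    void notifyCheckpoint(long id, long timestamp, long bytesBuffered, long alignmentNanos) {
        // like notifyCheckpoint above, do nothing when nobody is registered
        if (listener != null) {
            listener.triggerCheckpointOnBarrier(
                    new ToyCheckpointMetaData(id, timestamp, bytesBuffered, alignmentNanos));
        }
    }
}
```

Passing the buffered byte count and the alignment duration along with the checkpoint ID is what lets the alignment cost of each checkpoint be reported later.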
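3. Checkpoint improvements in the Flink 1.2 web UI
The checkpoint section of the web UI now exposes much more metadata, including detailed information for each checkpoint:
the size of its state, its status, its completion time, and its duration. For every checkpoint you can also drill down to the details of each subtask. This helps with managing and monitoring checkpoints and with tuning state.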
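4. Summary
In Flink, ABS runs in exactly-once mode by default, which requires barrier alignment. Alignment comes down to blocking and unblocking input channels, and each has its own trigger: a channel is blocked when it delivers the barrier of the checkpoint being aligned, and all channels are released once every channel has delivered that barrier, or when the alignment is abandoned (a newer checkpoint's barrier arrives, the checkpoint is canceled, or the input ends).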