Flume MemoryChannel Analysis

Source: Internet  Editor: 程序博客网  Date: 2024/05/14 10:44

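The previous posts covered Flume's basic concepts and the Source component. This post looks at MemoryChannel, one implementation of Flume's second major component, the Channel. MemoryChannel runs entirely in memory, so it is fast, but that strength is also its weakness: nothing is persisted, and events still in the channel are lost if the machine crashes or loses power. In practice, choose a channel type according to how much data loss the workload can tolerate.
Start with MemoryChannel's basic class diagram, which makes the structure easier to follow.
[Figure: MemoryChannel class diagram]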
The most important parts of MemoryChannel are three interfaces: Channel, Transaction, and Configurable.
The Channel interface declares three methods:

    public void put(Event event) throws ChannelException;  // stores an Event received from a Source in this Channel
    public Event take() throws ChannelException;           // removes an Event from the Channel so it can be handed to a Sink
    public Transaction getTransaction();                    // returns the transaction instance for the current Channel

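The Transaction interface declares a state enum and the four methods behind Flume's transaction mechanism:

    enum TransactionState { Started, Committed, RolledBack, Closed }  // the four transaction states: started, committed, rolled back after a failure, closed
    void begin();
    void commit();
    void rollback();
    void close();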
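
To make the four states concrete, the following is a minimal sketch of the transitions the interface implies. ToyTransaction is a hypothetical class written for this illustration, not Flume's implementation. The extra NEW state and the exact checks are assumptions, chosen only so that an illegal call order fails loudly.

```java
// Hypothetical illustration, not Flume code: a transaction that only tracks its state.
final class ToyTransaction {
    // NEW is an assumed initial state; the other four mirror TransactionState above.
    enum State { NEW, STARTED, COMMITTED, ROLLED_BACK, CLOSED }

    private State state = State.NEW;

    State state() { return state; }

    void begin() {
        require(state == State.NEW, "begin() requires a new transaction");
        state = State.STARTED;
    }

    void commit() {
        require(state == State.STARTED, "commit() requires a started transaction");
        state = State.COMMITTED;
    }

    void rollback() {
        require(state == State.STARTED, "rollback() requires a started transaction");
        state = State.ROLLED_BACK;
    }

    void close() {
        require(state == State.COMMITTED || state == State.ROLLED_BACK,
                "close() requires a committed or rolled-back transaction");
        state = State.CLOSED;
    }

    private static void require(boolean ok, String message) {
        if (!ok) throw new IllegalStateException(message);
    }
}
```

Calling begin(), commit(), close() in that order ends in CLOSED; calling commit() on a closed transaction throws IllegalStateException.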

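The Configurable interface ties a component to Flume's configuration system: any component that needs to read settings from the Flume configuration must implement it. The interface declares a single method, configure(Context context), through which the component receives its configuration.
All of these methods are implemented by the concrete Channel. At startup, Flume reads the configuration file and invokes the matching component's implementations. This is a callback pattern, similar to hook functions in C: the method is declared up front, and the concrete implementation is invoked when it is needed.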
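
The callback idea can be sketched in a few lines. ToyContext, ToyConfigurable, and ToyMemoryChannel below are hypothetical stand-ins invented for this example, not Flume classes. The key names mirror Flume's documented capacity and transactionCapacity properties, and the fallback value of 100 matches the defaults MemoryChannel defines for both.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration context: a thin wrapper around string properties.
final class ToyContext {
    private final Map<String, String> props = new HashMap<>();

    ToyContext set(String key, String value) {
        props.put(key, value);
        return this;
    }

    int getInteger(String key, int defaultValue) {
        String raw = props.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw);
    }
}

// Same shape as the callback described above: one method that receives the configuration.
interface ToyConfigurable {
    void configure(ToyContext context);
}

final class ToyMemoryChannel implements ToyConfigurable {
    private int capacity;
    private int transactionCapacity;

    @Override
    public void configure(ToyContext context) {
        // The framework calls this; the component only declares what it wants to read.
        capacity = context.getInteger("capacity", 100);
        transactionCapacity = context.getInteger("transactionCapacity", 100);
    }

    int capacity() { return capacity; }
    int transactionCapacity() { return transactionCapacity; }
}
```

The component never asks for its configuration; whoever owns the configuration decides when to call configure, which is what makes it a callback.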

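Now for the code itself. The class starts by defining defaults for the Channel capacity, the transaction capacity, and related settings:

    public class MemoryChannel extends BasicChannelSemantics {
        private static Logger LOGGER = LoggerFactory.getLogger(MemoryChannel.class);
        private static final Integer defaultCapacity = 100;       // maximum number of events held in the channel
        private static final Integer defaultTransCapacity = 100;  // maximum number of events per transaction
        private static final double byteCapacitySlotSize = 100;   // byte usage is accounted in slots of 100 bytes
        private static final Long defaultByteCapacity = (long)(Runtime.getRuntime().maxMemory() * .80);  // 80% of the JVM's maximum heap
        private static final Integer defaultByteCapacityBufferPercentage = 20;  // percentage of the byte capacity held back as a buffer
        private static final Integer defaultKeepAlive = 3;        // seconds to wait when acquiring permits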
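
A short worked example may help with the slot-based byte accounting. The sketch below copies the two constants from the listing above; slotsFor applies the same rounding that doPut and doTake use later in this post. The slotCapacity formula is an assumption about how the buffer percentage and slot size combine (it is not shown in this post), and ByteSlotMath is a hypothetical helper name.

```java
// Illustrative arithmetic only; ByteSlotMath is a hypothetical helper, not part of Flume.
final class ByteSlotMath {
    static final double BYTE_CAPACITY_SLOT_SIZE = 100; // from byteCapacitySlotSize
    static final int BUFFER_PERCENTAGE = 20;           // from defaultByteCapacityBufferPercentage

    // Number of 100-byte slots an event of the given size occupies (rounded up).
    static int slotsFor(long eventBytes) {
        return (int) Math.ceil(eventBytes / BYTE_CAPACITY_SLOT_SIZE);
    }

    // Assumed formula: hold back BUFFER_PERCENTAGE of the configured bytes,
    // then divide what is left into slots.
    static int slotCapacity(long configuredBytes) {
        return (int) ((configuredBytes * (1 - BUFFER_PERCENTAGE * .01)) / BYTE_CAPACITY_SLOT_SIZE);
    }
}
```

Under these assumptions a 250-byte event occupies 3 slots, a 100-byte event occupies 1, and a configured byte capacity of 1000 yields 8 usable slots.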

Next come the Channel's put and take methods and the transaction's commit and rollback methods, all implemented in the inner class MemoryTransaction. The method names below all start with "do". That is because MemoryTransaction extends the abstract class BasicTransactionSemantics rather than implementing the Channel and Transaction interfaces directly. BasicTransactionSemantics wraps each operation with a few checks and then delegates to the corresponding do-method, such as doTake:

    protected Event take() {
        // The transaction must be used by the thread that obtained it.
        Preconditions.checkState(Thread.currentThread().getId() == initialThreadId,
            "take() called from different thread than getTransaction()!");
        // take() is only legal while the transaction is open.
        Preconditions.checkState(state.equals(State.OPEN),
            "take() called when transaction is %s!", state);
        try {
            return doTake();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
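
This wrapper-plus-hook arrangement is the template method pattern. The following miniature shows the same shape in isolation; ToyTransactionSemantics and FixedValueTransaction are hypothetical names, and the checks are simplified to plain exceptions rather than Guava's Preconditions.

```java
// Hypothetical miniature of the pattern: the wrapper validates, the do* hook does the work.
abstract class ToyTransactionSemantics<E> {
    private final long ownerThreadId = Thread.currentThread().getId();
    private boolean open = true;

    // Final wrapper: subclasses cannot bypass the checks.
    final E take() {
        if (Thread.currentThread().getId() != ownerThreadId) {
            throw new IllegalStateException("take() called from a different thread");
        }
        if (!open) {
            throw new IllegalStateException("take() called on a closed transaction");
        }
        try {
            return doTake();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            return null;
        }
    }

    final void close() { open = false; }

    // Hook that each concrete transaction implements.
    protected abstract E doTake() throws InterruptedException;
}

// A trivial subclass that always returns the same value.
final class FixedValueTransaction extends ToyTransactionSemantics<String> {
    @Override
    protected String doTake() {
        return "event-1";
    }
}
```

The benefit is that every concrete channel gets the thread and state checks for free and only has to supply the storage-specific logic.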

The inner class MemoryTransaction implements these methods as follows:

    private class MemoryTransaction extends BasicTransactionSemantics {
        // Events taken from the channel queue are parked here while they are delivered to the sink.
        // They have already left the queue: commit discards them, rollback puts them back.
        private LinkedBlockingDeque<Event> takeList;
        // Events from the source are staged here; commit moves them into the channel queue.
        private LinkedBlockingDeque<Event> putList;
        // ChannelCounter exposes the channel's monitoring metrics.
        private final ChannelCounter channelCounter;
        private int putByteCounter = 0;
        private int takeByteCounter = 0;

        // The constructor creates the two blocking deques the transaction needs.
        public MemoryTransaction(int transCapacity, ChannelCounter counter) {
            putList = new LinkedBlockingDeque<Event>(transCapacity);
            takeList = new LinkedBlockingDeque<Event>(transCapacity);
            channelCounter = counter;
        }

        // Overrides the hooks declared in the parent class BasicTransactionSemantics.
        // doPut appends the given Event to putList.
        @Override
        protected void doPut(Event event) throws InterruptedException {
            channelCounter.incrementEventPutAttemptCount(); // atomically count the put attempt
            int eventByteSize = (int) Math.ceil(estimateEventSize(event) / byteCapacitySlotSize);
            /*
             * offer() appends the element to the tail of putList if that is possible
             * immediately without exceeding the capacity and returns true;
             * if no space is available it returns false.
             */
            if (!putList.offer(event)) {
                throw new ChannelException( // putList is full
                    "Put queue for MemoryTransaction of capacity " +
                    putList.size() + " full, consider committing more frequently, " +
                    "increasing capacity or increasing thread count");
            }
            putByteCounter += eventByteSize;
        }

        // Takes an event from MemoryChannel's queue and records it in takeList
        // as part of the current transaction.
        @Override
        protected Event doTake() throws InterruptedException {
            channelCounter.incrementEventTakeAttemptCount(); // atomically count the take attempt
            if (takeList.remainingCapacity() == 0) { // takeList is full
                throw new ChannelException("Take list for MemoryTransaction, capacity " +
                    takeList.size() + " full, consider committing more frequently, " +
                    "increasing capacity, or increasing thread count");
            }
            // Wait up to keepAlive seconds for an event to become available.
            if (!queueStored.tryAcquire(keepAlive, TimeUnit.SECONDS)) {
                return null;
            }
            Event event;
            synchronized (queueLock) { // only one thread may touch the channel queue at a time
                event = queue.poll(); // removes and returns the head of the deque, or null if it is empty
            }
            Preconditions.checkNotNull(event, "Queue.poll returned NULL despite semaphore " +
                "signalling existence of entry");
            takeList.put(event); // record the taken event in takeList
            /* estimate the event's size in slots */
            int eventByteSize = (int) Math.ceil(estimateEventSize(event) / byteCapacitySlotSize);
            takeByteCounter += eventByteSize;
            return event;
        }

        /* Commit the transaction. */
        @Override
        protected void doCommit() throws InterruptedException {
            // takeList holds what the sink consumed in this transaction, putList what the source produced.
            int remainingChange = takeList.size() - putList.size();
            if (remainingChange < 0) { // more events were put than taken, so the queue will grow
                // Check that there is enough byte capacity for the events in putList.
                if (!bytesRemaining.tryAcquire(putByteCounter, keepAlive, TimeUnit.SECONDS)) {
                    throw new ChannelException("Cannot commit transaction. Byte capacity " +
                        "allocated to store event body " + byteCapacity * byteCapacitySlotSize +
                        "reached. Please increase heap space/byte capacity allocated to " +
                        "the channel as the sinks may not be keeping up with the sources");
                }
                // Check that the queue has enough free slots for the net growth.
                if (!queueRemaining.tryAcquire(-remainingChange, keepAlive, TimeUnit.SECONDS)) {
                    bytesRemaining.release(putByteCounter);
                    throw new ChannelFullException("Space for commit to queue couldn't be acquired." +
                        " Sinks are likely not keeping up with sources, or the buffer size is too tight");
                }
            }
            int puts = putList.size();   // events produced during this transaction
            int takes = takeList.size(); // events consumed during this transaction
            synchronized (queueLock) {
                if (puts > 0) {
                    while (!putList.isEmpty()) {
                        if (!queue.offer(putList.removeFirst())) { // move the staged events into the queue
                            throw new RuntimeException("Queue add failed, this shouldn't be able to happen");
                        }
                    }
                }
                putList.clear(); // everything succeeded: clear the transaction's putList and takeList
                takeList.clear();
            }
            bytesRemaining.release(takeByteCounter); // give back the byte slots of the consumed events
            takeByteCounter = 0;
            putByteCounter = 0;
            queueStored.release(puts); // the newly stored events become available to takers
            if (remainingChange > 0) {
                queueRemaining.release(remainingChange);
            }
            if (puts > 0) { // update the metric for events successfully put into the Channel
                channelCounter.addToEventPutSuccessCount(puts);
            }
            if (takes > 0) { // update the metric for events successfully taken from the Channel
                channelCounter.addToEventTakeSuccessCount(takes);
            }
            channelCounter.setChannelSize(queue.size());
        }

        // Roll back the transaction.
        @Override
        protected void doRollback() {
            int takes = takeList.size();
            synchronized (queueLock) {
                Preconditions.checkState(queue.remainingCapacity() >= takeList.size(),
                    "Not enough space in memory channel " +
                    "queue to rollback takes. This should never happen, please report");
                while (!takeList.isEmpty()) { // return every taken event to the queue
                    // removeLast() removes and returns the last element of the deque;
                    // pushing newest-first onto the head restores the original order.
                    queue.addFirst(takeList.removeLast());
                }
                putList.clear();
            }
            bytesRemaining.release(putByteCounter);
            putByteCounter = 0;
            takeByteCounter = 0;
            queueStored.release(takes);
            channelCounter.setChannelSize(queue.size());
        }
    }
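
The permit bookkeeping in doTake() and doCommit() is easier to follow with the events stripped away. The sketch below models only the two queue semaphores; PermitLedger is a hypothetical class, and it uses the non-blocking tryAcquire instead of the timed variant so that it stays short. Byte accounting is left out.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model of the commit bookkeeping only: no events, just permit counts.
final class PermitLedger {
    final Semaphore queueRemaining; // free slots in the channel queue
    final Semaphore queueStored;    // events available to take

    PermitLedger(int capacity, int stored) {
        queueRemaining = new Semaphore(capacity - stored);
        queueStored = new Semaphore(stored);
    }

    // Mirrors doTake(): each take consumes one "stored" permit up front.
    boolean take() {
        return queueStored.tryAcquire();
    }

    // Mirrors the queue permit handling in doCommit() for a transaction
    // that staged `puts` events and took `takes` events.
    boolean commit(int puts, int takes) {
        int remainingChange = takes - puts;
        if (remainingChange < 0 && !queueRemaining.tryAcquire(-remainingChange)) {
            return false; // not enough free slots for the net growth
        }
        queueStored.release(puts); // new events become visible to takers
        if (remainingChange > 0) {
            queueRemaining.release(remainingChange); // net shrink frees slots
        }
        return true;
    }
}
```

For example, with capacity 10 and 4 stored events, a transaction that takes 1 and puts 3 acquires 2 free slots and releases 3 stored permits, leaving 4 free slots and 6 takeable events, which matches a queue of 4 - 1 + 3 = 6 events.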

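Two blocking deques, putList and takeList, are created with the transaction capacity transCapacity; both exist to support transactions. When a Source puts an event into the Channel, the event first goes into putList (a temporary staging queue), and on commit the events in putList are moved into MemoryChannel's queue. When the Channel hands data to a Sink, each event is first moved from the queue into takeList and then delivered to the Sink. If either put or take fails, rollback is called: it locks the Channel queue so no other thread can access it during the rollback, puts any events in takeList back at the head of the Channel queue, and then clears putList (that is, the events staged in putList are discarded).

The code above only implements these methods. They are actually invoked when a Source writes events or a Sink reads them, that is, inside a transaction, following the pattern below. For details, see the previous post, "flume Source启动过程分析" (Flume Source startup process analysis).

    Channel ch = ...
    Transaction tx = ch.getTransaction();
    try {
        tx.begin();
        ...
        // ch.put(event) or ch.take(): a Source writes events with put, a Sink reads them with take
        ...
        tx.commit();
    } catch (ChannelException ex) {  // roll back the transaction on failure
        tx.rollback();
        ...
    } finally {
        tx.close();
    }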
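
The staging behaviour described above can be reproduced with three plain deques. The following is a deliberately simplified, single-threaded sketch: ToyChannel is a hypothetical class with no locks, semaphores, or capacity limits, and it uses String in place of Event.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-threaded model: only the queue and the two staging lists.
final class ToyChannel {
    private final Deque<String> queue = new ArrayDeque<>();    // committed events
    private final Deque<String> putList = new ArrayDeque<>();  // staged by the source
    private final Deque<String> takeList = new ArrayDeque<>(); // handed to the sink, not yet confirmed

    void put(String event) {
        putList.addLast(event); // invisible to takers until commit
    }

    String take() {
        String event = queue.pollFirst();
        if (event != null) {
            takeList.addLast(event); // remember it in case of rollback
        }
        return event;
    }

    void commit() {
        while (!putList.isEmpty()) {
            queue.addLast(putList.removeFirst());
        }
        takeList.clear(); // the sink has confirmed these events
    }

    void rollback() {
        // Push taken events back onto the head, newest first, so the original order is restored.
        while (!takeList.isEmpty()) {
            queue.addFirst(takeList.removeLast());
        }
        putList.clear(); // staged puts are discarded
    }

    int size() {
        return queue.size();
    }
}
```

Putting "a" and "b" leaves size() at 0 until commit() is called. Taking both and then rolling back returns the queue to [a, b], so the next take() yields "a" again.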

The third part of MemoryChannel is the configure method, which reads the configuration and initializes the channel. Configuration can be read in two ways: once at startup, or reloaded dynamically while the agent is running. If a dynamic reload changes the Channel's capacity, resizeQueue is called to adjust the queue:

    if (queue != null) {  // queue already exists: the configuration was reloaded with a new capacity
        try {
            resizeQueue(capacity);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    } else {  // first initialization: allocate the blocking deque for the given capacity and set up the semaphores
        synchronized (queueLock) {
            queue = new LinkedBlockingDeque<Event>(capacity);
            queueRemaining = new Semaphore(capacity);
            queueStored = new Semaphore(0);
        }
    }

Dynamically resizing the Channel falls into three cases:

The old and new capacities are equal: return immediately.

The old capacity is larger than the new one (shrinking): first acquire the permits for the capacity being removed, so that no thread can write into that space during the resize, then create a queue with the new capacity and copy all events from the old queue into it.

The old capacity is smaller than the new one (growing): create a queue with the new capacity, copy all events from the old queue into it, and release the additional permits.

    private void resizeQueue(int capacity) throws InterruptedException {
        int oldCapacity;
        // Compute the current capacity of the Channel queue.
        synchronized (queueLock) {
            oldCapacity = queue.size() + queue.remainingCapacity();
        }
        // Old and new capacities are equal: nothing to do.
        if (oldCapacity == capacity) {
            return;
        } else if (oldCapacity > capacity) {  // shrink
            // Reserve the free capacity that is about to be removed so no other thread can use it.
            if (!queueRemaining.tryAcquire(oldCapacity - capacity, keepAlive, TimeUnit.SECONDS)) {
                LOGGER.warn("Couldn't acquire permits to downsize the queue, resizing has been aborted");
            } else {
                // Under queueLock, create a deque with the new capacity and copy the old contents over.
                synchronized (queueLock) {
                    LinkedBlockingDeque<Event> newQueue = new LinkedBlockingDeque<Event>(capacity);
                    newQueue.addAll(queue);
                    queue = newQueue;
                }
            }
        } else {  // grow: under the lock, create the new queue and copy the old contents over
            synchronized (queueLock) {
                LinkedBlockingDeque<Event> newQueue = new LinkedBlockingDeque<Event>(capacity);
                newQueue.addAll(queue);
                queue = newQueue;
            }
            // Release capacity - oldCapacity permits, i.e. make that many more slots available.
            queueRemaining.release(capacity - oldCapacity);
        }
    }
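
The three branches can be exercised in isolation with a small stand-alone class. ResizableBuffer below is a hypothetical reduction of the idea: it keeps the lock, the deque swap, and the queueRemaining-style semaphore, but uses a non-blocking tryAcquire and String elements, and it reports success or failure with a boolean instead of logging.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.Semaphore;

// Hypothetical reduction of resizeQueue(): same three branches, simplified types.
final class ResizableBuffer {
    private final Object lock = new Object();
    private LinkedBlockingDeque<String> queue;
    private final Semaphore remaining; // free slots, like queueRemaining

    ResizableBuffer(int capacity) {
        queue = new LinkedBlockingDeque<>(capacity);
        remaining = new Semaphore(capacity);
    }

    boolean offer(String event) {
        if (!remaining.tryAcquire()) {
            return false; // no free slot
        }
        synchronized (lock) {
            return queue.offer(event);
        }
    }

    boolean resize(int capacity) {
        int oldCapacity = capacity();
        if (oldCapacity == capacity) {
            return true; // nothing to do
        }
        if (oldCapacity > capacity) {
            // Shrinking is only safe if the slots being removed are currently free.
            if (!remaining.tryAcquire(oldCapacity - capacity)) {
                return false;
            }
            swap(capacity);
        } else {
            swap(capacity);
            remaining.release(capacity - oldCapacity); // publish the new free slots
        }
        return true;
    }

    private void swap(int capacity) {
        synchronized (lock) {
            LinkedBlockingDeque<String> newQueue = new LinkedBlockingDeque<>(capacity);
            newQueue.addAll(queue);
            queue = newQueue;
        }
    }

    int capacity() {
        synchronized (lock) {
            return queue.size() + queue.remainingCapacity();
        }
    }

    int size() {
        synchronized (lock) {
            return queue.size();
        }
    }
}
```

With capacity 4 and 3 stored events, resize(2) is refused because only 1 slot is free, while resize(3) succeeds. Acquiring the permits before swapping is what guarantees that the old contents always fit into the smaller queue.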

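That covers the main flow of MemoryChannel. It has the simplest logic of Flume's Channel implementations; FileChannel, KafkaChannel, and others are more complex, but their basic structure and transaction flow are similar. Understanding MemoryChannel makes FileChannel and KafkaChannel much easier to learn.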
