Flume MemoryChannel Analysis
Source: Internet  Editor: 程序博客网  Date: 2024/05/14 10:44
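The previous posts covered Flume's basic concepts and the Source component. This post looks at MemoryChannel, one implementation of Flume's second major component, the Channel. MemoryChannel runs entirely in memory, which makes it fast, but that strength is also its weakness: nothing is persisted, so events still held in the channel are lost if the machine crashes or loses power. In practice, choose a channel type according to the durability and throughput requirements of the deployment.
Start with the basic class diagram of MemoryChannel; its structure makes the rest of the code much easier to follow.
The most important parts of MemoryChannel are three interfaces: Channel, Transaction, and Configurable.
The Channel interface declares three methods: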
public void put(Event event) throws ChannelException; // called by a Source to put an Event into the Channel
public Event take() throws ChannelException; // called by a Sink to take an Event out of the Channel
public Transaction getTransaction(); // returns the transaction instance for the current Channel
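The Transaction interface declares an enum of transaction states and the four methods of Flume's transaction mechanism:
enum TransactionState { Started, Committed, RolledBack, Closed } // the four transaction states: started, committed, rolled back after a failure, closed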
void begin();
void commit();
void rollback();
void close();
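The Configurable interface is tied to Flume's configuration system: any component that needs to read settings from the Flume configuration must implement it. The interface declares a single method, configure(Context context), through which the component receives its configuration.
The bodies of all these methods are supplied by the concrete Channel. At startup Flume reads the configuration file and invokes the methods of whichever implementation is configured. This is a callback pattern, similar to hook functions in C: the method is declared up front, and the concrete implementation is invoked when it is needed.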
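To make the callback idea concrete, here is a minimal, self-contained sketch. It does not use Flume's real classes: SimpleConfigurable, DemoChannel, CallbackDemo, and the Map-based context are hypothetical stand-ins for Flume's Configurable, a concrete channel, the framework, and Context. The "framework" side only knows the interface and calls back into whatever implementation it is handed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Flume's Configurable interface.
interface SimpleConfigurable {
  void configure(Map<String, String> context);
}

// Hypothetical component; a real channel would extend Flume's channel classes.
class DemoChannel implements SimpleConfigurable {
  private int capacity = 100; // default kept when the key is absent

  @Override
  public void configure(Map<String, String> context) {
    String value = context.get("capacity");
    if (value != null) {
      capacity = Integer.parseInt(value);
    }
  }

  int getCapacity() {
    return capacity;
  }
}

class CallbackDemo {
  // The "framework" side: it sees only the interface and calls back into it.
  static void bootstrap(SimpleConfigurable component, Map<String, String> context) {
    component.configure(context);
  }

  public static void main(String[] args) {
    Map<String, String> context = new HashMap<>();
    context.put("capacity", "500");
    DemoChannel channel = new DemoChannel();
    bootstrap(channel, context);
    System.out.println(channel.getCapacity());
  }
}
```

The component never decides when it is configured; the caller of bootstrap does, which is the essence of the hook-style design described above.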
Now for the actual code. The class begins with defaults for the channel capacity, the transaction capacity, and related settings:
public class MemoryChannel extends BasicChannelSemantics {
  private static Logger LOGGER = LoggerFactory.getLogger(MemoryChannel.class);
  private static final Integer defaultCapacity = 100;
  private static final Integer defaultTransCapacity = 100;
  private static final double byteCapacitySlotSize = 100;
  private static final Long defaultByteCapacity = (long)(Runtime.getRuntime().maxMemory() * .80);
  private static final Integer defaultByteCapacityBufferPercentage = 20;
  private static final Integer defaultKeepAlive = 3;
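The constant byteCapacitySlotSize in the excerpt above is the unit in which the channel accounts for memory: doPut and doTake, shown later in this post, divide an event's estimated size by it and round up. A small worked sketch of that arithmetic follows, assuming the estimated size is a plain byte count. The class SlotMath and the helper slotsFor are hypothetical names, not part of Flume.

```java
// Hypothetical helper illustrating the slot arithmetic; not part of Flume.
class SlotMath {
  // Mirrors byteCapacitySlotSize from the excerpt above.
  static final double SLOT_SIZE = 100;

  // Number of slots charged for an event whose estimated size is sizeBytes.
  static int slotsFor(long sizeBytes) {
    return (int) Math.ceil(sizeBytes / SLOT_SIZE);
  }

  public static void main(String[] args) {
    System.out.println(slotsFor(250)); // 250 / 100 = 2.5, rounded up
    System.out.println(slotsFor(100)); // exactly one slot
  }
}
```

Rounding up means even a 1-byte event is charged a full slot, so the byte accounting is deliberately coarse.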
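Next come the Channel's put and take methods and the transaction's commit and rollback methods, all implemented in the inner class MemoryTransaction. You may wonder why each method name below starts with "do". The reason is that MemoryTransaction extends the abstract class BasicTransactionSemantics rather than implementing the Channel and Transaction interfaces directly. BasicTransactionSemantics wraps each of the operations mentioned above with some simple checks and then delegates to the corresponding do-prefixed method, for example doTake: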
protected Event take() {
  Preconditions.checkState(Thread.currentThread().getId() == initialThreadId,
      "take() called from different thread than getTransaction()!");
  Preconditions.checkState(state.equals(State.OPEN),
      "take() called when transaction is %s!", state);
  try {
    return doTake();
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    return null;
  }
}
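The wrapper above is an instance of the template-method pattern: the base class owns the state and thread checks, and subclasses supply only the do-prefixed hooks. The following is a simplified, self-contained sketch of that shape. SketchTransaction and FixedTransaction are hypothetical classes written for illustration, events are plain strings, and the checks are reduced to the two shown above.

```java
// Simplified, hypothetical analogue of BasicTransactionSemantics (not the Flume class).
abstract class SketchTransaction {
  enum State { NEW, OPEN, COMPLETED, CLOSED }

  private State state = State.NEW;
  private final Thread owner = Thread.currentThread();

  final void begin() {
    check(State.NEW, "begin()");
    state = State.OPEN;
  }

  // Public-facing wrapper: validate first, then delegate to the subclass hook.
  final String take() {
    check(State.OPEN, "take()");
    return doTake();
  }

  final void commit() {
    check(State.OPEN, "commit()");
    doCommit();
    state = State.COMPLETED;
  }

  private void check(State expected, String caller) {
    if (Thread.currentThread() != owner) {
      throw new IllegalStateException(caller + " called from a different thread");
    }
    if (state != expected) {
      throw new IllegalStateException(caller + " called when transaction is " + state);
    }
  }

  // Hooks implemented by the concrete transaction.
  protected abstract String doTake();

  protected abstract void doCommit();
}

// Hypothetical concrete transaction that always returns the same event.
class FixedTransaction extends SketchTransaction {
  boolean committed = false;

  @Override
  protected String doTake() {
    return "event-1";
  }

  @Override
  protected void doCommit() {
    committed = true;
  }
}
```

Because the checks live in one place, a concrete transaction such as MemoryTransaction can focus purely on queue handling.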
The methods implemented by the MemoryTransaction inner class are shown below:
private class MemoryTransaction extends BasicTransactionSemantics {
  // Blocking deque: events polled from the channel queue are kept here while they are
  // delivered to the sink; cleared on commit, put back into the queue on rollback.
  private LinkedBlockingDeque<Event> takeList;
  // Events from the source are staged here first and moved into the channel queue on commit.
  private LinkedBlockingDeque<Event> putList;
  // ChannelCounter holds the monitoring metrics for this channel.
  private final ChannelCounter channelCounter;
  private int putByteCounter = 0;
  private int takeByteCounter = 0;

  // The constructor initializes the two blocking deques the transaction needs.
  public MemoryTransaction(int transCapacity, ChannelCounter counter) {
    putList = new LinkedBlockingDeque<Event>(transCapacity);
    takeList = new LinkedBlockingDeque<Event>(transCapacity);
    channelCounter = counter;
  }
  // The methods below override the hooks of the parent class BasicTransactionSemantics.
  // doPut appends the given Event to putList.
  @Override
  protected void doPut(Event event) throws InterruptedException {
    channelCounter.incrementEventPutAttemptCount(); // atomically increment the put-attempt counter
    int eventByteSize = (int) Math.ceil(estimateEventSize(event) / byteCapacitySlotSize);
    /*
     * offer() inserts the element at the tail of the putList deque if that can be done
     * immediately without exceeding the capacity, and returns true on success;
     * if no space is currently available it returns false.
     */
    if (!putList.offer(event)) {
      throw new ChannelException( // putList is full
          "Put queue for MemoryTransaction of capacity " +
          putList.size() + " full, consider committing more frequently, " +
          "increasing capacity or increasing thread count");
    }
    putByteCounter += eventByteSize;
  }
  // Takes an element from MemoryChannel's queue and records it in takeList
  // as one of the events this transaction will commit.
  @Override
  protected Event doTake() throws InterruptedException {
    channelCounter.incrementEventTakeAttemptCount(); // atomically increment the take-attempt counter
    if (takeList.remainingCapacity() == 0) { // takeList has no remaining capacity
      throw new ChannelException("Take list for MemoryTransaction, capacity " +
          takeList.size() + " full, consider committing more frequently, " +
          "increasing capacity, or increasing thread count");
    }
    if (!queueStored.tryAcquire(keepAlive, TimeUnit.SECONDS)) {
      return null;
    }
    Event event;
    synchronized (queueLock) { // only one thread at a time may touch the channel queue
      event = queue.poll(); // retrieves and removes the head of the queue; returns null if it is empty
    }
    Preconditions.checkNotNull(event, "Queue.poll returned NULL despite semaphore " +
        "signalling existence of entry");
    takeList.put(event); // record the taken event in takeList
    /* compute the size of the event in byte slots */
    int eventByteSize = (int) Math.ceil(estimateEventSize(event) / byteCapacitySlotSize);
    takeByteCounter += eventByteSize;
    return event;
  }
  /* Transaction commit */
  @Override
  protected void doCommit() throws InterruptedException {
    // takeList holds events taken for the sink and putList holds events staged by the source,
    // so this is the net number of queue slots the commit frees (negative: slots it needs).
    int remainingChange = takeList.size() - putList.size();
    if (remainingChange < 0) { // more puts than takes: the queue has to grow
      // check that enough byte capacity remains for the events in putList
      if (!bytesRemaining.tryAcquire(putByteCounter, keepAlive, TimeUnit.SECONDS)) {
        throw new ChannelException("Cannot commit transaction. Byte capacity " +
            "allocated to store event body " + byteCapacity * byteCapacitySlotSize +
            "reached. Please increase heap space/byte capacity allocated to " +
            "the channel as the sinks may not be keeping up with the sources");
      }
      // check that the queue still has enough free slots for the extra events
      if (!queueRemaining.tryAcquire(-remainingChange, keepAlive, TimeUnit.SECONDS)) {
        bytesRemaining.release(putByteCounter);
        throw new ChannelFullException("Space for commit to queue couldn't be acquired." +
            " Sinks are likely not keeping up with sources, or the buffer size is too tight");
      }
    }
    int puts = putList.size(); // events put during this transaction
    int takes = takeList.size(); // events taken during this transaction
    synchronized (queueLock) {
      if (puts > 0) {
        while (!putList.isEmpty()) {
          if (!queue.offer(putList.removeFirst())) { // move the staged events into the channel queue
            throw new RuntimeException("Queue add failed, this shouldn't be able to happen");
          }
        }
      }
      putList.clear(); // the steps above succeeded, so clear the transaction's putList and takeList
      takeList.clear();
    }
    bytesRemaining.release(takeByteCounter); // return the byte slots freed by the takes
    takeByteCounter = 0;
    putByteCounter = 0;
    queueStored.release(puts); // release puts permits: that many more events can now be taken
    if (remainingChange > 0) {
      queueRemaining.release(remainingChange);
    }
    if (puts > 0) { // update the metric for events successfully put into the Channel
      channelCounter.addToEventPutSuccessCount(puts);
    }
    if (takes > 0) { // update the metric for events successfully taken from the Channel
      channelCounter.addToEventTakeSuccessCount(takes);
    }
    channelCounter.setChannelSize(queue.size());
  }
  // Transaction rollback
  @Override
  protected void doRollback() {
    int takes = takeList.size();
    synchronized (queueLock) {
      Preconditions.checkState(queue.remainingCapacity() >= takeList.size(),
          "Not enough space in memory channel " +
          "queue to rollback takes. This should never happen, please report");
      while (!takeList.isEmpty()) { // put every event in takeList back into the queue
        // removeLast() retrieves and removes the last element of the deque, so pushing
        // each one onto the head of the queue restores the original order
        queue.addFirst(takeList.removeLast());
      }
      putList.clear();
    }
    bytesRemaining.release(putByteCounter);
    putByteCounter = 0;
    takeByteCounter = 0;
    queueStored.release(takes);
    channelCounter.setChannelSize(queue.size());
  }
}
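The two semaphores used above, queueRemaining and queueStored, act as counters of free slots and of stored events. The following is a simplified, self-contained sketch of that accounting. PermitQueue and its method names are hypothetical, it stores strings instead of Event objects, and unlike MemoryChannel it returns the free slot as soon as an event is taken rather than at commit.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical bounded queue guarded by two semaphores; not Flume code.
class PermitQueue {
  private final Object lock = new Object();
  private final Deque<String> queue = new ArrayDeque<>();
  private final Semaphore remaining;                 // free slots
  private final Semaphore stored = new Semaphore(0); // events available to take
  private final long timeoutMillis;

  PermitQueue(int capacity, long timeoutMillis) {
    this.remaining = new Semaphore(capacity);
    this.timeoutMillis = timeoutMillis;
  }

  // All-or-nothing: reserve one free slot per event before touching the queue.
  boolean putAll(List<String> batch) {
    try {
      if (!remaining.tryAcquire(batch.size(), timeoutMillis, TimeUnit.MILLISECONDS)) {
        return false; // not enough space within the timeout
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    synchronized (lock) {
      queue.addAll(batch);
    }
    stored.release(batch.size()); // signal that many events are now available
    return true;
  }

  // Returns null if no event becomes available within the timeout.
  String take() {
    try {
      if (!stored.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
        return null;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
    String event;
    synchronized (lock) {
      event = queue.poll();
    }
    remaining.release(); // the slot is free again
    return event;
  }
}
```

Waiting on a semaphore with a timeout, instead of blocking on the queue itself, is what lets a full or empty channel fail fast after keepAlive seconds.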
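The MemoryTransaction constructor creates two blocking deques, putList and takeList, sized by the transaction capacity transCapacity. Both exist purely for transaction handling. When a Source puts an event into the Channel, the event first goes into putList, which acts as a temporary staging buffer, and is moved into MemoryChannel's queue only on commit. When events are delivered from the Channel to a Sink, each event is first recorded in takeList and then handed to the Sink.
If either a put or a take fails, rollback is called to roll the transaction back. Rollback first locks the Channel queue so that no other thread can access it, puts any events recorded in takeList back into the Channel, and then clears putList, discarding the events that were staged there.
The code above only implements these methods. They are actually invoked when a Source writes events or a Sink reads events, that is, inside a transaction, following the pattern below. For the details, see the previous post, 《flume Source启动过程分析》 (an analysis of the Flume Source startup process).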
Channel ch = ...
Transaction tx = ch.getTransaction();
try {
  tx.begin();
  ...
  // ch.put(event) or ch.take(): a Source writes events with put, a Sink reads events with take
  ...
  tx.commit();
} catch (ChannelException ex) { // roll the transaction back on failure
  tx.rollback();
  ...
} finally {
  tx.close();
}
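The staging behaviour of putList and takeList can be tried out in isolation. The sketch below is a single-threaded simplification, not Flume code: MiniChannel, Tx, and newTx are hypothetical names, events are plain strings, and there are no capacity limits, locks, or semaphores.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-threaded model of the put/take staging lists; not Flume code.
class MiniChannel {
  final Deque<String> queue = new ArrayDeque<>(); // the shared channel queue

  Tx newTx() {
    return new Tx();
  }

  class Tx {
    private final Deque<String> putList = new ArrayDeque<>();
    private final Deque<String> takeList = new ArrayDeque<>();

    // Stage the event; it is not visible in the channel until commit.
    void put(String event) {
      putList.addLast(event);
    }

    // Remove the head of the channel queue and remember it in case of rollback.
    String take() {
      String event = queue.pollFirst();
      if (event != null) {
        takeList.addLast(event);
      }
      return event;
    }

    // Publish staged puts in order and forget the taken events.
    void commit() {
      while (!putList.isEmpty()) {
        queue.addLast(putList.removeFirst());
      }
      takeList.clear();
    }

    // Push taken events back onto the head in their original order; drop staged puts.
    void rollback() {
      while (!takeList.isEmpty()) {
        queue.addFirst(takeList.removeLast());
      }
      putList.clear();
    }
  }
}
```

Walking takeList from its tail while pushing onto the head of the queue is what keeps the original event order intact after a rollback.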
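The third part of MemoryChannel is the configure method, which reads settings from the configuration system and initializes the channel. Configuration can be read in two ways: once at startup, or reloaded dynamically. If a dynamic reload changes the Channel capacity, resizeQueue is called to adjust the queue, as follows: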
if (queue != null) { // queue already exists: the configuration was reloaded with a new capacity
  try {
    resizeQueue(capacity);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
  }
} else { // first initialization: allocate the blocking deque with the given capacity and set up the semaphores
  synchronized (queueLock) {
    queue = new LinkedBlockingDeque<Event>(capacity);
    queueRemaining = new Semaphore(capacity);
    queueStored = new Semaphore(0);
  }
}
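Dynamically resizing the Channel falls into three cases:
- The old and new capacities are equal: return immediately.
- The old capacity is larger than the new one (shrink): first acquire permits from queueRemaining for the unused slots being removed, so that no thread can write into them during the resize; then create a queue with the new capacity and add all events from the old queue to it.
- The old capacity is smaller than the new one (grow): create a queue with the new capacity and add all events from the old queue to it.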
private void resizeQueue(int capacity) throws InterruptedException {
  int oldCapacity;
  // compute the current capacity of the channel queue
  synchronized (queueLock) {
    oldCapacity = queue.size() + queue.remainingCapacity();
  }
  // old and new capacities are equal: nothing to adjust
  if (oldCapacity == capacity) {
    return;
  } else if (oldCapacity > capacity) { // shrink
    // first reserve the unused slots being removed so that no other thread can use them
    if (!queueRemaining.tryAcquire(oldCapacity - capacity, keepAlive, TimeUnit.SECONDS)) {
      LOGGER.warn("Couldn't acquire permits to downsize the queue, resizing has been aborted");
    } else {
      // hold queueLock while shrinking: create a blocking deque with the new capacity,
      // then copy the old queue's events into it
      synchronized (queueLock) {
        LinkedBlockingDeque<Event> newQueue = new LinkedBlockingDeque<Event>(capacity);
        newQueue.addAll(queue);
        queue = newQueue;
      }
    }
  } else { // grow: take the lock, create newQueue, copy the old queue's events
    synchronized (queueLock) {
      LinkedBlockingDeque<Event> newQueue = new LinkedBlockingDeque<Event>(capacity);
      newQueue.addAll(queue);
      queue = newQueue;
    }
    // release capacity - oldCapacity permits, making that many more slots available
    queueRemaining.release(capacity - oldCapacity);
  }
}
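To see the three cases in action without a running Flume agent, here is a runnable sketch under simplifying assumptions. ResizableBuffer and its methods are hypothetical, events are strings, the timeout is passed in milliseconds instead of using keepAlive, and resize reports failure with a boolean rather than a log message.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical resizable buffer modelled on the resize logic above; not Flume code.
class ResizableBuffer {
  private final Object lock = new Object();
  private LinkedBlockingDeque<String> queue;
  private final Semaphore remaining; // free slots

  ResizableBuffer(int capacity) {
    queue = new LinkedBlockingDeque<>(capacity);
    remaining = new Semaphore(capacity);
  }

  // Adds an event if a free slot exists; returns false when the buffer is full.
  boolean add(String event) {
    if (!remaining.tryAcquire()) {
      return false;
    }
    synchronized (lock) {
      queue.addLast(event);
    }
    return true;
  }

  // Total capacity = occupied slots + free slots of the underlying deque.
  int capacity() {
    synchronized (lock) {
      return queue.size() + queue.remainingCapacity();
    }
  }

  boolean resize(int newCapacity, long timeoutMillis) {
    int oldCapacity = capacity();
    if (oldCapacity == newCapacity) {
      return true; // nothing to do
    }
    if (oldCapacity > newCapacity) {
      // Shrink: reserve the slots being removed; fails if they are occupied.
      try {
        if (!remaining.tryAcquire(oldCapacity - newCapacity, timeoutMillis,
            TimeUnit.MILLISECONDS)) {
          return false;
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
      swap(newCapacity);
    } else {
      // Grow: swap first, then publish the extra slots.
      swap(newCapacity);
      remaining.release(newCapacity - oldCapacity);
    }
    return true;
  }

  private void swap(int newCapacity) {
    synchronized (lock) {
      LinkedBlockingDeque<String> newQueue = new LinkedBlockingDeque<>(newCapacity);
      newQueue.addAll(queue);
      queue = newQueue;
    }
  }
}
```

Reserving the permits before copying guarantees that the new, smaller queue can always hold every event in the old one, so addAll cannot overflow.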
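That is roughly how MemoryChannel works. It has the simplest logic of Flume's channels; FileChannel, KafkaChannel, and others are more complex, but their basic framework and transaction flow are similar to MemoryChannel's. Once you understand MemoryChannel, FileChannel and KafkaChannel become much easier to study.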