Flume NG Study Notes (10): Transaction, Sink, Source and Channel Development
Source: the Internet | Editor: 程序博客网 | Time: 2024/05/21 06:26
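1. Transaction interface
The Transaction interface is the basis of Flume's reliability. All the major components (sources, sinks and channels) must use a Flume Transaction. You can think of it as Flume's transaction mechanism: sources put data into a channel, and sinks take data out of it, inside a Transaction.
A Transaction is implemented inside a Channel implementation, and every source and sink connected to a channel must obtain a Transaction object. Sources do this through a ChannelProcessor, which manages the Transactions for them; sinks manage them explicitly through their configured Channel. Staging an event (putting it into a channel) and extracting an event (taking it out of a channel) are both done inside an active Transaction.
The following example is from the official documentation: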
Channel ch = new MemoryChannel();
Transaction txn = ch.getTransaction();
txn.begin();
try {
  // This try clause includes whatever Channel operations you want to do

  Event eventToStage = EventBuilder.withBody("Hello Flume!",
                       Charset.forName("UTF-8"));
  ch.put(eventToStage);
  // Event takenEvent = ch.take();
  // ...
  txn.commit();
} catch (Throwable t) {
  txn.rollback();

  // Log exception, handle individual exceptions as needed

  // re-throw all Errors
  if (t instanceof Error) {
    throw (Error)t;
  }
} finally {
  txn.close();
}
The code above is a very simple Transaction example; the same begin/commit/rollback/close pattern is used in both custom Sources and custom Sinks.
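To make the begin/commit/rollback semantics concrete without a Flume dependency, here is a minimal JDK-only sketch. `ToyChannel`, `ToyTransaction` and `TxnDemo.putWithRollback` are hypothetical names invented for illustration, not Flume classes. The model assumes that staged puts become visible only on commit and that taken events go back to the queue on rollback, which is the guarantee the Transaction contract is meant to provide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory channel used only to illustrate transaction semantics.
class ToyChannel {
    private final Deque<String> queue = new ArrayDeque<>();

    ToyTransaction getTransaction() {
        return new ToyTransaction();
    }

    int size() {
        return queue.size();
    }

    class ToyTransaction {
        private final List<String> staged = new ArrayList<>(); // puts not yet committed
        private final List<String> taken = new ArrayList<>();  // takes not yet committed

        void put(String event) {
            staged.add(event); // invisible to readers until commit()
        }

        String take() {
            String event = queue.pollFirst(); // null if the channel is empty
            if (event != null) {
                taken.add(event);
            }
            return event;
        }

        void commit() {
            queue.addAll(staged); // publish staged puts
            staged.clear();
            taken.clear();        // taken events are now permanently removed
        }

        void rollback() {
            staged.clear();       // drop staged puts
            for (int i = taken.size() - 1; i >= 0; i--) {
                queue.addFirst(taken.get(i)); // return takes in their original order
            }
            taken.clear();
        }
    }
}

class TxnDemo {
    // Same shape as the official example: commit on success, roll back on failure.
    static boolean putWithRollback(ToyChannel ch, String body, boolean simulateFailure) {
        ToyChannel.ToyTransaction txn = ch.getTransaction();
        try {
            txn.put(body);
            if (simulateFailure) {
                throw new IllegalStateException("simulated downstream failure");
            }
            txn.commit();
            return true;
        } catch (RuntimeException e) {
            txn.rollback();
            return false;
        }
    }
}
```

A failed put leaves the channel unchanged, and a rolled-back take puts the event back, so nothing is lost between the two sides of a channel.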
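2. Developing a custom Sink
A Sink takes events from a channel and either forwards them to the next Flume agent or stores them in an external repository.
Which channel a Sink is bound to is set in the configuration file. Every configured Sink has an associated SinkRunner instance; when the Flume framework calls SinkRunner.start(), a new thread is created to drive the Sink.
This thread manages the Sink's lifecycle. The Sink must implement the start() and stop() methods of the LifecycleAware interface: start() initializes the Sink (for example, opens the connection to the destination), stop() releases resources, and process() is the core method that takes events from the channel and forwards them.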
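The runner thread can be pictured as a loop that keeps calling process() and sleeps a little longer each time the Sink reports BACKOFF. The sketch below is a hypothetical, JDK-only model: `ToyStatus`, `ToySinkRunner` and the linear, capped backoff are assumptions made for illustration, not Flume's SinkRunner source. It only shows why process() should return BACKOFF when there is nothing to do instead of spinning.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical stand-in for Sink.Status; not Flume's enum.
enum ToyStatus { READY, BACKOFF }

// Hypothetical model of the thread a SinkRunner starts: call process() in a loop
// and back off (linearly, up to a cap) while process() keeps returning BACKOFF.
class ToySinkRunner implements Runnable {
    private final Supplier<ToyStatus> process; // stands in for Sink.process()
    private final long backoffIncrementMs;
    private final long maxBackoffMs;
    private final AtomicBoolean running = new AtomicBoolean(true);

    ToySinkRunner(Supplier<ToyStatus> process, long backoffIncrementMs, long maxBackoffMs) {
        this.process = process;
        this.backoffIncrementMs = backoffIncrementMs;
        this.maxBackoffMs = maxBackoffMs;
    }

    // Sleep after the n-th consecutive BACKOFF: grows linearly, capped at maxMs.
    static long backoffFor(int n, long incrementMs, long maxMs) {
        return Math.min(n * incrementMs, maxMs);
    }

    // Ask the loop to exit after the current iteration.
    void stop() {
        running.set(false);
    }

    @Override
    public void run() {
        int consecutiveBackoffs = 0;
        while (running.get()) {
            if (process.get() == ToyStatus.BACKOFF) {
                consecutiveBackoffs++;
                try {
                    Thread.sleep(backoffFor(consecutiveBackoffs, backoffIncrementMs, maxBackoffMs));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and exit
                    return;
                }
            } else {
                consecutiveBackoffs = 0; // work was done: poll again immediately
            }
        }
    }
}
```

In practice the runner would be started with `new Thread(runner).start()` and shut down with `runner.stop()`, which corresponds to the start()/stop() lifecycle described above.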
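The Sink must also implement the Configurable interface so that it can read its settings from the configuration file.
Here is the example from the official documentation: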
public class MySink extends AbstractSink implements Configurable {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");

    // Process the myProp value (e.g. validation)

    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external repository (e.g. HDFS) that
    // this Sink will forward Events to ..
  }

  @Override
  public void stop () {
    // Disconnect from the external repository and do any
    // additional cleanup (e.g. releasing resources or nulling-out
    // field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;

    // Start transaction
    Channel ch = getChannel();
    Transaction txn = ch.getTransaction();
    txn.begin();
    try {
      // This try clause includes whatever Channel operations you want to do

      Event event = ch.take();

      // Send the Event to the external repository.
      // storeSomeData(event);

      txn.commit();
      status = Status.READY;
    } catch (Throwable t) {
      txn.rollback();

      // Log exception, handle individual exceptions as needed

      status = Status.BACKOFF;

      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error)t;
      }
    } finally {
      txn.close();
    }
    return status;
  }
}

Below is my test example:
package flumedev;

import java.nio.charset.StandardCharsets;

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class Custom_Sink extends AbstractSink implements Configurable {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");

    // Process the myProp value (e.g. validation)

    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external repository (e.g. HDFS) that
    // this Sink will forward Events to ..
  }

  @Override
  public void stop () {
    // Disconnect from the external repository and do any
    // additional cleanup (e.g. releasing resources or nulling-out
    // field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;

    // Start transaction
    Channel ch = getChannel();
    Transaction txn = ch.getTransaction();
    txn.begin();
    try {
      // This try clause includes whatever Channel operations you want to do

      Event event = ch.take();
      if (event == null) {
        // Channel is empty: finish the (empty) transaction and ask the runner to back off
        txn.commit();
        return Status.BACKOFF;
      }

      // Decode the body bytes explicitly (UTF-8 assumed) instead of relying on the platform default
      String out = new String(event.getBody(), StandardCharsets.UTF_8);

      // Send the Event to the external repository.
      // storeSomeData(event);
      System.out.println(out);

      txn.commit();
      status = Status.READY;
    } catch (Throwable t) {
      txn.rollback();

      // Log exception, handle individual exceptions as needed

      status = Status.BACKOFF;

      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error)t;
      }
    } finally {
      txn.close();
    }
    return status;
  }
}
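The test example above only prints the event body. Note that printing event.getBody().toString() directly gives garbage: getBody() returns a byte[], and calling toString() on an array prints its type and hash code (something like [B@1b6d3586) rather than its contents, so the bytes have to be decoded with new String(...). Because every sink takes its events inside a Transaction, a custom sink must include the Transaction handling as well.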
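The difference between the two ways of printing the body can be shown with plain JDK code. `BodyDecoding` is a hypothetical helper written for this illustration, and UTF-8 is an assumption about how the incoming bytes were encoded.

```java
import java.nio.charset.StandardCharsets;

class BodyDecoding {
    // Wrong: arrays do not override toString(), so this yields "[B@" followed by a hash code.
    static String naive(byte[] body) {
        return body.toString();
    }

    // Right: decode the bytes with an explicit charset (UTF-8 assumed here).
    static String decode(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }
}
```

Passing the charset explicitly also avoids depending on the JVM's default encoding, which can differ between the machine that sends the data and the one running the agent.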
Next is the test configuration. The custom sink class here is flumedev.Custom_Sink. Note: after packaging it into a jar, put the jar under $FLUME_HOME/lib.
# Config file: custom_sink_case23.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 50000
a1.sources.r1.bind = 192.168.233.128
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.channel = c1
a1.sinks.k1.type = flumedev.Custom_Sink
#a1.sinks.k1.type =logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
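Keys written under a component's prefix (here `a1.sinks.k1.`) are what that Sink sees in configure(Context), so a line such as `a1.sinks.k1.myProp = hello` would be read with `context.getString("myProp", "defaultValue")`. The JDK-only sketch below models that lookup with `java.util.Properties`; `ToyContext` and its prefix stripping are illustrative assumptions, not Flume's actual configuration loader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical model of the per-component Context a Sink receives in configure().
class ToyContext {
    private final Map<String, String> params = new HashMap<>();

    // Collect every key starting with prefix (e.g. "a1.sinks.k1.") with the prefix removed.
    static ToyContext forComponent(String confText, String prefix) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(confText)); // .conf files use Java properties syntax
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        ToyContext ctx = new ToyContext();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                ctx.params.put(key.substring(prefix.length()), props.getProperty(key).trim());
            }
        }
        return ctx;
    }

    // Mirrors the getString(key, default) call used in configure().
    String getString(String key, String defaultValue) {
        String value = params.get(key);
        return value != null ? value : defaultValue;
    }
}
```

Commented-out lines such as `#a1.sinks.k1.type =logger` are ignored by the properties parser, so only the active `type` setting reaches the sink.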
# Run the command:
flume-ng agent -c conf -f conf/custom_sink_case23.conf -n a1 -Dflume.root.logger=INFO,console
After the agent starts successfully, open another terminal and send data to the listening port:
echo "testcustom_sink" | nc 192.168.233.128 50000
# Check the console output in the terminal where the agent was started:
The data is printed as expected.
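3. Developing a custom Source
A Source receives data from the outside world and stores it into the Channel. In practice, custom Sources are rarely written.
Here is the example from the official documentation: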
public class MySource extends AbstractSource implements Configurable, PollableSource {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");

    // Process the myProp value (e.g. validation, convert to another type, ...)

    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external client
  }

  @Override
  public void stop () {
    // Disconnect from external client and do any additional cleanup
    // (e.g. releasing resources or nulling-out field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;

    try {
      // Receive new data (getSomeData() is a placeholder for your own client code)
      Event e = getSomeData();

      // Store the Event into this Source's associated Channel(s).
      // A Source has no getChannel(); the ChannelProcessor opens, commits
      // or rolls back a Transaction on each target channel itself.
      getChannelProcessor().processEvent(e);

      status = Status.READY;
    } catch (Throwable t) {
      // Log exception, handle individual exceptions as needed

      status = Status.BACKOFF;

      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error)t;
      }
    }
    return status;
  }
}
To test it, you mainly need to supply the data that is carried by Event e; I will not test it here.
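One way to try out the Source logic without real external input is to replace getSomeData() with an in-memory queue. The sketch below is a hypothetical, JDK-only model (`ToyPollableSource`, with plain lists standing in for channels), not Flume code: it returns BACKOFF when there is nothing to read and otherwise hands each record to every channel, loosely mirroring what a replicating channel selector does through the ChannelProcessor.

```java
import java.util.List;
import java.util.Queue;

// Hypothetical, JDK-only model of a pollable source, not Flume's API.
class ToyPollableSource {
    enum Status { READY, BACKOFF }

    private final Queue<String> externalInput;  // stands in for getSomeData()
    private final List<List<String>> channels;  // stands in for the Source's channels

    ToyPollableSource(Queue<String> externalInput, List<List<String>> channels) {
        this.externalInput = externalInput;
        this.channels = channels;
    }

    Status process() {
        String data = externalInput.poll(); // null when nothing is waiting
        if (data == null) {
            return Status.BACKOFF;          // tell the runner to sleep before polling again
        }
        for (List<String> channel : channels) {
            channel.add(data);              // replicate the record to every channel
        }
        return Status.READY;
    }
}
```

Feeding a fixed queue and then inspecting the lists makes the READY/BACKOFF contract easy to check before wiring in a real client.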
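4. Developing a custom Channel
The official documentation lists this section as TBD.
Below is Meituan's custom Channel development; the original article is at: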
http://tech.meituan.com/mt-log-system-optimization.html
……
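Flume itself provides MemoryChannel and FileChannel. MemoryChannel is fast, but its buffer size is limited and it has no persistence; FileChannel is the opposite. We wanted the advantages of both: when the Sink is fast enough and the Channel is not holding much log data, use MemoryChannel; when the Sink cannot keep up and the Channel has to buffer the logs sent from the applications, use FileChannel. So we developed DualChannel, which switches intelligently between the two channels.
The logic is as follows: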
/***
 * putToMemChannel indicate put event to memChannel or fileChannel
 * takeFromMemChannel indicate take event from memChannel or fileChannel
 * */
private AtomicBoolean putToMemChannel = new AtomicBoolean(true);
private AtomicBoolean takeFromMemChannel = new AtomicBoolean(true);

void doPut(Event event) {
  if (switchon && putToMemChannel.get()) {
    // Write to memChannel
    memTransaction.put(event);

    if ( memChannel.isFull() || fileChannel.getQueueSize() > 100) {
      putToMemChannel.set(false);
    }
  } else {
    // Write to fileChannel
    fileTransaction.put(event);
  }
}

Event doTake() {
  Event event = null;
  if ( takeFromMemChannel.get() ) {
    // Take from memChannel
    event = memTransaction.take();
    if (event == null) {
      takeFromMemChannel.set(false);
    }
  } else {
    // Take from fileChannel
    event = fileTransaction.take();
    if (event == null) {
      takeFromMemChannel.set(true);

      putToMemChannel.set(true);
    }
  }
  return event;
}
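The switching rules can be exercised in isolation with a JDK-only simulation. In the sketch below, `ToyDualChannel` is a hypothetical class: a bounded in-memory deque plays memChannel, an unbounded deque stands in for FileChannel (there is no persistence here), `fileBacklogThreshold` plays the role of the `> 100` check, and the `switchon` flag from the original is omitted. It is not Meituan's implementation, only the same two-flag state machine.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical simulation of the DualChannel switching rules above.
class ToyDualChannel {
    private final Deque<String> mem = new ArrayDeque<>();  // stands in for memChannel
    private final Deque<String> file = new ArrayDeque<>(); // stands in for fileChannel
    private final int memCapacity;
    private final int fileBacklogThreshold;
    private final AtomicBoolean putToMem = new AtomicBoolean(true);
    private final AtomicBoolean takeFromMem = new AtomicBoolean(true);

    ToyDualChannel(int memCapacity, int fileBacklogThreshold) {
        this.memCapacity = memCapacity;
        this.fileBacklogThreshold = fileBacklogThreshold;
    }

    void put(String event) {
        if (putToMem.get()) {
            mem.addLast(event);
            // Memory is full or the file backlog is large: send later puts to "file".
            if (mem.size() >= memCapacity || file.size() > fileBacklogThreshold) {
                putToMem.set(false);
            }
        } else {
            file.addLast(event);
        }
    }

    String take() {
        if (takeFromMem.get()) {
            String event = mem.pollFirst();
            if (event == null) {
                takeFromMem.set(false); // memory drained: read from "file" next
            }
            return event;
        }
        String event = file.pollFirst();
        if (event == null) {            // file drained too: go back to memory mode
            takeFromMem.set(true);
            putToMem.set(true);
        }
        return event;
    }

    boolean isPuttingToMem() {
        return putToMem.get();
    }
}
```

As in the original logic, a take that finds one side empty returns null once while it flips the flag, so the caller simply retries; with a memory capacity of 2, putting a, b, c yields a, b, null, c, null, after which puts go to memory again.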
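Note that the official documentation recommends the file channel: although it is slower, it guarantees data integrity. The memory channel is faster, but it should only be used for workloads that are not very sensitive to lost or duplicated data.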