Trying out flume-ng 1.x
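Flume NG 1.x is a rewrite of Flume 0.9.x and is nearly unrecognizable. The Master and ZooKeeper are gone, the collector is gone, and the web console is gone. All that remains are three components:
- source (avro: very easy to use; exec: runs a shell command)
- sink (I used hdfs)
- channel
What used to be a distributed system has effectively become a data transport tool. In the new architecture, a source puts events into a channel, and a sink takes them out of the channel and delivers them to their destination.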
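The source → channel → sink flow can be made concrete with a minimal single-threaded Python model. This is not Flume code. The class and function names are hypothetical, and the sink writes to a list instead of HDFS. An event is assumed to be a dict with headers and a byte body, loosely mirroring Flume's event of headers plus body.

```python
from collections import deque


class MemoryChannel:
    """Hypothetical bounded in-memory buffer sitting between a source and a sink."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def put(self, event):
        # Refuse new events once the buffer is full.
        if len(self.queue) >= self.capacity:
            raise RuntimeError("channel full")
        self.queue.append(event)

    def take(self):
        # Return the oldest event, or None when the channel is empty.
        return self.queue.popleft() if self.queue else None


def source(lines, channel):
    """Hypothetical source: turns each input line into an event (headers + byte body)."""
    for line in lines:
        channel.put({"headers": {}, "body": line.encode("utf-8")})


def sink(channel, store):
    """Hypothetical sink: drains the channel and 'writes' each body to a list."""
    while (event := channel.take()) is not None:
        store.append(event["body"].decode("utf-8"))


ch = MemoryChannel(capacity=100)
out = []
source(["line 1", "line 2"], ch)
sink(ch, out)
print(out)  # ['line 1', 'line 2']
```

The channel decouples the two sides. The source only needs free space in the channel, and the sink only needs events to be present.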
Below is an example (with tuned parameters) that uses avro as the source, hdfs as the sink, and memory as the channel:
# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 100000
agent1.channels.ch1.transactionCapacity = 100000
agent1.channels.ch1.keep-alive = 30
# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414
agent1.sources.avro-source1.threads = 5
# Define an HDFS sink called log-sink1 that writes all events it receives
# to HDFS, and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = hdfs
agent1.sinks.log-sink1.hdfs.path = hdfs://CNC-XX-R-541:9000/flume/
agent1.sinks.log-sink1.hdfs.writeFormat = Text
agent1.sinks.log-sink1.hdfs.fileType = DataStream
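# Roll to a new HDFS file by size only: 0 disables time-based (rollInterval,
# seconds) and count-based (rollCount, events) rolling; rollSize is in bytes.
agent1.sinks.log-sink1.hdfs.rollInterval = 0
agent1.sinks.log-sink1.hdfs.rollSize = 60554432
agent1.sinks.log-sink1.hdfs.rollCount = 0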
agent1.sinks.log-sink1.hdfs.batchSize = 1000
agent1.sinks.log-sink1.hdfs.txnEventMax = 1000
agent1.sinks.log-sink1.hdfs.callTimeout = 60000
agent1.sinks.log-sink1.hdfs.appendTimeout = 60000
# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
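The configuration only works if the names line up. Each source lists its channels under `channels` (plural), and each sink names a single `channel` (singular). Every component must also be declared in the `agent1.channels`, `agent1.sources` and `agent1.sinks` lines at the bottom, under the same agent name later passed with `-n`.

The following is a sketch of a small Python checker for that wiring. It is not part of Flume, and `load_properties` and `check_wiring` are hypothetical helpers. It only understands `key = value` lines, whereas real Java properties files also allow `:` separators and line continuations.

```python
def load_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """List wiring problems for one agent (the name passed to flume-ng with -n)."""
    prefix = agent + "."
    channels = set(props.get(prefix + "channels", "").split())
    problems = []
    # A source lists one or more channels under the plural key "channels".
    for src in props.get(prefix + "sources", "").split():
        for ch in props.get(f"{prefix}sources.{src}.channels", "").split():
            if ch not in channels:
                problems.append(f"source {src} uses undeclared channel {ch}")
    # A sink drains exactly one channel under the singular key "channel".
    for snk in props.get(prefix + "sinks", "").split():
        ch = props.get(f"{prefix}sinks.{snk}.channel")
        if ch not in channels:
            problems.append(f"sink {snk} uses undeclared channel {ch}")
    return problems


# Trimmed copy of the wiring lines from the example configuration.
conf = """
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
agent1.sources.avro-source1.channels = ch1
agent1.sinks.log-sink1.channel = ch1
"""
print(check_wiring(load_properties(conf), "agent1"))  # []
```

Pointing the sink at an undeclared channel, for example `ch2`, would make `check_wiring` return a one-item list naming `log-sink1`.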
Start the avro agent. The agent name agent1 passed with -n is the one defined in the configuration file above.
flume-ng agent --conf ./conf/ -f conf/flume.conf -n agent1
Upload a file from the client side:
flume-ng avro-client --conf conf -H localhost -p 41414 -F /data/xx.txt
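Impressions:
- It really is just a transport tool. Configuration is simple, but the parameters still have to be tuned, otherwise it throws errors.
- An uploaded file can be split automatically into multiple files by size, by number of lines (events), or by elapsed time.
- I tested uploading a 700 MB file on a single machine: the upload speed was the same as hadoop fs -put, and no records were lost.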
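The automatic splitting comes from the HDFS sink's roll settings (rollInterval, rollSize, rollCount). Below is a Python sketch of the decision, built on several assumptions. Thresholds are checked after every event, and time is simulated rather than read from a clock. `RollPolicy` and `split_into_files` are hypothetical names, and the real sink's bookkeeping may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class RollPolicy:
    roll_interval: float = 0  # seconds; 0 disables time-based rolling
    roll_size: int = 0        # bytes;   0 disables size-based rolling
    roll_count: int = 0       # events;  0 disables count-based rolling


def split_into_files(lines, policy, seconds_per_line=0.0):
    """Group lines into output files the way a size/count/time roll policy would.

    Time is simulated: each line is assumed to arrive seconds_per_line after the last.
    """
    files, current = [], []
    size, count, elapsed = 0, 0, 0.0
    for line in lines:
        current.append(line)
        size += len(line.encode("utf-8"))
        count += 1
        elapsed += seconds_per_line
        # Any enabled threshold that is reached closes the current file.
        if ((policy.roll_size and size >= policy.roll_size)
                or (policy.roll_count and count >= policy.roll_count)
                or (policy.roll_interval and elapsed >= policy.roll_interval)):
            files.append(current)
            current, size, count, elapsed = [], 0, 0, 0.0
    if current:
        files.append(current)  # in this sketch, leftovers form a last, smaller file
    return files


lines = ["abcd"] * 6  # six 4-byte lines
print(len(split_into_files(lines, RollPolicy(roll_size=10))))   # 2
print(len(split_into_files(lines, RollPolicy(roll_count=2))))   # 3
print(len(split_into_files(lines, RollPolicy())))               # 1
```

In the example configuration only rollSize is non-zero, so output files are rolled purely by size.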
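Open questions:
- Can the output file only be defined in flume.conf?
- How can files be aggregated into HDFS according to rules (for example, merging output per customer)?
- Transactional delivery guarantees that the events (log lines) within each transaction reach HDFS. However, if the transfer of a whole file is interrupted midway, a partial upload is still left behind in HDFS.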
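The partial-upload effect follows from committing per batch of events rather than per file. Below is a Python simulation under stated assumptions:
- `transfer` and `InterruptedTransfer` are hypothetical.
- A list stands in for the HDFS file.
- A crash is injected after a chosen line.

Flume itself works with Transaction objects (begin, commit, rollback) rather than this loop.

```python
class InterruptedTransfer(Exception):
    """Raised by the simulation when the transfer dies; carries what was committed."""


def transfer(lines, batch_size, fail_after=None):
    """Deliver lines in transactions of batch_size events to a fake HDFS file."""
    hdfs_file = []  # events that have been committed (durable)
    batch = []      # events in the currently open transaction
    for i, line in enumerate(lines, start=1):
        batch.append(line)
        if fail_after is not None and i == fail_after:
            # The open transaction is rolled back, but earlier commits stay in HDFS.
            raise InterruptedTransfer(hdfs_file)
        if len(batch) == batch_size:
            hdfs_file.extend(batch)  # commit: the whole batch becomes visible at once
            batch = []
    hdfs_file.extend(batch)  # commit the final, possibly smaller batch
    return hdfs_file


lines = [f"line {n}" for n in range(1, 11)]
try:
    transfer(lines, batch_size=3, fail_after=8)
except InterruptedTransfer as exc:
    committed = exc.args[0]
print(len(committed))  # 6: two full batches reached HDFS, lines 7-8 were rolled back
```

Each transaction is all-or-nothing, but nothing ties the batches of one file together. Whole-file atomicity would need something on top, such as writing to a temporary path and renaming it only after the last batch commits.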
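Conclusion: Flume NG 1.1 cannot yet meet complex business requirements. It supports custom components such as sources and sinks, but the software did not feel robust in use. Even very simple examples threw errors, which is not reassuring. Fortunately the source code is simple enough to read. As things stand, it seems suited only to simple data transport.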
Errors encountered:
org.apache.flume.ChannelException: Space for commit to queue couldn't be acquired Sinks are likely not keeping up with sources, or the buffer size is too tight
Fix: set agent1.channels.<channel_name>.keep-alive = 30
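The exception means a put into the memory channel could not find free space: the sink is not draining events as fast as the source is adding them. keep-alive is the time in seconds a put waits for space before giving up, so raising it lets the source ride out short bursts.

Below is a Python model of that behaviour. `BoundedChannel` is a hypothetical name, and only capacity is modelled, not transactionCapacity.

```python
import threading


class BoundedChannel:
    """Memory-channel model: put() waits up to keep_alive seconds for a free slot."""

    def __init__(self, capacity, keep_alive):
        self.keep_alive = keep_alive
        self.free = threading.Semaphore(capacity)  # counts remaining free slots
        self.items = []
        self.lock = threading.Lock()

    def put(self, event):
        # Block until a slot frees up; give up after keep_alive seconds.
        if not self.free.acquire(timeout=self.keep_alive):
            raise RuntimeError("Space for commit to queue couldn't be acquired")
        with self.lock:
            self.items.append(event)

    def take(self):
        with self.lock:
            if not self.items:
                return None
            event = self.items.pop(0)
        self.free.release()  # a slot is free again, waking one waiting put()
        return event


# No sink draining and a short keep-alive: the second put() times out.
full = BoundedChannel(capacity=1, keep_alive=0.1)
full.put("a")
try:
    full.put("b")
    timed_out = False
except RuntimeError:
    timed_out = True

# A slow "sink" frees the slot after 50 ms; a longer keep-alive lets put() wait it out.
busy = BoundedChannel(capacity=1, keep_alive=1.0)
busy.put("a")
threading.Timer(0.05, busy.take).start()
busy.put("b")
print(timed_out, busy.items)  # True ['b']
```

A longer keep-alive only buys time. If the sink stays slower than the source, the channel still fills up and the same error comes back, so sink throughput or channel capacity has to be addressed as well.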
Resources:
Flume NG architecture
https://blogs.apache.org/flume/entry/flume_ng_architecture
Flume User Guide
https://people.apache.org/~mpercy/flume/flume-1.2.0-incubating-SNAPSHOT/docs/FlumeUserGuide.html#data-ingestion
Reposted from http://heipark.iteye.com/blog/1617995