Building a highly available flume-ng pipeline with load balancing, writing data to Hadoop and Kafka
The scenario: multiple agents push local log data to Hadoop. Because the agents and the Hadoop cluster sit on different network segments, heavy traffic can put significant pressure on the network, so we deployed two Flume collector machines on the Hadoop-side segment. The agents send their data to the collectors, which split the stream in two before importing it into Hadoop. The data flow looks like this:
The diagram shows only 3 agents; the real deployment has more, but there are only two collectors.
We need to distribute the agents' data evenly across the two collector machines. The agent configuration is as follows:
# Name the components on this agent: declare the source, channel, and sink names
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1

# Describe/configure the source: listen on local TCP port 5140 as a syslog source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Define the sink group: k1 and k2 form one group, using load balancing
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin

# Define sink 1: both sinks forward events over Avro to the two collector machines
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 10.0.3.82
a1.sinks.k1.port = 5150

# Define sink 2
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = 10.0.3.83
a1.sinks.k2.port = 5150

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sinks to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1

While collector1 and collector2 are both healthy, the agent distributes its data across the two machines; if either collector goes down, the agent sends all data to the remaining healthy one.
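To sanity-check the agent's syslogtcp source, you can push a test event at local port 5140. The sketch below is my addition rather than part of the original setup (the host and port are taken from the config above): it formats a minimal RFC 3164 syslog line and sends it over TCP.

```python
import socket

def syslog_line(msg, facility=1, severity=6):
    """Build a minimal RFC 3164 syslog line; PRI = facility * 8 + severity."""
    return "<%d>%s" % (facility * 8 + severity, msg)

def send_test_event(msg, host="localhost", port=5140):
    """Send one newline-terminated syslog line to the agent's syslogtcp source."""
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall((syslog_line(msg) + "\n").encode("utf-8"))
```

If the event makes it through, it should appear in HDFS under /quantone/flume and on the mytopic Kafka topic once the configured roll interval and batch size are reached.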
Configuration for collector1:
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'collector1'
collector1.sources = r1
collector1.channels = c1 c2
collector1.sinks = k1 k2

# Describe the source
collector1.sources.r1.type = avro
collector1.sources.r1.port = 5150
collector1.sources.r1.bind = 0.0.0.0
collector1.sources.r1.channels = c1 c2

# Describe the channels: c1 buffers events on disk, c2 in memory
collector1.channels.c1.type = file
collector1.channels.c1.checkpointDir = /usr/local/apache-flume-1.6.0-bin/fileChannel/checkpoint
collector1.channels.c1.dataDir = /usr/local/apache-flume-1.6.0-bin/fileChannel/data
collector1.channels.c2.type = memory
collector1.channels.c2.capacity = 1000
collector1.channels.c2.transactionCapacity = 100

# Describe sink k1, which writes to Hadoop
collector1.sinks.k1.type = hdfs
collector1.sinks.k1.channel = c1
collector1.sinks.k1.hdfs.path = /quantone/flume
collector1.sinks.k1.hdfs.fileType = DataStream
collector1.sinks.k1.hdfs.writeFormat = TEXT
collector1.sinks.k1.hdfs.rollInterval = 300
collector1.sinks.k1.hdfs.filePrefix = %Y-%m-%d
collector1.sinks.k1.hdfs.round = true
collector1.sinks.k1.hdfs.roundValue = 5
collector1.sinks.k1.hdfs.roundUnit = minute
collector1.sinks.k1.hdfs.useLocalTimeStamp = true

# Describe sink k2, which writes to Kafka
collector1.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink
collector1.sinks.k2.topic = mytopic
collector1.sinks.k2.channel = c2
collector1.sinks.k2.brokerList = 10.0.3.178:9092,10.0.3.179:9092
collector1.sinks.k2.requiredAcks = 1
collector1.sinks.k2.batchSize = 20
Configuration for collector2:
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'collector2'
collector2.sources = r1
collector2.channels = c1 c2
collector2.sinks = k1 k2

# Describe the source
collector2.sources.r1.type = avro
collector2.sources.r1.port = 5150
collector2.sources.r1.bind = 0.0.0.0
collector2.sources.r1.channels = c1 c2

# Describe the channels: c1 buffers events on disk, c2 in memory
collector2.channels.c1.type = file
collector2.channels.c1.checkpointDir = /usr/local/apache-flume-1.6.0-bin/fileChannel/checkpoint
collector2.channels.c1.dataDir = /usr/local/apache-flume-1.6.0-bin/fileChannel/data
collector2.channels.c2.type = memory
collector2.channels.c2.capacity = 1000
collector2.channels.c2.transactionCapacity = 100

# Describe sink k1, which writes to Hadoop
collector2.sinks.k1.type = hdfs
collector2.sinks.k1.channel = c1
collector2.sinks.k1.hdfs.path = /quantone/flume
collector2.sinks.k1.hdfs.fileType = DataStream
collector2.sinks.k1.hdfs.writeFormat = TEXT
collector2.sinks.k1.hdfs.rollInterval = 300
collector2.sinks.k1.hdfs.filePrefix = %Y-%m-%d
collector2.sinks.k1.hdfs.round = true
collector2.sinks.k1.hdfs.roundValue = 5
collector2.sinks.k1.hdfs.roundUnit = minute
collector2.sinks.k1.hdfs.useLocalTimeStamp = true

# Describe sink k2, which writes to Kafka
collector2.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink
collector2.sinks.k2.topic = mytopic
collector2.sinks.k2.channel = c2
collector2.sinks.k2.brokerList = 10.0.3.178:9092,10.0.3.179:9092
collector2.sinks.k2.requiredAcks = 1
collector2.sinks.k2.batchSize = 20
The channel feeding the Hadoop sink is a file channel. When its sink fails to deliver, a file channel persists events to the configured directories on disk and resumes sending once the network recovers. Compared with a memory channel, this type suits data that is modest in volume but has high reliability requirements.
Note: the setting collector2.sinks.k1.hdfs.filePrefix = %Y-%m-%d controls the prefix of file names written into Hadoop. If incoming events carry no timestamp field in their headers, this setting prevents data from being written at all; in that case add collector2.sinks.k1.hdfs.useLocalTimeStamp = true so that the collector's local clock is used to resolve %Y-%m-%d. Be aware, however, that this is the time of processing on the collector, not the time the log line was actually generated on the agent.
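If you would rather stamp events closer to their origin than the collector's clock, one alternative sketch (my addition; it assumes Flume's standard timestamp interceptor) is to set the timestamp header on the agent as each event is received, so the collector's HDFS sink can resolve %Y-%m-%d without useLocalTimeStamp:

```
# Agent-side sketch: add a timestamp header when the event enters Flume.
# Note this would replace or extend any existing
# a1.sources.r1.interceptors declaration.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
```

This still records when the agent received the line rather than when the application wrote it, but it is closer to the true event time than the collector-side clock.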
If you want different agents' data written to different Kafka topics, leave the collector1.sinks.k2.topic = mytopic setting out of the collector's Kafka sink and instead configure a static interceptor in each agent's source, for example:
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = topic
a1.sources.r1.interceptors.i1.value = mytopic

This way each agent stamps its events with its own topic header, so different agents' data lands in the corresponding topics.