Scribe + Flume in Detail: Building a Highly Available, Load-Balanced Log Collection Pipeline into Hadoop and Kafka


I. System Architecture

To improve reliability, the Flume deployment is split into an agent tier and a collector tier.

The agent tier consists of every host whose logs need to be collected; there can be any number of them, and more can be added freely. Each agent host runs a scribe process that forwards the log entries under the relevant directories to the Flume source on the same machine; the corresponding avro sinks then push the data to the two collectors (load-balanced between them, and if one collector fails, everything is pushed to the other).
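
The scribe configuration itself is not shown in the original. As an illustration only, a minimal scribe store that forwards every category to the local Flume ScribeSource on port 5140 might look like the following (the scribe listen port 1463 and the single network store are assumptions, not part of the original setup):

# scribe server port that applications write to (assumed)
port=1463

<store>
# forward every category to the Flume ScribeSource on this host
category=default
type=network
remote_host=127.0.0.1
remote_port=5140
</store>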

After the two collector machines receive the data, they distribute it to HDFS and Kafka according to the routing policy:

(architecture diagram)

II. Installing Flume

1. All machines that run Flume need a Java environment. Agent-tier machines must have scribe installed; collector-tier machines must have Hadoop and Kafka installed and configured (see the corresponding documentation for installation steps).

 

2. The Flume tarball can be used directly after extraction:

tar -zxvf apache-flume-1.6.0-bin.tar.gz -C /usr/local


3. Replace flume-ng-kafka-sink-1.6.0.jar in the lib folder of the Flume home directory with the replacement flume-ng-kafka-sink-1.6.0.jar.
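
For example (the location of the replacement jar is illustrative):

cd /usr/local/apache-flume-1.6.0-bin/lib
# keep a backup of the bundled jar, then drop in the replacement
mv flume-ng-kafka-sink-1.6.0.jar flume-ng-kafka-sink-1.6.0.jar.bak
cp /path/to/flume-ng-kafka-sink-1.6.0.jar .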

 

4. Configure the system environment variables:

vi /etc/profile

Add:

export FLUME_HOME=/usr/local/apache-flume-1.6.0-bin
export PATH=$PATH:$FLUME_HOME/bin


Then run:

source /etc/profile

 

5. Edit flume-env.sh under the conf directory of the Flume home directory.

Add:

export JAVA_HOME=/usr/java/jdk1.7.0_79

 

6. Run:

flume-ng version

If the output is:

Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f

then Flume has been installed successfully.

 

7. On the collector machines, create the persistence directories for the file channel. The directory names must match the two directories specified for the file channel in the configuration file:

collector.channels.c1.checkpointDir = /usr/local/apache-flume-1.6.0-bin/fileChannel/checkpoint
collector.channels.c1.dataDir = /usr/local/apache-flume-1.6.0-bin/fileChannel/data
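
For example:

mkdir -p /usr/local/apache-flume-1.6.0-bin/fileChannel/checkpoint
mkdir -p /usr/local/apache-flume-1.6.0-bin/fileChannel/data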

 

III. Flume Configuration

The configuration files live in the conf folder of the Flume home directory.

Agent configuration (conf/agent.conf):

# Declare the components
agent.sources = r1
agent.sinks = k1 k2
agent.channels = c1

# Agent source: ScribeSource, for compatibility with scribe
# The port is the one scribe forwards to
agent.sources.r1.type = org.apache.flume.source.scribe.ScribeSource
agent.sources.r1.port = 5140
agent.sources.r1.channels = c1

# Sink group: k1 and k2 load-balance across the two collectors
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = k1 k2
agent.sinkgroups.g1.processor.type = load_balance
agent.sinkgroups.g1.processor.backoff = true
agent.sinkgroups.g1.processor.selector = round_robin

# Sink 1: avro, sends to one of the collectors
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = 10.0.3.82
agent.sinks.k1.port = 5150

# Sink 2: avro, sends to the other collector
agent.sinks.k2.type = avro
agent.sinks.k2.hostname = 10.0.3.83
agent.sinks.k2.port = 5150

# Channel: the default memory channel, buffering at most 20,000 events,
# with up to 10,000 events per transaction
agent.channels.c1.type = memory
agent.channels.c1.capacity = 20000
agent.channels.c1.transactionCapacity = 10000

# Bind the channel to the source and sinks
agent.sources.r1.channels = c1
agent.sinks.k1.channel = c1
agent.sinks.k2.channel = c1
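
With processor.type = load_balance and backoff = true, a failed collector is temporarily taken out of rotation and all traffic flows to the remaining one, which gives the failover behavior described in section I. If strict active/standby behavior were preferred instead of round-robin, Flume's failover sink processor is an alternative; a minimal sketch (the priority values are illustrative, not from the original setup):

agent.sinkgroups.g1.processor.type = failover
agent.sinkgroups.g1.processor.priority.k1 = 10
agent.sinkgroups.g1.processor.priority.k2 = 5
agent.sinkgroups.g1.processor.maxpenalty = 10000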

Collector configuration (conf/collector.conf):

# Declare the components
collector.sources = r1
collector.channels = c1 c2
collector.sinks = k1 k2

# Collector source: listens on port 5150 and routes events selectively to channels c1 and c2.
# If an event's 'category' header is flume_hdfs it goes to c1; if it is flume_kafka it goes to c2.
# Events with no matching value default to c1.
collector.sources.r1.type = avro
collector.sources.r1.port = 5150
collector.sources.r1.bind = 0.0.0.0
collector.sources.r1.channels = c1 c2
collector.sources.r1.selector.type = multiplexing
collector.sources.r1.selector.header = category
collector.sources.r1.selector.mapping.flume_hdfs = c1
collector.sources.r1.selector.mapping.flume_kafka = c2
collector.sources.r1.selector.default = c1

# Channels c1 and c2:
# c1 is a file channel; if delivery fails, data is persisted to the configured directories.
# c2 is a memory channel; data that cannot be delivered before timing out is dropped.
collector.channels.c1.type = file
collector.channels.c1.checkpointDir = /usr/local/apache-flume-1.6.0-bin/fileChannel/checkpoint
collector.channels.c1.dataDir = /usr/local/apache-flume-1.6.0-bin/fileChannel/data

collector.channels.c2.type = memory
collector.channels.c2.capacity = 1000
collector.channels.c2.transactionCapacity = 100

# Sink 1: takes data from c1 and writes it to HDFS; %{category} in the path is
# resolved to the actual category value
collector.sinks.k1.type = hdfs
collector.sinks.k1.channel = c1
collector.sinks.k1.hdfs.path = /quantone/flume/%{category}/10.0.3.82
collector.sinks.k1.hdfs.fileType = DataStream
collector.sinks.k1.hdfs.writeFormat = TEXT
collector.sinks.k1.hdfs.rollInterval = 300
collector.sinks.k1.hdfs.filePrefix = %Y-%m-%d
collector.sinks.k1.hdfs.round = true
collector.sinks.k1.hdfs.roundValue = 5
collector.sinks.k1.hdfs.roundUnit = minute
collector.sinks.k1.hdfs.useLocalTimeStamp = true
#collector.sinks.k1.serializer.appendNewline = false

# Sink 2: takes data from c2 and sends it to Kafka
collector.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink
collector.sinks.k2.channel = c2
collector.sinks.k2.brokerList = 10.0.3.178:9092,10.0.3.179:9092
collector.sinks.k2.requiredAcks = 1
collector.sinks.k2.batchSize = 20
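
The ScribeSource on the agent copies each message's scribe category into the 'category' event header, which is exactly what the multiplexing selector above routes on. A quick way to exercise both paths, assuming the scribe_cat example script that ships with scribe is available and scribe listens on port 1463 (both are assumptions, not part of the original setup):

echo "hdfs test line"  | scribe_cat -h 127.0.0.1:1463 flume_hdfs
echo "kafka test line" | scribe_cat -h 127.0.0.1:1463 flume_kafka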

IV. Starting Flume

1. Start the scribe process on all agent machines, and start Kafka and Hadoop.

 

2. Start the two Flume collectors:

bin/flume-ng agent --conf ./conf/ -f conf/collector.conf -Dflume.root.logger=INFO,console -n collector

 

3. Start all Flume agents:

bin/flume-ng agent --conf ./conf/ -f conf/agent.conf -Dflume.root.logger=INFO,console -n agent
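
Once everything is running, delivery can be spot-checked from a collector machine. A sketch, assuming the ZooKeeper-based console consumer of that Kafka generation and a ZooKeeper node at 10.0.3.178:2181 (an assumption; only the broker addresses are given above); note that because no topic is set on the Kafka sink, Flume 1.6's KafkaSink publishes to its default topic (default-flume-topic) unless an event carries a 'topic' header:

# events routed to c1 land under the category-based HDFS path from collector.conf
hdfs dfs -ls /quantone/flume/flume_hdfs/10.0.3.82

# events routed to c2 appear in Kafka (ZooKeeper address is an assumption)
kafka-console-consumer.sh --zookeeper 10.0.3.178:2181 --topic default-flume-topic --from-beginning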

