Scribe + Flume in Detail: Building a Highly Available, Load-Balanced Log Collection Pipeline into Hadoop and Kafka


I. System Architecture

To improve reliability, the Flume deployment is split into an agent tier and a collector tier.

The agent tier consists of every host whose logs need to be collected; there can be any number of them, and more can be added freely. Each agent host runs a scribe process that forwards the log entries under the relevant directories to the Flume source on the same machine; the corresponding avro sinks then push the data to the two collectors (load-balanced between them, and if one collector fails, everything is pushed to the other).
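
The scribe configuration itself is not shown in the original. As an illustration only, a minimal scribe store that forwards every category to the local Flume ScribeSource on port 5140 might look like the following (the scribe listen port 1463 and the single network store are assumptions, not part of the original setup):

# scribe server port that applications write to (assumed)
port=1463

<store>
# forward every category to the Flume ScribeSource on this host
category=default
type=network
remote_host=127.0.0.1
remote_port=5140
</store>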

After the two collector machines receive the data, they distribute it to HDFS and Kafka according to the routing policy:

(architecture diagram)

II. Installing Flume

1. All machines that run Flume need a Java environment. Agent-tier machines must have scribe installed; collector-tier machines must have Hadoop and Kafka installed and configured (see the corresponding documentation for installation steps).

 

2. The Flume tarball can be used directly after extraction:

tar -zxvf apache-flume-1.6.0-bin.tar.gz -C /usr/local


3. Replace flume-ng-kafka-sink-1.6.0.jar in the lib folder of the Flume home directory with the replacement flume-ng-kafka-sink-1.6.0.jar.
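
For example (the location of the replacement jar is illustrative):

cd /usr/local/apache-flume-1.6.0-bin/lib
# keep a backup of the bundled jar, then drop in the replacement
mv flume-ng-kafka-sink-1.6.0.jar flume-ng-kafka-sink-1.6.0.jar.bak
cp /path/to/flume-ng-kafka-sink-1.6.0.jar .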

 

4. Configure the system environment variables:

vi /etc/profile

Add:

export FLUME_HOME=/usr/local/apache-flume-1.6.0-bin
export PATH=$PATH:$FLUME_HOME/bin


Then run:

source /etc/profile

 

5. Edit flume-env.sh under the conf directory of the Flume home directory.

Add:

export JAVA_HOME=/usr/java/jdk1.7.0_79

 

6. Run:

flume-ng version

If the output is:

Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f

then Flume has been installed successfully.

 

7. On the collector machines, create the persistence directories for the file channel. The directory names must match the two directories specified for the file channel in the configuration file:

collector.channels.c1.checkpointDir = /usr/local/apache-flume-1.6.0-bin/fileChannel/checkpoint
collector.channels.c1.dataDir = /usr/local/apache-flume-1.6.0-bin/fileChannel/data
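
For example:

mkdir -p /usr/local/apache-flume-1.6.0-bin/fileChannel/checkpoint
mkdir -p /usr/local/apache-flume-1.6.0-bin/fileChannel/data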

 

III. Flume Configuration

The configuration files live in the conf folder of the Flume home directory.

Agent configuration (conf/agent.conf):

# Declare the components
agent.sources = r1
agent.sinks = k1 k2
agent.channels = c1

# Agent source: ScribeSource, for compatibility with scribe
# The port is the one scribe forwards to
agent.sources.r1.type = org.apache.flume.source.scribe.ScribeSource
agent.sources.r1.port = 5140
agent.sources.r1.channels = c1

# Sink group: k1 and k2 load-balance across the two collectors
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = k1 k2
agent.sinkgroups.g1.processor.type = load_balance
agent.sinkgroups.g1.processor.backoff = true
agent.sinkgroups.g1.processor.selector = round_robin

# Sink 1: avro, sends to one of the collectors
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = 10.0.3.82
agent.sinks.k1.port = 5150

# Sink 2: avro, sends to the other collector
agent.sinks.k2.type = avro
agent.sinks.k2.hostname = 10.0.3.83
agent.sinks.k2.port = 5150

# Channel: the default memory channel, buffering at most 20,000 events,
# with up to 10,000 events per transaction
agent.channels.c1.type = memory
agent.channels.c1.capacity = 20000
agent.channels.c1.transactionCapacity = 10000

# Bind the channel to the source and sinks
agent.sources.r1.channels = c1
agent.sinks.k1.channel = c1
agent.sinks.k2.channel = c1
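
With processor.type = load_balance and backoff = true, a failed collector is temporarily taken out of rotation and all traffic flows to the remaining one, which gives the failover behavior described in section I. If strict active/standby behavior were preferred instead of round-robin, Flume's failover sink processor is an alternative; a minimal sketch (the priority values are illustrative, not from the original setup):

agent.sinkgroups.g1.processor.type = failover
agent.sinkgroups.g1.processor.priority.k1 = 10
agent.sinkgroups.g1.processor.priority.k2 = 5
agent.sinkgroups.g1.processor.maxpenalty = 10000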

Collector configuration (conf/collector.conf):

# Declare the components
collector.sources = r1
collector.channels = c1 c2
collector.sinks = k1 k2

# Collector source: listens on port 5150 and routes events selectively to channels c1 and c2.
# If an event's 'category' header is flume_hdfs it goes to c1; if it is flume_kafka it goes to c2.
# Events with no matching value default to c1.
collector.sources.r1.type = avro
collector.sources.r1.port = 5150
collector.sources.r1.bind = 0.0.0.0
collector.sources.r1.channels = c1 c2
collector.sources.r1.selector.type = multiplexing
collector.sources.r1.selector.header = category
collector.sources.r1.selector.mapping.flume_hdfs = c1
collector.sources.r1.selector.mapping.flume_kafka = c2
collector.sources.r1.selector.default = c1

# Channels c1 and c2:
# c1 is a file channel; if delivery fails, data is persisted to the configured directories.
# c2 is a memory channel; data that cannot be delivered before timing out is dropped.
collector.channels.c1.type = file
collector.channels.c1.checkpointDir = /usr/local/apache-flume-1.6.0-bin/fileChannel/checkpoint
collector.channels.c1.dataDir = /usr/local/apache-flume-1.6.0-bin/fileChannel/data

collector.channels.c2.type = memory
collector.channels.c2.capacity = 1000
collector.channels.c2.transactionCapacity = 100

# Sink 1: takes data from c1 and writes it to HDFS; %{category} in the path is
# resolved to the actual category value
collector.sinks.k1.type = hdfs
collector.sinks.k1.channel = c1
collector.sinks.k1.hdfs.path = /quantone/flume/%{category}/10.0.3.82
collector.sinks.k1.hdfs.fileType = DataStream
collector.sinks.k1.hdfs.writeFormat = TEXT
collector.sinks.k1.hdfs.rollInterval = 300
collector.sinks.k1.hdfs.filePrefix = %Y-%m-%d
collector.sinks.k1.hdfs.round = true
collector.sinks.k1.hdfs.roundValue = 5
collector.sinks.k1.hdfs.roundUnit = minute
collector.sinks.k1.hdfs.useLocalTimeStamp = true
#collector.sinks.k1.serializer.appendNewline = false

# Sink 2: takes data from c2 and sends it to Kafka
collector.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink
collector.sinks.k2.channel = c2
collector.sinks.k2.brokerList = 10.0.3.178:9092,10.0.3.179:9092
collector.sinks.k2.requiredAcks = 1
collector.sinks.k2.batchSize = 20
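
The ScribeSource on the agent copies each message's scribe category into the 'category' event header, which is exactly what the multiplexing selector above routes on. A quick way to exercise both paths, assuming the scribe_cat example script that ships with scribe is available and scribe listens on port 1463 (both are assumptions, not part of the original setup):

echo "hdfs test line"  | scribe_cat -h 127.0.0.1:1463 flume_hdfs
echo "kafka test line" | scribe_cat -h 127.0.0.1:1463 flume_kafka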

IV. Starting Flume

1. Start the scribe process on all agent machines, and start Kafka and Hadoop.

 

2. Start the two Flume collectors:

bin/flume-ng agent --conf ./conf/ -f conf/collector.conf -Dflume.root.logger=INFO,console -n collector

 

3. Start all Flume agents:

bin/flume-ng agent --conf ./conf/ -f conf/agent.conf -Dflume.root.logger=INFO,console -n agent
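
Once everything is running, delivery can be spot-checked from a collector machine. A sketch, assuming the ZooKeeper-based console consumer of that Kafka generation and a ZooKeeper node at 10.0.3.178:2181 (an assumption; only the broker addresses are given above); note that because no topic is set on the Kafka sink, Flume 1.6's KafkaSink publishes to its default topic (default-flume-topic) unless an event carries a 'topic' header:

# events routed to c1 land under the category-based HDFS path from collector.conf
hdfs dfs -ls /quantone/flume/flume_hdfs/10.0.3.82

# events routed to c2 appear in Kafka (ZooKeeper address is an assumption)
kafka-console-consumer.sh --zookeeper 10.0.3.178:2181 --topic default-flume-topic --from-beginning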

