Installing and Configuring Flume on a Hadoop Cluster
1. Official documentation
http://flume.apache.org/
2. Configure environment variables
vi /etc/profile
# set flume
export FLUME_HOME=/opt/hadoop/flume-bin
export FLUME_CONF_DIR=$FLUME_HOME/conf
export PATH=$PATH:$FLUME_HOME/bin
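The variables take effect in a new login shell, or immediately after `source /etc/profile`. A minimal sketch of checking the result (the values mirror the exports above):

```shell
# Sketch: set the same variables as in /etc/profile and confirm they resolve.
FLUME_HOME=/opt/hadoop/flume-bin
FLUME_CONF_DIR=$FLUME_HOME/conf
PATH=$PATH:$FLUME_HOME/bin
echo "$FLUME_CONF_DIR"
# verify that the flume bin directory really is on PATH
case ":$PATH:" in
  *":$FLUME_HOME/bin:"*) echo "PATH ok" ;;
  *) echo "PATH missing flume bin" ;;
esac
```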
Upload the installation packages to the server over sftp:
sftp> put apache-flume-1.6.0-bin.tar.gz
sftp> put apache-flume-1.6.0-src.tar.gz
3. Extract the packages
[hadoop@slavenode1 hadoop]$ pwd
/opt/hadoop
[hadoop@slavenode1 hadoop]$ tar -zxvf apache-flume-1.6.0-src.tar.gz ; tar -zxvf apache-flume-1.6.0-bin.tar.gz
[hadoop@slavenode1 hadoop]$ mv apache-flume-1.6.0-bin flume-bin
4. Edit the configuration file
[hadoop@slavenode1 flume-bin]$ cd conf/
[hadoop@slavenode1 conf]$ cp flume-env.sh.template flume-env.sh
[hadoop@slavenode1 conf]$ vi flume-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_79
5. Verify the installation
[hadoop@slavenode8 conf]$ /opt/hadoop/flume-bin/bin/flume-ng version
Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f
6. Distribute to the other nodes (slavenode1 to slavenode7)
[hadoop@slavenode8 hadoop]$ for i in {32,33,34,35,36,37,38};do scp -r flume-bin 192.168.237.2$i:/opt/hadoop/ ; done
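The same brace range can be dry-run first, so a typo in the IP range is caught before any transfer starts (a sketch; the scp line is commented out):

```shell
# Print each target before copying; uncomment the scp line to transfer.
for i in {32..38}; do
  echo "would copy flume-bin to 192.168.237.2$i:/opt/hadoop/"
  # scp -r flume-bin 192.168.237.2$i:/opt/hadoop/
done
```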
7. A concrete example: collecting a specific log with Flume
Create a new directory to hold the configuration files used for routine log collection.
[hadoop@slavenode4 example]$ mkdir /opt/hadoop/flume-bin/example
1) Single-node Flume writing directly to HDFS, monitoring one log file
[hadoop@slavenode4 example]$ cat flume_directHDFS.conf
# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 100000
agent1.channels.ch1.transactionCapacity = 100000
agent1.channels.ch1.keep-alive = 30
# Define an exec source called avro-source1 on agent1
# that monitors a log file
agent1.sources.avro-source1.type = exec
agent1.sources.avro-source1.shell = /bin/bash -c
agent1.sources.avro-source1.command = tail -n +0 -F /opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-datanode-slavenode4.log
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.threads = 5
# Define an HDFS sink that writes every event it receives
# to HDFS, and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = hdfs
# the target directory must be created manually
agent1.sinks.log-sink1.hdfs.path = hdfs://cluster-ha/flumeTest
agent1.sinks.log-sink1.hdfs.writeFormat = Text
agent1.sinks.log-sink1.hdfs.fileType = DataStream
agent1.sinks.log-sink1.hdfs.rollInterval = 0
agent1.sinks.log-sink1.hdfs.rollSize = 1000000
agent1.sinks.log-sink1.hdfs.rollCount = 0
agent1.sinks.log-sink1.hdfs.batchSize = 1000
agent1.sinks.log-sink1.hdfs.txnEventMax = 1000
agent1.sinks.log-sink1.hdfs.callTimeout = 60000
agent1.sinks.log-sink1.hdfs.appendTimeout = 60000
# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
[hadoop@masternode2 ~]$ hdfs dfs -mkdir hdfs://cluster-ha/flumeTest
Start the agent with the following command, then check the result on HDFS.
[hadoop@slavenode4 example]$ ../bin/flume-ng agent --conf ../conf/ -f flume_directHDFS.conf -n agent1 -Dflume.root.logger=INFO,console
Note: the -Dflume.root.logger=INFO,console option is for debugging only; do not use it in production.
When log lines like the following appear, the configuration is working:
2016-10-10 11:16:24,794 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SOURCE, name: avro-source1 started
2016-10-10 11:16:24,811 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:58)] Serializer = TEXT, UseRawLocalFileSystem = false
2016-10-10 11:16:25,166 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:234)] Creating hdfs://cluster-ha/flumeTest/FlumeData.1476069384811.tmp
2016-10-10 11:16:25,400 (hdfs-log-sink1-call-runner-0) [WARN - org.apache.hadoop.util.NativeCodeLoader.<clinit>(NativeCodeLoader.java:62)] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-10-10 11:16:27,593 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:363)] Closing hdfs://cluster-ha/flumeTest/FlumeData.1476069384811.tmp
2016-10-10 11:16:27,635 (hdfs-log-sink1-call-runner-4) [INFO - org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:629)] Renaming hdfs://cluster-ha/flumeTest/FlumeData.1476069384811.tmp to hdfs://cluster-ha/flumeTest/FlumeData.1476069384811
2016-10-10 11:16:27,681 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:234)] Creating hdfs://cluster-ha/flumeTest/FlumeData.1476069384812.tmp
Checking on HDFS shows the result: log files appear automatically.
[hadoop@masternode2 ~]$ hdfs dfs -ls hdfs://cluster-ha/flumeTest/
Found 2 items
-rw-r--r-- 3 hadoop supergroup 1004251 2016-10-10 11:16 hdfs://cluster-ha/flumeTest/FlumeData.1476069384811
-rw-r--r-- 3 hadoop supergroup 179996 2016-10-10 11:16 hdfs://cluster-ha/flumeTest/FlumeData.1476069384812.tmp
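The first file is 1,004,251 bytes, just over the rollSize of 1,000,000 configured above: the sink rolls as soon as a batch pushes the file past the threshold. A quick way to pull path and size out of such a listing (a sketch; the here-doc reproduces the output above, and on a live cluster it would be replaced by piping `hdfs dfs -ls hdfs://cluster-ha/flumeTest/` directly):

```shell
# Extract path (field 8) and size (field 5) from an hdfs dfs -ls listing.
awk '$1 ~ /^-/ {print $8, $5}' <<'EOF'
-rw-r--r-- 3 hadoop supergroup 1004251 2016-10-10 11:16 hdfs://cluster-ha/flumeTest/FlumeData.1476069384811
-rw-r--r-- 3 hadoop supergroup 179996 2016-10-10 11:16 hdfs://cluster-ha/flumeTest/FlumeData.1476069384812.tmp
EOF
```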
2) Configure the Flume client on each web-server log machine
Configuration on slavenode3
[hadoop@slavenode3 logs]$ mkdir -p /opt/hadoop/flume-bin/spool/checkpoint;mkdir -p /opt/hadoop/flume-bin/spool/data
[hadoop@slavenode3 logs]$ mkdir /opt/hadoop/flume-bin/logs/
[hadoop@slavenode3 logs]$ cd /opt/hadoop/flume-bin/example/
[hadoop@slavenode3 example]$ ls
flume_Consolidation.conf
[hadoop@slavenode3 example]$ cat flume_Consolidation.conf
clientMainAgent.channels = c1
clientMainAgent.sources = s1
clientMainAgent.sinks = k1 k2
# clientMainAgent sinks group
clientMainAgent.sinkgroups = g1
# clientMainAgent Spooling Directory Source
clientMainAgent.sources.s1.type = spooldir
clientMainAgent.sources.s1.spoolDir = /opt/hadoop/flume-bin/logs/
clientMainAgent.sources.s1.fileHeader = true
clientMainAgent.sources.s1.deletePolicy = immediate
clientMainAgent.sources.s1.batchSize = 1000
clientMainAgent.sources.s1.channels = c1
clientMainAgent.sources.s1.deserializer.maxLineLength = 1048576
# clientMainAgent FileChannel
clientMainAgent.channels.c1.type = file
clientMainAgent.channels.c1.checkpointDir = /opt/hadoop/flume-bin/spool/checkpoint
clientMainAgent.channels.c1.dataDirs = /opt/hadoop/flume-bin/spool/data
clientMainAgent.channels.c1.capacity = 200000000
clientMainAgent.channels.c1.keep-alive = 30
clientMainAgent.channels.c1.write-timeout = 30
clientMainAgent.channels.c1.checkpoint-timeout=600
# clientMainAgent Sinks
# k1 sink
clientMainAgent.sinks.k1.channel = c1
clientMainAgent.sinks.k1.type = avro
# connect to CollectorMainAgent
clientMainAgent.sinks.k1.hostname = slavenode4
clientMainAgent.sinks.k1.port = 41415
# k2 sink
clientMainAgent.sinks.k2.channel = c1
clientMainAgent.sinks.k2.type = avro
# connect to CollectorBackupAgent
clientMainAgent.sinks.k2.hostname = slavenode3
clientMainAgent.sinks.k2.port = 41415
# clientMainAgent sinks group
clientMainAgent.sinkgroups.g1.sinks = k1 k2
# load_balance type
clientMainAgent.sinkgroups.g1.processor.type = load_balance
clientMainAgent.sinkgroups.g1.processor.backoff = true
clientMainAgent.sinkgroups.g1.processor.selector = random
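The load_balance processor spreads events across k1 and k2 at random. If one collector should be the primary and the other only a backup, the sink group could instead use the failover processor; a sketch, with arbitrary priority values:

```properties
# Failover alternative: events always go to the highest-priority live sink.
clientMainAgent.sinkgroups.g1.sinks = k1 k2
clientMainAgent.sinkgroups.g1.processor.type = failover
clientMainAgent.sinkgroups.g1.processor.priority.k1 = 10
clientMainAgent.sinkgroups.g1.processor.priority.k2 = 5
clientMainAgent.sinkgroups.g1.processor.maxpenalty = 10000
```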
[hadoop@slavenode3 example]$ scp flume_Consolidation.conf hadoop@slavenode4:/opt/hadoop/flume-bin/example/
Configuration on slavenode4
[hadoop@slavenode4 logs]$ mkdir -p /opt/hadoop/flume-bin/spool/checkpoint;mkdir -p /opt/hadoop/flume-bin/spool/data
[hadoop@slavenode4 logs]$ mkdir /opt/hadoop/flume-bin/logs/
[hadoop@slavenode4 logs]$ cd /opt/hadoop/flume-bin/example/
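The post only shows the collector being started; starting the client agent on slavenode3/slavenode4 follows the same flume-ng pattern. A sketch that builds and prints the launch command as a dry run (drop the echo on the real node to actually start the agent):

```shell
# Build the client agent launch command; echo it here instead of running it.
FLUME_BIN=/opt/hadoop/flume-bin
echo "$FLUME_BIN/bin/flume-ng agent --conf $FLUME_BIN/conf/ -f $FLUME_BIN/example/flume_Consolidation.conf -n clientMainAgent"
```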
Configuration on the server (slavenode8)
[hadoop@slavenode8 example]$ mkdir -p /opt/hadoop/flume-bin/spool_back/data
[hadoop@slavenode8 example]$ mkdir -p /opt/hadoop/flume-bin/spool/checkpoint;mkdir -p /opt/hadoop/flume-bin/spool/data
[hadoop@slavenode8 example]$ cat flume_Consolidation.conf
collectorMainAgent.channels = c2
collectorMainAgent.sources = s2
collectorMainAgent.sinks = k1 k2
# collectorMainAgent AvroSource
collectorMainAgent.sources.s2.type = avro
collectorMainAgent.sources.s2.bind = slavenode8
collectorMainAgent.sources.s2.port = 41415
collectorMainAgent.sources.s2.channels = c2
# collectorMainAgent FileChannel
collectorMainAgent.channels.c2.type = file
collectorMainAgent.channels.c2.checkpointDir = /opt/hadoop/flume-bin/spool/checkpoint
collectorMainAgent.channels.c2.dataDirs = /opt/hadoop/flume-bin/spool/data,/opt/hadoop/flume-bin/spool_back/data
collectorMainAgent.channels.c2.capacity = 200000000
collectorMainAgent.channels.c2.transactionCapacity = 6000
collectorMainAgent.channels.c2.checkpointInterval = 60000
# collectorMainAgent hdfsSink
collectorMainAgent.sinks.k2.type = hdfs
collectorMainAgent.sinks.k2.channel = c2
collectorMainAgent.sinks.k2.hdfs.path = hdfs://cluster-ha/flume%{dir}
collectorMainAgent.sinks.k2.hdfs.filePrefix = k2_%{file}
collectorMainAgent.sinks.k2.hdfs.inUsePrefix = _
collectorMainAgent.sinks.k2.hdfs.inUseSuffix = .tmp
collectorMainAgent.sinks.k2.hdfs.rollSize = 0
collectorMainAgent.sinks.k2.hdfs.rollCount = 0
collectorMainAgent.sinks.k2.hdfs.rollInterval = 240
collectorMainAgent.sinks.k2.hdfs.writeFormat = Text
collectorMainAgent.sinks.k2.hdfs.fileType = DataStream
collectorMainAgent.sinks.k2.hdfs.batchSize = 6000
collectorMainAgent.sinks.k2.hdfs.callTimeout = 60000
collectorMainAgent.sinks.k1.type = hdfs
collectorMainAgent.sinks.k1.channel = c2
collectorMainAgent.sinks.k1.hdfs.path = hdfs://cluster-ha/flume%{dir}
collectorMainAgent.sinks.k1.hdfs.filePrefix = k1_%{file}
collectorMainAgent.sinks.k1.hdfs.inUsePrefix = _
collectorMainAgent.sinks.k1.hdfs.inUseSuffix = .tmp
collectorMainAgent.sinks.k1.hdfs.rollSize = 0
collectorMainAgent.sinks.k1.hdfs.rollCount = 0
collectorMainAgent.sinks.k1.hdfs.rollInterval = 240
collectorMainAgent.sinks.k1.hdfs.writeFormat = Text
collectorMainAgent.sinks.k1.hdfs.fileType = DataStream
collectorMainAgent.sinks.k1.hdfs.batchSize = 6000
collectorMainAgent.sinks.k1.hdfs.callTimeout = 60000
[hadoop@slavenode8 example]$ ../bin/flume-ng agent --conf ../conf/ -f flume_Consolidation.conf -n collectorMainAgent -Dflume.root.logger=DEBUG,console
Two directories are created automatically under the spool directory:
[hadoop@slavenode8 spool]$ ls
checkpoint data
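Note that the collector's hdfs.path and filePrefix use the %{dir} and %{file} headers. %{file} is supplied by fileHeader = true on the client's spooldir source (the header key defaults to file), but nothing in the client config above sets a dir header, so %{dir} would resolve empty. One way to supply it is a static interceptor on the client source; a sketch, where the /weblogs value is an assumed example:

```properties
# Hypothetical addition to the client config: attach a fixed "dir" header
# to every event so the collector's %{dir} escape resolves.
clientMainAgent.sources.s1.interceptors = i1
clientMainAgent.sources.s1.interceptors.i1.type = static
clientMainAgent.sources.s1.interceptors.i1.key = dir
clientMainAgent.sources.s1.interceptors.i1.value = /weblogs
```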