Big Data Enterprise Study Notes 05: A First Look at Flume
Source: Internet  Editor: 程序博客网  Time: 2024/05/29 04:13
I. Flume architecture
<1> Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
<2> It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms.
<3> It uses a simple extensible data model that allows for online analytic application, which suits workloads with strict real-time requirements.
<4> Flume data flow model
<5> Roles in Flume
<6> Data transfer in Flume
<7> The three elements of Flume: source, channel, sink
II. Getting started with Flume
<1> Unpack the archive and configure flume-env.sh
export JAVA_HOME=/opt/software/jdk1.7.0_67
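A misspelled variable name in flume-env.sh is easy to miss, because the shell accepts any export without complaint. The sketch below is one way to read back what the file really exports. The helper name `java_home_from_env` is hypothetical and not part of Flume, and it only recognizes plain `export JAVA_HOME=...` lines.

```shell
# Hypothetical helper (not part of Flume): print the value that a
# flume-env.sh file exports as JAVA_HOME, or nothing if no such line exists.
java_home_from_env() {
    # $1: path to flume-env.sh
    # Keep only the text after "export JAVA_HOME="; the last match wins.
    sed -n 's/^[[:space:]]*export[[:space:]][[:space:]]*JAVA_HOME=//p' "$1" | tail -n 1
}
```

Running `java_home_from_env conf/flume-env.sh` prints the configured JDK path when the variable is spelled `JAVA_HOME`; empty output means the export is missing or misspelled.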
<2> Common Flume commands
bin/flume-ng
Usage: bin/flume-ng <command> [options]...
commands:
  agent                     run a Flume agent
global options:
  --conf,-c <conf>          use configs in <conf> directory
  -Dproperty=value          sets a Java system property value
agent options:
  --name,-n <name>          the name of this agent (required)
  --conf-file,-f <file>     specify a config file (required if -z missing)
<3> Start an agent
An agent is started using a shell script called flume-ng which is located in the bin directory of the Flume distribution. You need to specify the agent name, the config directory, and the config file on the command line:
bin/flume-ng agent --conf conf --name agent-test --conf-file test.conf
Now the agent will start running source and sinks configured in the given properties file.
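Every key in the properties file is prefixed with an agent name, so the value given to `--name` has to match that prefix or the agent will find nothing to run. A minimal pre-flight sketch of that check follows; `agent_defined` is a hypothetical helper, not a Flume command, and it only looks for the agent's `.sources` line.

```shell
# Hypothetical pre-flight check (not part of Flume): succeed only if the
# properties file declares a "<agent>.sources" key for the given agent name.
agent_defined() {
    # $1: agent name passed to --name, $2: file passed to --conf-file
    awk -v key="$1.sources" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            line = $0
            gsub(/[ \t]/, "", line)         # ignore spacing around "="
            if (index(line, key "=") == 1) found = 1
        }
        END { exit !found }                 # exit 0 only when the key was seen
    ' "$2"
}
```

One way to use it is to guard the launch: `agent_defined agent-test test.conf && bin/flume-ng agent --conf conf --name agent-test --conf-file test.conf`.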
<4> Install telnet
* Install the rpm packages
rpm -ivh ./*.rpm
* Restart the xinetd service
/etc/rc.d/init.d/xinetd restart
<5> A simple example
* Create a1.conf under conf
* Write a1.conf (four steps: agent, source, channel, sink)
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
* Run
bin/flume-ng agent \
-c conf \
-n a1 \
-f conf/a1.conf \
-Dflume.root.logger=DEBUG,console
* Check that the listening port is open
netstat -nltp
* Start the client
telnet localhost 44444
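The last step of a1.conf, binding the source and the sink to a channel, is where a mistyped name goes unnoticed until the agent starts. The sketch below is one possible lint for that step: it reports any channel that a source or sink refers to but that the agent's `.channels` line does not declare. `check_wiring` is a hypothetical helper, not part of Flume, and it only inspects keys ending in `.channels` (sources) and `.channel` (sinks).

```shell
# Hypothetical lint helper (not part of Flume): report channels that a source
# or sink refers to but that the ".channels" line of the agent never declares.
check_wiring() {
    # $1: agent name, $2: properties file
    awk -v agent="$1" '
        /^[ \t]*#/ { next }                        # skip comments
        {
            eq = index($0, "=")
            if (eq == 0) next                      # not a key=value line
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/[ \t]/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            n = split(val, names, /[ \t]+/)
            src = agent ".sources."
            snk = agent ".sinks."
            if (key == agent ".channels") {
                for (i = 1; i <= n; i++) declared[names[i]] = 1
            } else if ((index(key, src) == 1 && key ~ /\.channels$/) ||
                       (index(key, snk) == 1 && key ~ /\.channel$/)) {
                for (i = 1; i <= n; i++) used[names[i]] = key
            }
        }
        END {
            for (c in used)
                if (!(c in declared)) {
                    print "undeclared channel " c " referenced by " used[c]
                    bad = 1
                }
            exit bad + 0                           # 0 when every reference resolves
        }
    ' "$2"
}
```

For the example above, `check_wiring a1 conf/a1.conf` should print nothing and exit with status 0, since both r1 and k1 point at the declared channel c1.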
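III. Collecting Hive runtime logs with Flume
<1> Approach
* Source: collect the log
Hive's runtime log
/opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log
tail -f
* Channel: memory
* Sink: hdfs
/user/beifeng/flume/hive-logs/
<2> To use the HDFS sink, the jars it depends on must be placed under flume/lib
<3> Write the agent configuration file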
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a2'

### define agent ###
a2.sources = r2
a2.channels = c2
a2.sinks = k2

### define sources ###
a2.sources.r2.type=exec
a2.sources.r2.command=tail -F /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log

### define channels ###
a2.channels.c2.type=memory

### define sinks ###
a2.sinks.k2.type=hdfs
a2.sinks.k2.hdfs.path=hdfs://hadoop-senior.ibeifeng.com:8020/user/beifeng/flume/hive.log
a2.sinks.k2.hdfs.fileType=DataStream
a2.sinks.k2.hdfs.batchSize=10

### bind sources and sinks ###
a2.sources.r2.channels=c2
a2.sinks.k2.channel=c2
<4> Run
bin/flume-ng agent \
-c conf \
-n a2 \
-f conf/a2.conf \
-Dflume.root.logger=DEBUG,console
IV. Flume project architecture
V. A hands-on Flume example
<1> Write the agent
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'

### define agent ###
a3.sources = r3
a3.channels = c3
a3.sinks = k3

### define sources ###
a3.sources.r3.type=spooldir
a3.sources.r3.spoolDir=/opt/datas
a3.sources.r3.ignorePattern=^(.)*\\.txt$

### define channels ###
a3.channels.c3.type=file
a3.channels.c3.checkpointDir =/opt/datas/check_dir
a3.channels.c3.dataDirs =/opt/datas/flume_data

### define sinks ###
a3.sinks.k3.type= hdfs
a3.sinks.k3.hdfs.path=hdfs://hadoop-senior.ibeifeng.com:8020/user/beifeng/flume/%Y%m%d
a3.sinks.k3.hdfs.useLocalTimeStamp=true

### bind sources and sinks ###
a3.sources.r3.channels=c3
a3.sinks.k3.channel=c3
<2> Test run
bin/flume-ng agent \
-c conf \
-n a3 \
-f conf/a3.conf \
-Dflume.root.logger=DEBUG,console
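Two details of the a3 configuration are easy to misread: which files `ignorePattern` skips, and which directory `%Y%m%d` expands to. The sketch below imitates both outside Flume. It rests on stated assumptions: that the doubled backslash in the properties file reaches Flume as a single one, giving the regex `^(.)*\.txt$`; that `grep -E` agrees with Java's regex engine for this simple pattern; and that `date` is a fair stand-in for the sink's local-time expansion. Both helper names are hypothetical.

```shell
# Stand-in for the spooldir ignorePattern, assuming the properties file value
# ^(.)*\\.txt$ is unescaped to the regex ^(.)*\.txt$ before it is compiled.
is_ignored() {
    # $1: a file name in the spooling directory; exit 0 means "skipped"
    printf '%s\n' "$1" | grep -Eq '^(.)*\.txt$'
}

# Stand-in for the %Y%m%d escape in hdfs.path with useLocalTimeStamp=true:
# the local date becomes the last directory, e.g. /user/beifeng/flume/20240529.
hdfs_dir_for_today() {
    printf '%s/%s\n' "/user/beifeng/flume" "$(date +%Y%m%d)"
}
```

Under these assumptions `is_ignored notes.txt` succeeds, so that file would be left alone, while `is_ignored access.log` fails, so that file would be picked up. The date escape needs a timestamp for every event; `hdfs.useLocalTimeStamp=true` tells the sink to use the local time instead of a timestamp header on the event.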