Big Data Enterprise Learning, Part 05: A First Look at Flume

Source: Internet. Editor: 程序博客网. Time: 2024/05/29 04:13

I. Flume architecture

<1> Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
<2> It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms.
<3> It uses a simple, extensible data model that allows for online analytic applications (that is, workloads with fairly strict real-time requirements).
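<4> Flume data flow model
[Figure not available]
<5> Roles in Flume
[Figure not available]
<6> Data transfer in Flume
[Figure not available]
<7> The three core elements of Flume
[Figure not available]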

II. Getting started with Flume

<1> Unpack the archive and configure flume-env.sh

export JAVA_HOME=/opt/software/jdk1.7.0_67

<2> Common Flume commands

bin/flume-ng
Usage: bin/flume-ng <command> [options]...

commands:
  agent                     run a Flume agent

global options:
  --conf,-c <conf>          use configs in <conf> directory
  -Dproperty=value          sets a Java system property value

agent options:
  --name,-n <name>          the name of this agent (required)
  --conf-file,-f <file>     specify a config file (required if -z missing)

<3> Starting an agent
An agent is started using a shell script called flume-ng which is located in the bin directory of the Flume distribution. You need to specify the agent name, the config directory, and the config file on the command line:

bin/flume-ng agent --conf conf --name agent-test --conf-file test.conf

Now the agent will start running source and sinks configured in the given properties file.
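
One detail worth checking before launch: the value passed to --name has to match the prefix used for the keys in the properties file (for example, a1 in a1.sources). As far as I know, when the two differ the process still starts but loads no components. The sketch below is a small pre-flight check in POSIX shell. The helper name agent_defined and the temporary sample file are hypothetical, made up for illustration, and are not part of Flume.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of Flume).
# Succeeds only if the properties file has a "<agent>.sources = ..." line.
# $1 = agent name, $2 = path to the properties file
agent_defined() {
  awk -v key="$1.sources" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)              # drop leading whitespace
      if (index(line, key) == 1) {          # line starts with "<agent>.sources"
        rest = substr(line, length(key) + 1)
        if (rest ~ /^[ \t]*=/) found = 1    # and is followed by "="
      }
    }
    END { exit (found ? 0 : 1) }
  ' "$2"
}

# Demo against a throwaway file that declares an agent named a1.
conf_file="$(mktemp)"
printf 'a1.sources = r1\na1.sinks = k1\na1.channels = c1\n' > "$conf_file"

for name in a1 agent-test; do
  if agent_defined "$name" "$conf_file"; then
    echo "$name: declared in config"
  else
    echo "$name: not declared in config"
  fi
done

rm -f "$conf_file"
```

Running a check like this before bin/flume-ng agent catches the case where the name on the command line and the name in the file have drifted apart.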

<4> Installing telnet
* Install the rpm packages

rpm -ivh ./*.rpm

* Start the xinetd service

/etc/rc.d/init.d/xinetd restart

<5> A simple example
* Create a1.conf under conf
* Write a1.conf (four steps: agent, source, channel, sink)

# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

* Run

bin/flume-ng agent \
-c conf \
-n a1 \
-f conf/a1.conf \
-Dflume.root.logger=DEBUG,console

* Check that the listening port is open

netstat -nltp

* Start a client

telnet localhost 44444

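III. Collecting Hive runtime logs with Flume

<1> Approach
* Collect the log
Hive's runtime log
/opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log
tail -f
* memory
* hdfs
/user/beifeng/flume/hive-logs/
<2> To use the HDFS sink, the following jar files must be placed under flume/lib
[Figure not available]
<3> Write the agent configuration file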
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a2'

### define agent ###
a2.sources = r2
a2.channels = c2
a2.sinks = k2

### define sources ###
a2.sources.r2.type=exec
a2.sources.r2.command=tail -F /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log

### define channels ###
a2.channels.c2.type=memory

### define sinks ###
a2.sinks.k2.type=hdfs
a2.sinks.k2.hdfs.path=hdfs://hadoop-senior.ibeifeng.com:8020/user/beifeng/flume/hive.log
a2.sinks.k2.hdfs.fileType=DataStream
a2.sinks.k2.hdfs.batchSize=10

### bind sources and sinks ###
a2.sources.r2.channels=c2
a2.sinks.k2.channel=c2

<4> Run

bin/flume-ng agent \
-c conf \
-n a2 \
-f conf/a2.conf \
-Dflume.root.logger=DEBUG,console

IV. Flume project architecture

[Five figures not available]

V. A hands-on Flume case

[Two figures not available]
<1> Write the agent

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a3'

### define agent ###
a3.sources = r3
a3.channels = c3
a3.sinks = k3

### define sources ###
a3.sources.r3.type=spooldir
a3.sources.r3.spoolDir=/opt/datas
a3.sources.r3.ignorePattern=^(.)*\\.txt$

### define channels ###
a3.channels.c3.type=file
a3.channels.c3.checkpointDir=/opt/datas/check_dir
a3.channels.c3.dataDirs=/opt/datas/flume_data

### define sinks ###
a3.sinks.k3.type=hdfs
a3.sinks.k3.hdfs.path=hdfs://hadoop-senior.ibeifeng.com:8020/user/beifeng/flume/%Y%m%d
a3.sinks.k3.hdfs.useLocalTimeStamp=true

### bind sources and sinks ###
a3.sources.r3.channels=c3
a3.sinks.k3.channel=c3
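
To make the ignorePattern setting concrete: once the properties file is loaded, the doubled backslash collapses to a single one, so the effective regular expression is ^(.)*\.txt$ and any file in the spooling directory whose name ends in .txt is skipped. The sketch below applies the same expression to a few made-up file names. Flume evaluates the pattern with Java's regex engine while grep -E uses POSIX extended regular expressions; for an expression this simple the two should agree, but treat that as an assumption. The function name classify and the sample file names are hypothetical.

```shell
#!/bin/sh
# Effective regex after properties-file unescaping of ^(.)*\\.txt$
pattern='^(.)*\.txt$'

# Hypothetical helper: prints "ignored" when the file name matches the
# ignore pattern, "ingested" otherwise.
# $1 = file name (no directory part)
classify() {
  if printf '%s\n' "$1" | grep -Eq "$pattern"; then
    echo "ignored"
  else
    echo "ingested"
  fi
}

# Made-up sample names, for illustration only.
for f in notes.txt access.log data.txt.bak; do
  printf '%s -> %s\n' "$f" "$(classify "$f")"
done
```

Because the pattern is anchored at the end, a name such as data.txt.bak does not match and would still be picked up by the source.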

<2> Test run

bin/flume-ng agent \
-c conf \
-n a3 \
-f conf/a3.conf \
-Dflume.root.logger=DEBUG,console
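
Once the agent has processed some files, the output should land in a directory named after the current date: hdfs.path ends in %Y%m%d, and hdfs.useLocalTimeStamp=true makes the sink take the time from the agent host's clock instead of a timestamp header on the event. A rough way to compute today's target directory and list it is sketched below. It assumes the hdfs client is on PATH and that the default file system points at the same cluster; the function name todays_target is hypothetical.

```shell
#!/bin/sh
# Base directory taken from hdfs.path in a3.conf (without the host part).
base_dir="/user/beifeng/flume"

# Hypothetical helper: prints the directory the sink should write to today,
# mirroring the %Y%m%d escape sequence with the local clock.
todays_target() {
  printf '%s/%s\n' "$base_dir" "$(date +%Y%m%d)"
}

target="$(todays_target)"
echo "expected HDFS directory: $target"

# List the directory only when an hdfs client is available.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls "$target" || echo "directory not present yet"
else
  echo "hdfs client not found; skipping listing"
fi
```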