Flume Note
flume
Collects logs: efficiently gathers, aggregates, and moves large volumes of log data. Simple, flexible architecture built around streaming data flows. Suited to online analytic applications.
agent
1. source — input
2. channel — buffer
3. sink — output
Advantages
1. Supports a wide range of centralized stores (HDFS, HBase).
2. Buffers against rate mismatch (production faster than consumption).
3. Contextual routing.
4. Around the channel, Flume maintains two transactions — one on the sender side, one on the consumer side — to guarantee message reliability.
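The two-transaction handoff above can be sketched as a toy model (illustrative only — the class and method names below are not Flume's real Channel/Transaction API): the sender's put transaction commits a batch into the channel atomically, and the consumer's take transaction can roll its batch back on sink failure, so no event is lost in between.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of a Flume channel guarded by put/take transactions.
public class ToyChannel {
    private final Deque<String> buffer = new ArrayDeque<>();

    // Sender side: commit a staged batch atomically into the channel.
    public void putCommit(List<String> staged) {
        buffer.addAll(staged);
    }

    // Consumer side: take a batch to deliver to the sink.
    public List<String> takeBegin(int batchSize) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < batchSize && !buffer.isEmpty()) {
            batch.add(buffer.poll());
        }
        return batch;
    }

    // On sink failure, the take transaction rolls the batch back,
    // restoring the events to the head of the channel in order.
    public void rollback(List<String> batch) {
        for (int i = batch.size() - 1; i >= 0; i--) {
            buffer.addFirst(batch.get(i));
        }
    }

    public int size() { return buffer.size(); }

    public static void main(String[] args) {
        ToyChannel ch = new ToyChannel();
        ch.putCommit(List.of("e1", "e2", "e3"));
        List<String> batch = ch.takeBegin(2); // sink attempt
        ch.rollback(batch);                   // sink failed: nothing lost
        System.out.println(ch.size());        // prints 3
    }
}
```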
Flume features
Supports many kinds of sources and destinations.
flume vs hadoop put
The put command transfers only one file at a time, and it handles static data only — it has no way to cope with dynamically generated, real-time data.
Flume event
An event is Flume's basic unit of data transfer. It consists of headers plus a byte[] body.
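The header + byte[] structure can be modeled minimally as below (an illustrative sketch, not Flume's actual `org.apache.flume.Event` interface — the class name `SimpleEvent` here is just for demonstration):

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Minimal model of a Flume event: string headers plus a byte[] body.
public class SimpleEvent {
    private final Map<String, String> headers = new HashMap<>();
    private final byte[] body;

    public SimpleEvent(String line) {
        this.body = line.getBytes(StandardCharsets.UTF_8);
    }

    public void setHeader(String key, String value) { headers.put(key, value); }
    public Map<String, String> getHeaders() { return headers; }
    public byte[] getBody() { return body; }

    public static void main(String[] args) {
        SimpleEvent e = new SimpleEvent("hello flume");
        e.setHeader("timestamp", "1700000000000"); // headers carry metadata
        System.out.println(e.getBody().length);    // body is the raw payload
    }
}
```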
Installing Flume
1. tar -xzvf apache-flume-1.6.0-bin.tar.gz -C /soft
2. Configure environment variables.
3. Create a configuration file [/soft/flume/conf/r_nc.conf]:
   a1.sources = r1
   a1.channels = c1
   a1.sinks = k1
   a1.sources.r1.type=netcat
   a1.sources.r1.bind=localhost
   a1.sources.r1.port=8888
   a1.channels.c1.type=memory
   a1.sinks.k1.type=logger
   a1.sources.r1.channels=c1
   a1.sinks.k1.channel=c1
4. Start Flume:
   flume-ng agent -f /soft/flume/conf/r_nc.conf -n a1 -Dflume.root.logger=INFO,console
spooldir source
Monitors a directory.
[Monitoring a directory with SpoolDir]
Cannot collect in real time: a file must be completely written first and then moved (mv) into the monitored directory.
Content appended to a file already marked .COMPLETED is not collected.
r_spooldir.conf
—————-
#component
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/home/centos/logs
a1.sinks.k1.type=logger
a1.channels.c1.type=memory
#binding
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
Start:
$>flume-ng agent -f r_spooldir.conf -n a1 -Dflume.root.logger=INFO,console
Real-time collection (exec source)
[r_exec.conf]
#component
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=exec
a1.sources.r1.command=tail -F /home/centos/logs/1.txt
a1.sinks.k1.type=logger
a1.channels.c1.type=memory
#binding
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
Start:
$>flume-ng agent -f /soft/flume/conf/r_exec.conf -n a1 -Dflume.root.logger=INFO,console
avro source
The avro source starts an Avro socket server and accepts Avro events sent by an Avro client. The client sends them with the flume-ng avro-client command.
[r_avro.conf]
#component
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=avro
a1.sources.r1.bind=localhost
a1.sources.r1.port=8888
a1.sinks.k1.type=logger
a1.channels.c1.type=memory
#binding
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
1. Start the avro agent:
   $>flume-ng agent -f /soft/flume/conf/r_avro.conf -n a1 -Dflume.root.logger=INFO,console
2. Send Avro events to the avro source with the avro client:
   -H: avro source host
   -p: port
   -F: file to send; each line becomes one event.
   $>flume-ng avro-client -H localhost -p 8888 -F /home/centos/logs/1.txt
seq source: sequence source
A sequence generator starting at 0 with step 1; useful for testing.
[r_seq.conf]
#component
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=seq
a1.sinks.k1.type=logger
a1.channels.c1.type=memory
#binding
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
Start the agent:
$>flume-ng agent -f /soft/flume/conf/r_seq.conf -n a1 -Dflume.root.logger=INFO,console
stress source: load-testing source
Usable for load testing. You can specify the total number of events to send and the size of each event's body (byte[]).
[r_stress.conf]
#component
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=org.apache.flume.source.StressSource
a1.sources.r1.size=10240
a1.sources.r1.maxTotalEvents=100
a1.sinks.k1.type=logger
a1.channels.c1.type=memory
#binding
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
Start the agent:
$>flume-ng agent -f /soft/flume/conf/r_stress.conf -n a1 -Dflume.root.logger=INFO,console
sink
1. file_roll — stores data in a local directory.
   a1.sources=r1
   a1.sinks=k1
   a1.channels=c1
   a1.sources.r1.type=netcat
   a1.sources.r1.bind=localhost
   a1.sources.r1.port=8888
   a1.sinks.k1.type=file_roll
   a1.sinks.k1.sink.directory=/home/centos/flume
   a1.channels.c1.type=memory
   a1.sources.r1.channels=c1
   a1.sinks.k1.channel=c1
channel
1. memory
   a1.sources=r1
   a1.sinks=k1
   a1.channels=c1
   a1.sources.r1.type=seq
   a1.channels.c1.type=memory
   a1.channels.c1.capacity=100000
   a1.channels.c1.transactionCapacity=100
   a1.sinks.k1.type=logger
   a1.sources.r1.channels=c1
   a1.sinks.k1.channel=c1
hdfs sink
Writes data to the HDFS file system.
[k_hdfs.conf]
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type=seq
a1.sources.r1.totalEvents=1000
a1.channels.c1.type=memory
a1.channels.c1.capacity=100000
a1.channels.c1.transactionCapacity=100
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/centos/logs/%y-%m-%d/%H%M/%S
#file name prefix; generated files are named by timestamp
a1.sinks.k1.hdfs.filePrefix = events-
#round the timestamp down into time buckets; otherwise a directory is created per second
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = second
#roll by time interval (seconds)
a1.sinks.k1.hdfs.rollInterval = 300
#roll by size (bytes)
a1.sinks.k1.hdfs.rollSize = 1024000
#roll by event count
a1.sinks.k1.hdfs.rollCount = 20
#file type; default is SequenceFile, can be DataStream or CompressedStream
a1.sinks.k1.hdfs.fileType = DataStream
#use local time as the timestamp instead of taking it from the event header
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
$>flume-ng agent -f /soft/flume/conf/k_hdfs.conf -n a1 -Dflume.root.logger=INFO,console
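The round/roundValue/roundUnit settings bucket the timestamp before it is substituted into the path escape sequences. The rounding is plain floor division, as this sketch shows (a model of the behavior, not the sink's actual code):

```java
// With hdfs.round=true, roundValue=10, roundUnit=second, the seconds field
// is floored to a multiple of 10, so events arriving between :00 and :09
// of a minute all land in the same %S directory.
public class RoundDown {
    public static long roundSeconds(long epochSeconds, int roundValue) {
        return (epochSeconds / roundValue) * roundValue;
    }

    public static void main(String[] args) {
        System.out.println(roundSeconds(123, 10)); // prints 120
        System.out.println(roundSeconds(129, 10)); // prints 120
        System.out.println(roundSeconds(130, 10)); // prints 130
    }
}
```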
Sinking data to HBase
1. Synchronous HBase sink (not working here — library version issue):
   a1.sources = r1
   a1.channels = c1
   a1.sinks = k1
   a1.sources.r1.type=seq
   a1.sources.r1.totalEvents=1000
   a1.channels.c1.type=memory
   a1.channels.c1.capacity=100000
   a1.channels.c1.transactionCapacity=100
   a1.sinks.k1.type = hbase
   a1.sinks.k1.table = ns1:t9
   a1.sinks.k1.columnFamily = f1
   a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
   a1.sources.r1.channels = c1
   a1.sinks.k1.channel = c1
2. Async HBase sink (works):
   a1.sources = r1
   a1.channels = c1
   a1.sinks = k1
   a1.sources.r1.type=seq
   a1.sources.r1.totalEvents=1000
   a1.channels.c1.type=memory
   a1.channels.c1.capacity=100000
   a1.channels.c1.transactionCapacity=100
   a1.sinks.k1.type = asynchbase
   a1.sinks.k1.table = ns1:t9
   a1.sinks.k1.columnFamily = f1
   a1.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
   a1.sources.r1.channels = c1
   a1.sinks.k1.channel = c1
avro
A schema-based, self-describing serialization system with JSON-defined schemas; cross-language.
1. avro source:
   a1.sources = r1
   a1.channels = c1
   a1.sinks = k1
   a1.sources.r1.type=avro
   a1.sources.r1.bind=localhost
   a1.sources.r1.port=8888
   a1.channels.c1.type=memory
   a1.channels.c1.capacity=100000
   a1.channels.c1.transactionCapacity=100
   a1.sinks.k1.type = logger
   a1.sources.r1.channels = c1
   a1.sinks.k1.channel = c1
2. Start the agent:
   flume-ng agent -f r_avro.conf -n a1 -Dflume.root.logger=INFO,console
3. Send data with the avro-client command:
   flume-ng avro-client -H localhost -p 8888 -F ~/1.txt
Combining an avro sink with an avro source (two-hop chain)
1. Create the file [avro-hop.conf]:
   a1.sources = r1
   a1.channels = c1
   a1.sinks = k1
   a1.sources.r1.type=seq
   a1.sources.r1.totalEvents=100
   a1.channels.c1.type=memory
   a1.channels.c1.capacity=100000
   a1.channels.c1.transactionCapacity=100
   a1.sinks.k1.type = avro
   a1.sinks.k1.hostname = localhost
   a1.sinks.k1.port = 8888
   a1.sources.r1.channels = c1
   a1.sinks.k1.channel = c1
   a2.sources = r2
   a2.channels = c2
   a2.sinks = k2
   a2.sources.r2.type=avro
   a2.sources.r2.bind=localhost
   a2.sources.r2.port=8888
   a2.channels.c2.type=memory
   a2.channels.c2.capacity=100000
   a2.channels.c2.transactionCapacity=100
   a2.sinks.k2.type = logger
   a2.sources.r2.channels = c2
   a2.sinks.k2.channel = c2
2. Start the agents:
   2.1) Start the downstream agent (a2) first:
        flume-ng agent -f avro-hop.conf -n a2 -Dflume.root.logger=INFO,console
   2.2) Then start the upstream agent (a1):
        flume-ng agent -f avro-hop.conf -n a1
Events leaving a source enter its channel(s). When a source is bound to multiple
channels, a channel selector controls how events are distributed.
1. replicating selector (copies every event to all channels):
   a1.sources = r1
   a1.channels = c1 c2
   a1.sinks = k1 k2
   a1.sources.r1.type=seq
   a1.sources.r1.totalEvents=100
   a1.sources.r1.selector.type = replicating
   #failures writing to optional channels are simply ignored; without optional, a failure triggers transaction handling
   a1.sources.r1.selector.optional = c2
   a1.channels.c1.type=memory
   a1.channels.c1.capacity=100000
   a1.channels.c1.transactionCapacity=100
   a1.channels.c2.type=memory
   a1.channels.c2.capacity=100000
   a1.channels.c2.transactionCapacity=100
   a1.sinks.k1.type=file_roll
   a1.sinks.k1.sink.directory=/user/centos/flume/f1
   a1.sinks.k2.type=file_roll
   a1.sinks.k2.sink.directory=/user/centos/flume/f2
   a1.sources.r1.channels = c1 c2
   a1.sinks.k1.channel = c1
   a1.sinks.k2.channel = c2
2. multiplexing selector (routes each event by a header value):
   a1.sources = r1
   a1.channels = c1 c2
   a1.sinks = k1 k2
   a1.sources.r1.type=avro
   a1.sources.r1.bind=localhost
   a1.sources.r1.port=8888
   a1.sources.r1.selector.type = multiplexing
   a1.sources.r1.selector.header = city
   a1.sources.r1.selector.mapping.bd = c1
   a1.sources.r1.selector.mapping.sjz = c2
   a1.sources.r1.selector.default = c1
   a1.channels.c1.type=memory
   a1.channels.c2.type=memory
   a1.sinks.k1.type=file_roll
   a1.sinks.k1.sink.directory=/user/centos/flume/f1
   a1.sinks.k2.type=file_roll
   a1.sinks.k2.sink.directory=/user/centos/flume/f2
   a1.sources.r1.channels = c1 c2
   a1.sinks.k1.channel = c1
   a1.sinks.k2.channel = c2
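The multiplexing routing rule amounts to a map lookup on the configured header with a default fallback, as in this toy sketch (illustrative only, not Flume's `MultiplexingChannelSelector` implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the multiplexing channel selector: route by the "city"
// header value, falling back to the default channel when unmapped.
public class ToySelector {
    private final Map<String, String> mapping = new HashMap<>();
    private final String defaultChannel;

    public ToySelector(String defaultChannel) { this.defaultChannel = defaultChannel; }
    public void map(String headerValue, String channel) { mapping.put(headerValue, channel); }

    public String select(Map<String, String> headers) {
        return mapping.getOrDefault(headers.get("city"), defaultChannel);
    }

    public static void main(String[] args) {
        // mirrors selector.mapping.bd=c1, selector.mapping.sjz=c2, default=c1
        ToySelector s = new ToySelector("c1");
        s.map("bd", "c1");
        s.map("sjz", "c2");
        System.out.println(s.select(Map.of("city", "sjz"))); // prints c2
        System.out.println(s.select(Map.of("city", "xx")));  // prints c1
    }
}
```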
flume
A log collection tool.
agent
1. source (input): exec (tail -F), netcat, avro, spooldir, seq (sequence source), stress (load testing); plus channel selectors
2. channel (buffer): memory, file
3. sink (output): hdfs, hbase, logger, avro
sink processor
0. A sink processor operates on a sink group.
1. failover — the failover sink processor maintains a prioritized list of sinks; as long as one sink is available, events get processed. The highest-priority sink handles all events until it fails, then the next-highest-priority sink takes over. No load balancing — failover only.
   a1.sources = r1
   a1.channels = c1
   a1.sinkgroups = g1
   a1.sinks = k1 k2
   a1.sources.r1.type=netcat
   a1.sources.r1.bind=localhost
   a1.sources.r1.port=8888
   a1.channels.c1.type = memory
   a1.channels.c1.capacity = 100000
   a1.channels.c1.transactionCapacity = 100
   a1.sinks.k1.type=file_roll
   a1.sinks.k1.sink.directory=/home/centos/flume/f1
   a1.sinks.k2.type=file_roll
   a1.sinks.k2.sink.directory=/home/centos/flume/f2
   a1.sinkgroups.g1.sinks = k1 k2
   a1.sinkgroups.g1.processor.type = failover
   a1.sinkgroups.g1.processor.priority.k1 = 1
   a1.sinkgroups.g1.processor.priority.k2 = 2
   a1.sinkgroups.g1.processor.maxpenalty = 10000
   a1.sources.r1.channels=c1
   a1.sinks.k1.channel=c1
   a1.sinks.k2.channel=c1
2. load_balance — picks the next sink by round-robin or at random; if that sink fails, selection continues with the remaining sinks, giving both load balancing and failover.
   a1.sources = r1
   a1.channels = c1
   a1.sinkgroups = g1
   a1.sinks = k1 k2
   a1.sources.r1.type=netcat
   a1.sources.r1.bind=localhost
   a1.sources.r1.port=8888
   a1.channels.c1.type = memory
   a1.channels.c1.capacity = 100000
   a1.channels.c1.transactionCapacity = 100
   a1.sinks.k1.type=file_roll
   a1.sinks.k1.sink.directory=/home/centos/flume/f1
   a1.sinks.k2.type=file_roll
   a1.sinks.k2.sink.directory=/home/centos/flume/f2
   a1.sinkgroups.g1.sinks = k1 k2
   a1.sinkgroups.g1.processor.type = load_balance
   a1.sinkgroups.g1.processor.backoff = true
   a1.sinkgroups.g1.processor.selector = round_robin
   a1.sources.r1.channels=c1
   a1.sinks.k1.channel=c1
   a1.sinks.k2.channel=c1
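The round_robin selection with failed-sink skipping can be sketched as a small model (illustrative only — Flume's real `LoadBalancingSinkProcessor` also applies exponential backoff timers when `backoff = true`, which this sketch reduces to a simple up/down flag):

```java
import java.util.List;

// Toy round-robin sink selector that skips sinks marked as down.
public class RoundRobinSelector {
    private final List<String> sinks;
    private final boolean[] down;
    private int next = 0;

    public RoundRobinSelector(List<String> sinks) {
        this.sinks = sinks;
        this.down = new boolean[sinks.size()];
    }

    public void markDown(String sink) { down[sinks.indexOf(sink)] = true; }

    // Pick the next healthy sink in round-robin order; null if all are down.
    public String pick() {
        for (int i = 0; i < sinks.size(); i++) {
            int idx = (next + i) % sinks.size();
            if (!down[idx]) {
                next = (idx + 1) % sinks.size();
                return sinks.get(idx);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RoundRobinSelector s = new RoundRobinSelector(List.of("k1", "k2"));
        System.out.println(s.pick()); // k1
        System.out.println(s.pick()); // k2
        s.markDown("k1");
        System.out.println(s.pick()); // k2 (k1 is skipped)
    }
}
```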
channel
1. MemoryChannel — high throughput; data is lost on failure.
   a1.sources = r1
   a1.channels = c1
   a1.sinks = k1
   a1.sources.r1.type=seq
   a1.sources.r1.totalEvents=10000
   a1.channels.c1.type = memory
   a1.sinks.k1.type=logger
   a1.sources.r1.channels=c1
   a1.sinks.k1.channel=c1
2. File channel — data is written to disk: reliable but slower; survives crashes.
   a1.sources = r1
   a1.channels = c1
   a1.sinks = k1
   a1.sources.r1.type=seq
   a1.sources.r1.totalEvents=10000
   a1.channels.c1.type = file
   a1.channels.c1.checkpointDir = /home/centos/flume/chk
   a1.channels.c1.dataDirs = /home/centos/flume/data
   a1.sinks.k1.type=logger
   a1.sources.r1.channels=c1
   a1.sinks.k1.channel=c1
3. Spillable Memory Channel — events are held in memory and spill to disk; experimental, not recommended for production.
   memoryCapacity   — in-memory event capacity
   overflowCapacity — on-disk event capacity
   a1.sources = r1
   a1.channels = c1
   a1.sinks = k1
   a1.sources.r1.type=org.apache.flume.source.StressSource
   a1.sources.r1.size=102400
   a1.sources.r1.maxTotalEvents=20000
   a1.channels.c1.type = SPILLABLEMEMORY
   #setting this to 0 disables the in-memory part of the channel
   a1.channels.c1.memoryCapacity = 10000
   #setting this to 0 effectively disables the file channel part
   a1.channels.c1.overflowCapacity = 0
   a1.channels.c1.byteCapacity = 800000
   a1.channels.c1.checkpointDir = /home/centos/flume/chk
   a1.channels.c1.dataDirs = /home/centos/flume/data
   a1.sinks.k1.type=logger
   a1.sources.r1.channels=c1
   a1.sinks.k1.channel=c1
Interceptors
Example: a static interceptor that stamps every event with a fixed header.
a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = datacenter
a1.sources.r1.interceptors.i1.value = NEW_YORK
Rate limiting
Limit the rate at which the source sends data, implemented with a custom interceptor.
0. pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.it18zhang</groupId>
    <artifactId>my-flume</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-core</artifactId>
            <version>1.6.0</version>
        </dependency>
    </dependencies>
</project>
1. Write the interceptor class:
package com.it18zhang.flume.interceptor;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.List;

/**
 * Rate-limiting interceptor: throttles events to maxRate bytes per second
 * by sleeping until the previous event "deserved" its share of time.
 */
public class LimitSpeedInterceptor2 implements Interceptor {
    // time (System.nanoTime) the previous event was sent
    private long lastSendNano = 0;
    // body length of the previous event (headers not counted)
    private long lastEventLength = 0;
    // maximum send rate in bytes per second (1 KB/s by default)
    private static long maxRate = 1024;

    public void initialize() {
    }

    public Event intercept(Event event) {
        // skip the delay for the very first event
        if (lastSendNano != 0) {
            long nowNano = System.nanoTime();
            // nanoseconds elapsed since the previous event was sent
            long elapsedNano = nowNano - lastSendNano;
            // nanoseconds the previous event should have taken at maxRate
            long needNano = (long) ((double) lastEventLength / maxRate * 1_000_000_000L);
            if (needNano > elapsedNano) {
                try {
                    long sleepNano = needNano - elapsedNano;
                    Thread.sleep(sleepNano / 1_000_000, (int) (sleepNano % 1_000_000));
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
        this.lastEventLength = event.getBody().length;
        this.lastSendNano = System.nanoTime();
        return event;
    }

    public List<Event> intercept(List<Event> events) {
        for (Event e : events) {
            intercept(e);
        }
        return events;
    }

    public void close() {
    }

    public static class Builder implements Interceptor.Builder {
        private static final String MAX_RATE = "maxRatePerSecond";

        public Interceptor build() {
            return new LimitSpeedInterceptor2();
        }

        public void configure(Context context) {
            // read the configured rate, defaulting to 10 KB/s
            LimitSpeedInterceptor2.maxRate = context.getLong(MAX_RATE, 10 * 1024L);
        }
    }
}
2. Export the jar.
3. Deploy the jar to Flume's classpath (flume/lib/).
4. Configure the custom interceptor:
   a1.sources = r1
   a1.channels = c1
   a1.sinks = k1
   a1.sources.r1.type=org.apache.flume.source.StressSource
   a1.sources.r1.size=10240
   a1.sources.r1.maxTotalEvents=10000
   a1.sources.r1.interceptors = i1
   #the configured type is the Builder class
   a1.sources.r1.interceptors.i1.type = com.it18zhang.flume.interceptor.LimitSpeedInterceptor2$Builder
   #rate limit in bytes per second, read by the Builder
   a1.sources.r1.interceptors.i1.maxRatePerSecond = 20480
   a1.channels.c1.type = memory
   a1.channels.c1.capacity = 100000
   a1.channels.c1.transactionCapacity = 100
   a1.sinks.k1.type=logger
   a1.sources.r1.channels=c1
   a1.sinks.k1.channel=c1