Flume Note

Source: Internet · Editor: 程序博客网 · Time: 2024/06/11 17:02

flume

Log collection. Efficiently collects, aggregates, and moves large volumes of log data. Simple, flexible architecture built on streaming data flows. Suited to online analytic applications.

agent

1. source     input
2. channel    buffer
3. sink       output

Advantages

1. Supports a wide range of centralized stores (HDFS, HBase).
2. Buffers against rate mismatch (production faster than consumption).
3. Contextual routing.
4. Around the channel, Flume maintains two transactions, one on the sender side and one on the consumer side, to guarantee message reliability.

Flume features

Supports many kinds of sources and destinations.

flume vs hadoop put

The put command transfers only one file at a time, and it handles static data only; it cannot cope with dynamically generated, real-time data.

Flume event

An event is Flume's basic unit of data transfer. It consists of headers plus a byte[] body.
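A minimal sketch of that structure (a hypothetical Python stand-in, not Flume's actual Java API):

```python
# Sketch only: a Flume event is a set of string headers plus a raw byte payload.
from dataclasses import dataclass, field

@dataclass
class Event:
    body: bytes                                  # the byte[] payload
    headers: dict = field(default_factory=dict)  # key/value metadata

e = Event(b"GET /index.html", {"timestamp": "1718100122", "host": "web01"})
print(len(e.body), e.headers["host"])  # 15 web01
```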

Installing Flume

1. tar -xzvf apache-flume-1.6.0-bin.tar.gz -C /soft
2. Configure the environment variables.
3. Create a configuration file [/soft/flume/conf/r_nc.conf]:
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type=netcat
    a1.sources.r1.bind=localhost
    a1.sources.r1.port=8888
    a1.channels.c1.type=memory
    a1.sinks.k1.type=logger
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1
4. Start Flume:
    flume-ng agent -f /soft/flume/conf/r_nc.conf -n a1 -Dflume.root.logger=INFO,console

spooling

Monitors a directory.

[SpoolDir: monitoring a specified directory]
Not real-time: a file must be fully written first and then moved (mv) into the monitored directory.
Content appended to a file already marked .COMPLETED is never collected.
r_spooldir.conf
----------
#component
a1.sources=r1
a1.sinks=k1
a1.channels=c1

a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/home/centos/logs
a1.sinks.k1.type=logger
a1.channels.c1.type=memory
#binding
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1

Start:
$>flume-ng agent -f r_spooldir.conf -n a1 -Dflume.root.logger=INFO,console
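The spooldir semantics can be sketched like this (a simplified stand-in; real Flume also tracks offsets and rejects files that change while being read):

```python
# Sketch: ingest every whole file in a directory, then rename it with a
# completion suffix. Appends to a completed file are never picked up again.
import os, tempfile

def spool_once(spool_dir, suffix=".COMPLETED"):
    events = []
    for name in sorted(os.listdir(spool_dir)):
        if name.endswith(suffix):
            continue  # already ingested
        path = os.path.join(spool_dir, name)
        with open(path, "rb") as f:
            events.extend(f.read().splitlines())  # one event per line
        os.rename(path, path + suffix)  # mark the file as done
    return events

d = tempfile.mkdtemp()
with open(os.path.join(d, "a.log"), "wb") as f:
    f.write(b"line1\nline2\n")
print(spool_once(d))   # [b'line1', b'line2']
print(os.listdir(d))   # ['a.log.COMPLETED']
```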

Real-time collection

[r_exec.conf]
#component
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=exec
a1.sources.r1.command=tail -F /home/centos/logs/1.txt
a1.sinks.k1.type=logger
a1.channels.c1.type=memory
#binding
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1

Start:
$>flume-ng agent -f /soft/flume/conf/r_exec.conf -n a1 -Dflume.root.logger=INFO,console

avro source

The avro source starts an Avro socket server and accepts Avro events sent by an avro client. Clients send events with the flume-ng avro-client command.
[r_avro.conf]
#component
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=avro
a1.sources.r1.bind=localhost
a1.sources.r1.port=8888
a1.sinks.k1.type=logger
a1.channels.c1.type=memory
#binding
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
1. Start the avro agent:
    $>flume-ng agent -f /soft/flume/conf/r_avro.conf -n a1 -Dflume.root.logger=INFO,console
2. Send Avro events to the avro source via the avro client:
    -H: avro source host
    -p: port
    -F: file to send; each line becomes one event
    $>flume-ng avro-client -H localhost -p 8888 -F /home/centos/logs/1.txt

seq source: sequence source

A sequence generator: starts at 0 with step 1. Useful for testing.
[r_seq.conf]
#component
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=seq
a1.sinks.k1.type=logger
a1.channels.c1.type=memory
#binding
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
Start the agent:
$>flume-ng agent -f /soft/flume/conf/r_seq.conf -n a1 -Dflume.root.logger=INFO,console

stress source

Usable for stress testing. You can set the total number of events to send and the payload size (byte[]) of each event.
[r_stress.conf]
#component
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=org.apache.flume.source.StressSource
a1.sources.r1.size=10240
a1.sources.r1.maxTotalEvents=100
a1.sinks.k1.type=logger
a1.channels.c1.type=memory
#binding
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
Start the agent:
$>flume-ng agent -f /soft/flume/conf/r_stress.conf -n a1 -Dflume.root.logger=INFO,console
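What the stress source does can be sketched as follows (a hypothetical stand-in, not Flume code): emit maxTotalEvents events, each with a body of `size` bytes.

```python
# Sketch: a stress source is just a generator of fixed-size payloads.
def stress_events(size, max_total_events):
    body = b"A" * size  # every event carries the same size-byte body
    for _ in range(max_total_events):
        yield body

events = list(stress_events(size=10240, max_total_events=100))
print(len(events), len(events[0]))  # 100 10240
```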

sink

1. file_roll: stores data in a local directory.
    a1.sources=r1
    a1.sinks=k1
    a1.channels=c1
    a1.sources.r1.type=netcat
    a1.sources.r1.bind=localhost
    a1.sources.r1.port=8888
    a1.sinks.k1.type=file_roll
    a1.sinks.k1.sink.directory=/home/centos/flume
    a1.channels.c1.type=memory
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1

channel

1. memory
    a1.sources=r1
    a1.sinks=k1
    a1.channels=c1
    a1.sources.r1.type=seq
    a1.channels.c1.type=memory
    a1.channels.c1.capacity=100000
    a1.channels.c1.transactionCapacity=100
    a1.sinks.k1.type=logger
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1

hdfs sink

Writes data into the HDFS file system.
[k_hdfs.conf]
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type=seq
a1.sources.r1.totalEvents=1000
a1.channels.c1.type=memory
a1.channels.c1.capacity=100000
a1.channels.c1.transactionCapacity=100
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/centos/logs/%y-%m-%d/%H%M/%S
#file name prefix; generated files are named with a timestamp.
a1.sinks.k1.hdfs.filePrefix = events-
#round timestamps down to the configured unit; otherwise a directory is created per second
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = second
#roll by time interval (seconds)
a1.sinks.k1.hdfs.rollInterval= 300
#roll by file size (bytes)
a1.sinks.k1.hdfs.rollSize = 1024000
#roll by event count
a1.sinks.k1.hdfs.rollCount = 20
#file type: the default is SequenceFile; DataStream and CompressedStream are also possible
a1.sinks.k1.hdfs.fileType = DataStream
#use local time as the timestamp instead of extracting it from the event headers.
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
$>flume-ng agent -f /soft/flume/conf/k_hdfs.conf -n a1 -Dflume.root.logger=INFO,console
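The roll settings interact like this (a simplified sketch, assuming my reading of the config: a new file is started when any threshold is reached, and a value of 0 disables that threshold). The path escapes (%y-%m-%d etc.) are strftime-style, filled from the event timestamp, or from local time when useLocalTimeStamp=true:

```python
# Sketch of the HDFS sink's roll decision; values default to the config above.
import time

def should_roll(open_secs, bytes_written, event_count,
                roll_interval=300, roll_size=1024000, roll_count=20):
    if roll_interval and open_secs >= roll_interval:
        return True   # open longer than rollInterval seconds
    if roll_size and bytes_written >= roll_size:
        return True   # file grew past rollSize bytes
    if roll_count and event_count >= roll_count:
        return True   # rollCount events written
    return False

print(should_roll(10, 500, 20))  # True: rollCount reached
print(should_roll(10, 500, 5))   # False: no threshold reached

# The hdfs.path escapes behave like strftime on the event timestamp:
print(time.strftime("/user/centos/logs/%y-%m-%d/%H%M/%S", time.gmtime(0)))
# /user/centos/logs/70-01-01/0000/00
```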

Sinking data to HBase

1. Synchronous HBase import (broken here: library version problem)
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type=seq
    a1.sources.r1.totalEvents=1000
    a1.channels.c1.type=memory
    a1.channels.c1.capacity=100000
    a1.channels.c1.transactionCapacity=100
    a1.sinks.k1.type = hbase
    a1.sinks.k1.table = ns1:t9
    a1.sinks.k1.columnFamily = f1
    a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
2. Async HBase import (works).
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type=seq
    a1.sources.r1.totalEvents=1000
    a1.channels.c1.type=memory
    a1.channels.c1.capacity=100000
    a1.channels.c1.transactionCapacity=100
    a1.sinks.k1.type = asynchbase
    a1.sinks.k1.table = ns1:t9
    a1.sinks.k1.columnFamily = f1
    a1.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

avro

A schema-based, self-describing serialization system with a JSON schema format; cross-language.
1. avro source
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type=avro
    a1.sources.r1.bind=localhost
    a1.sources.r1.port=8888
    a1.channels.c1.type=memory
    a1.channels.c1.capacity=100000
    a1.channels.c1.transactionCapacity=100
    a1.sinks.k1.type = logger
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
2. Start the agent:
    flume-ng agent -f r_avro.conf -n a1 -Dflume.root.logger=INFO,console
3. Send data with the avro-client command:
    flume-ng avro-client -H localhost -p 8888 -F ~/1.txt

Combining an avro sink with an avro source (two agents)

1. Create the file:
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type=seq
    a1.sources.r1.totalEvents=100
    a1.channels.c1.type=memory
    a1.channels.c1.capacity=100000
    a1.channels.c1.transactionCapacity=100
    a1.sinks.k1.type = avro
    a1.sinks.k1.hostname = localhost
    a1.sinks.k1.port = 8888
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

    a2.sources = r2
    a2.channels = c2
    a2.sinks = k2
    a2.sources.r2.type=avro
    a2.sources.r2.bind=localhost
    a2.sources.r2.port=8888
    a2.channels.c2.type=memory
    a2.channels.c2.capacity=100000
    a2.channels.c2.transactionCapacity=100
    a2.sinks.k2.type = logger
    a2.sources.r2.channels = c2
    a2.sinks.k2.channel = c2
2. Start the agents:
    2.1) Start the downstream agent first:
        flume-ng agent -f avro-hop.conf -n a2 -Dflume.root.logger=INFO,console
    2.2) Then start the upstream agent:
        flume-ng agent -f avro-hop.conf -n a1

channel selector

Events leaving a source go into its channel(s). When a source is wired to multiple channels, a channel selector controls how events are routed.

1. replicating selection policy
    a1.sources = r1
    a1.channels = c1 c2
    a1.sinks = k1 k2
    a1.sources.r1.type=seq
    a1.sources.r1.totalEvents=100
    a1.sources.r1.selector.type = replicating
    #failures on optional channels are ignored; without optional, a failure triggers transaction handling
    a1.sources.r1.selector.optional = c2
    a1.channels.c1.type=memory
    a1.channels.c1.capacity=100000
    a1.channels.c1.transactionCapacity=100
    a1.channels.c2.type=memory
    a1.channels.c2.capacity=100000
    a1.channels.c2.transactionCapacity=100
    a1.sinks.k1.type=file_roll
    a1.sinks.k1.sink.directory=/user/centos/flume/f1
    a1.sinks.k2.type=file_roll
    a1.sinks.k2.sink.directory=/user/centos/flume/f2
    a1.sources.r1.channels = c1 c2
    a1.sinks.k1.channel = c1
    a1.sinks.k2.channel = c2
2. multiplexing selection policy
    a1.sources = r1
    a1.channels = c1 c2
    a1.sinks = k1 k2
    a1.sources.r1.type=avro
    a1.sources.r1.bind=localhost
    a1.sources.r1.port=8888
    a1.sources.r1.selector.type = multiplexing
    a1.sources.r1.selector.header = city
    a1.sources.r1.selector.mapping.bd = c1
    a1.sources.r1.selector.mapping.sjz = c2
    a1.sources.r1.selector.default = c1
    a1.channels.c1.type=memory
    a1.channels.c2.type=memory
    a1.sinks.k1.type=file_roll
    a1.sinks.k1.sink.directory=/user/centos/flume/f1
    a1.sinks.k2.type=file_roll
    a1.sinks.k2.sink.directory=/user/centos/flume/f2
    a1.sources.r1.channels = c1 c2
    a1.sinks.k1.channel = c1
    a1.sinks.k2.channel = c2
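The two selector policies can be sketched like this (a simplified Python stand-in for Flume's logic, not its actual API):

```python
# Sketch: replicating copies each event to every channel;
# multiplexing routes by a header value, falling back to a default channel.
def replicating(event, channels):
    return list(channels)  # every channel receives the event

def multiplexing(event, mapping, default):
    return [mapping.get(event["headers"].get("city"), default)]

ev = {"headers": {"city": "bd"}, "body": b"..."}
print(replicating(ev, ["c1", "c2"]))                      # ['c1', 'c2']
print(multiplexing(ev, {"bd": "c1", "sjz": "c2"}, "c1"))  # ['c1']
```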

flume

Log collection tool.

agent

1. source    input
    exec            tail -F
    netcat
    avro
    spooldir
    seq             //sequence source
    stress          //stress source
    channelSelector
2. channel   buffer
    memory
    file            //file-backed
3. sink      output
    hdfs
    hbase
    logger
    avro

sink processor

0. Sink processors apply to a sink group.
1. failover
    The failover sink processor maintains a prioritized list of sinks; as long as one sink is available, events get processed.
    The highest-priority sink handles all events until it fails, then the next-lower-priority sink takes over.
    No load balancing: failover only.
    a1.sources = r1
    a1.channels = c1
    a1.sinkgroups = g1
    a1.sinks = k1 k2
    a1.sources.r1.type=netcat
    a1.sources.r1.bind=localhost
    a1.sources.r1.port=8888
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 100000
    a1.channels.c1.transactionCapacity = 100
    a1.sinks.k1.type=file_roll
    a1.sinks.k1.sink.directory=/home/centos/flume/f1
    a1.sinks.k2.type=file_roll
    a1.sinks.k2.sink.directory=/home/centos/flume/f2
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = failover
    a1.sinkgroups.g1.processor.priority.k1 = 1
    a1.sinkgroups.g1.processor.priority.k2 = 2
    a1.sinkgroups.g1.processor.maxpenalty = 10000
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1
    a1.sinks.k2.channel=c1
2. Load balancing
    Picks the next sink round-robin or at random; on failure it keeps selecting until a sink succeeds, giving both failover and load balancing.
    a1.sources = r1
    a1.channels = c1
    a1.sinkgroups = g1
    a1.sinks = k1 k2
    a1.sources.r1.type=netcat
    a1.sources.r1.bind=localhost
    a1.sources.r1.port=8888
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 100000
    a1.channels.c1.transactionCapacity = 100
    a1.sinks.k1.type=file_roll
    a1.sinks.k1.sink.directory=/home/centos/flume/f1
    a1.sinks.k2.type=file_roll
    a1.sinks.k2.sink.directory=/home/centos/flume/f2
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = load_balance
    a1.sinkgroups.g1.processor.backoff = true
    a1.sinkgroups.g1.processor.selector = round_robin
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1
    a1.sinks.k2.channel=c1
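The two group policies can be sketched as follows (a simplified stand-in; note that with the priorities in the config above, k2 is preferred because its priority number is higher):

```python
# Sketch: failover always uses the live sink with the highest priority;
# load_balance with selector=round_robin simply rotates through the sinks.
import itertools

def failover_pick(priorities, dead):
    live = {s: p for s, p in priorities.items() if s not in dead}
    return max(live, key=live.get)  # larger number = higher priority

priorities = {"k1": 1, "k2": 2}
print(failover_pick(priorities, dead=set()))   # k2
print(failover_pick(priorities, dead={"k2"}))  # k1: k2 failed, fall back

rr = itertools.cycle(["k1", "k2"])
print([next(rr) for _ in range(4)])  # ['k1', 'k2', 'k1', 'k2']
```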

channel

1. MemoryChannel
    High throughput, but data is lost on failure.
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type=seq
    a1.sources.r1.totalEvents=10000
    a1.channels.c1.type = memory
    a1.sinks.k1.type=logger
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1
2. File channel
    Data is written to disk: reliable but slower. Survives crashes.
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type=seq
    a1.sources.r1.totalEvents=10000
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /home/centos/flume/chk
    a1.channels.c1.dataDirs = /home/centos/flume/data
    a1.sinks.k1.type=logger
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1
3. Spillable Memory Channel
    Events are held both in memory and on disk. Experimental; not recommended for production.
    memoryCapacity      //in-memory capacity
    overflowCapacity    //on-disk capacity
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type=org.apache.flume.source.StressSource
    a1.sources.r1.size=102400
    a1.sources.r1.maxTotalEvents=20000
    a1.channels.c1.type = SPILLABLEMEMORY
    #a value of 0 disables the in-memory queue.
    a1.channels.c1.memoryCapacity = 10000
    #a value of 0 disables the file channel part.
    a1.channels.c1.overflowCapacity = 0
    a1.channels.c1.byteCapacity = 800000
    a1.channels.c1.checkpointDir = /home/centos/flume/chk
    a1.channels.c1.dataDirs = /home/centos/flume/data
    a1.sinks.k1.type=logger
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1
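The capacity settings behave roughly like this (a simplified sketch, assuming my reading of the semantics; real Flume wraps puts and takes in transactions):

```python
# Sketch: capacity bounds the total number of buffered events;
# transactionCapacity bounds how many events one take-transaction drains.
from collections import deque

class MemoryChannel:
    def __init__(self, capacity, transaction_capacity):
        self.q = deque()
        self.capacity = capacity
        self.txn_cap = transaction_capacity

    def put(self, event):
        if len(self.q) >= self.capacity:
            raise RuntimeError("ChannelFullException")  # source must back off
        self.q.append(event)

    def take_batch(self):
        batch = []
        while self.q and len(batch) < self.txn_cap:
            batch.append(self.q.popleft())
        return batch

ch = MemoryChannel(capacity=100000, transaction_capacity=100)
for i in range(250):
    ch.put(i)
print(len(ch.take_batch()), len(ch.take_batch()), len(ch.take_batch()))  # 100 100 50
```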

Interceptors

a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = datacenter
a1.sources.r1.interceptors.i1.value = NEW_YORK
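What the static interceptor above does can be sketched like this (a simplified Python stand-in, not Flume's Java API): stamp a fixed key/value into every event's headers before it reaches the channel.

```python
# Sketch: apply a static header to every event in a batch, in place.
def static_interceptor(events, key="datacenter", value="NEW_YORK"):
    for ev in events:
        ev["headers"][key] = value
    return events

evts = [{"headers": {}, "body": b"x"}, {"headers": {}, "body": b"y"}]
print(static_interceptor(evts)[0]["headers"])  # {'datacenter': 'NEW_YORK'}
```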

Rate limiting

Throttles how fast the source produces data, implemented with a custom interceptor.
0. pom.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>com.it18zhang</groupId>
        <artifactId>my-flume</artifactId>
        <version>1.0-SNAPSHOT</version>
        <dependencies>
            <dependency>
                <groupId>org.apache.flume</groupId>
                <artifactId>flume-ng-core</artifactId>
                <version>1.6.0</version>
            </dependency>
        </dependencies>
    </project>
1. Write the interceptor class:
package com.it18zhang.flume.interceptor;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.List;

/**
 * Rate-limiting interceptor.
 */
public class LimitSpeedInterceptor2 implements Interceptor {
    // nanoTime of the last event sent
    private long lastSendNano = 0;
    // body length of the last event (headers not counted)
    private long lastEventLength = 0;
    // maximum send rate (bytes per second)
    private static long maxRate = 1024;

    public void initialize() {
    }

    public Event intercept(Event event) {
        // skip the delay for the first event
        if (lastSendNano != 0) {
            long nowNano = System.nanoTime();
            // nanoseconds elapsed since the previous event was sent
            long elapsedNano = nowNano - lastSendNano;
            // nanoseconds the previous event should have taken at maxRate
            long needNano = lastEventLength * 1_000_000_000L / maxRate;
            if (needNano > elapsedNano) {
                try {
                    long diffNano = needNano - elapsedNano;
                    // Thread.sleep takes milliseconds plus a nanosecond remainder
                    Thread.sleep(diffNano / 1_000_000, (int) (diffNano % 1_000_000));
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
        this.lastEventLength = event.getBody().length;
        this.lastSendNano = System.nanoTime();
        return event;
    }

    public List<Event> intercept(List<Event> events) {
        for (Event e : events) {
            intercept(e);
        }
        return events;
    }

    public void close() {
    }

    public static class Builder implements Interceptor.Builder {
        private static final String MAX_RATE = "maxRatePerSecond";

        public Interceptor build() {
            return new LimitSpeedInterceptor2();
        }

        public void configure(Context context) {
            // read the rate from the agent config, defaulting to 10 KB/s
            LimitSpeedInterceptor2.maxRate = context.getLong(MAX_RATE, 10 * 1024L);
        }
    }
}
2. Export a jar.
3. Deploy the jar onto Flume's classpath (flume/lib/xxx).
4. Configure the custom interceptor:
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type=org.apache.flume.source.StressSource
    a1.sources.r1.size=10240
    a1.sources.r1.maxTotalEvents=10000
    a1.sources.r1.interceptors = i1
    #the configured type is the Builder class
    a1.sources.r1.interceptors.i1.type = com.it18zhang.flume.interceptor.LimitSpeedInterceptor2$Builder
    a1.sources.r1.interceptors.i1.maxRatePerSecond = 20
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 100000
    a1.channels.c1.transactionCapacity = 100
    a1.sinks.k1.type=logger
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1
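The pacing math behind the interceptor can be checked in isolation (a sketch with explicit elapsed time instead of a real clock, so it is deterministic): sending `length` bytes at `max_rate` bytes/second should take length/max_rate seconds, and if less time has elapsed since the last event, the difference is how long to sleep.

```python
# Sketch of the rate-limit delay calculation used by the interceptor above.
def pacing_delay(last_event_length, elapsed_sec, max_rate=1024):
    need_sec = last_event_length / max_rate  # time the last event "deserves"
    return max(0.0, need_sec - elapsed_sec)  # extra time to sleep, if any

print(pacing_delay(10240, elapsed_sec=2.0))  # 8.0: a 10 KB event at 1 KB/s needs 10 s
print(pacing_delay(512, elapsed_sec=2.0))    # 0.0: already slower than the limit
```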
