flume-ng interceptors

Source: Internet   Editor: 程序博客网   Time: 2024/06/05 18:16


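Flume NG interceptors can be thought of as filters that sit between a source and its channel: with the right configuration, an agent collects only the kinds of logs you need. Some interceptors modify events instead of dropping them, for example by adding headers.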

The official documentation lists the following interceptors:

  Timestamp Interceptor

Adds a header with the key timestamp to each event; its value is the current time in milliseconds since the epoch.

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
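To make the header behaviour concrete, here is a minimal Python sketch of what the timestamp interceptor does to a single event. The Event class and timestamp_intercept function are hypothetical stand-ins, not Flume's Java API; the preserve_existing flag loosely mirrors the interceptor's preserveExisting option.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for a Flume event: a header map plus a raw body."""
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def timestamp_intercept(event: Event, preserve_existing: bool = False) -> Event:
    """Sketch of the timestamp interceptor: store the current time in epoch ms."""
    if preserve_existing and "timestamp" in event.headers:
        return event  # keep a timestamp that was set earlier in the pipeline
    # Header values are strings, so the millisecond count is stored as text
    event.headers["timestamp"] = str(int(time.time() * 1000))
    return event


event = timestamp_intercept(Event(body=b"hello flume"))
print(event.headers)
```

Downstream components (an HDFS sink with a time-based path, for instance) can then read the header instead of parsing the body.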

  Host Interceptor

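Adds a header (key host by default) whose value is the hostname or IP address of the machine running the agent. The example below changes the key to hostname through the hostHeader option.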

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host
a1.sources.r1.interceptors.i1.hostHeader = hostname
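A minimal Python sketch of the same idea follows. The parameters host_header, use_ip and preserve_existing are meant to mirror the interceptor's hostHeader, useIP and preserveExisting options, but Event and host_intercept are hypothetical helpers, and Flume resolves the host through Java's networking classes, so the exact name or address it records may differ from Python's socket.gethostname().

```python
import socket
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for a Flume event: a header map plus a raw body."""
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def host_intercept(event: Event, host_header: str = "host",
                   use_ip: bool = True, preserve_existing: bool = False) -> Event:
    """Sketch of the host interceptor: tag the event with this machine's identity."""
    if preserve_existing and host_header in event.headers:
        return event
    hostname = socket.gethostname()
    # use_ip=True records an IP address, use_ip=False records the host name
    event.headers[host_header] = socket.gethostbyname(hostname) if use_ip else hostname
    return event


# Roughly the configuration above (hostHeader = hostname), recording the name
event = host_intercept(Event(body=b"hello"), host_header="hostname", use_ip=False)
print(event.headers)
```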

  Static Interceptor

Adds a fixed, user-defined key and value to each event's headers.

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = datacenter
a1.sources.r1.interceptors.i1.value = NEW_YORK
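As a sketch, the static interceptor boils down to "set one header unless it is already there". The Python below is a hypothetical illustration, not Flume code; key, value and preserve_existing are meant to mirror the key, value and preserveExisting options, with preserve_existing defaulting to True here on the assumption that this matches the interceptor's default.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for a Flume event: a header map plus a raw body."""
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def static_intercept(event: Event, key: str = "key", value: str = "value",
                     preserve_existing: bool = True) -> Event:
    """Sketch of the static interceptor: add one fixed key/value header."""
    if preserve_existing and key in event.headers:
        return event  # an existing header with the same key is left untouched
    event.headers[key] = value
    return event


event = static_intercept(Event(), key="datacenter", value="NEW_YORK")
print(event.headers)  # {'datacenter': 'NEW_YORK'}
```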

  UUID Interceptor



  Morphline Interceptor


Sample flume.conf file:

a1.sources.avroSrc.interceptors = morphlineinterceptor
a1.sources.avroSrc.interceptors.morphlineinterceptor.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
a1.sources.avroSrc.interceptors.morphlineinterceptor.morphlineFile = /etc/flume-ng/conf/morphline.conf
a1.sources.avroSrc.interceptors.morphlineinterceptor.morphlineId = morphline1


  Search and Replace Interceptor


Example configuration:

a1.sources.avroSrc.interceptors = search-replace
a1.sources.avroSrc.interceptors.search-replace.type = search_replace

# Remove leading alphanumeric characters in an event body.
a1.sources.avroSrc.interceptors.search-replace.searchPattern = ^[A-Za-z0-9_]+
a1.sources.avroSrc.interceptors.search-replace.replaceString =

Another example:

a1.sources.avroSrc.interceptors = search-replace
a1.sources.avroSrc.interceptors.search-replace.type = search_replace

# Use grouping operators to reorder and munge words on a line.
a1.sources.avroSrc.interceptors.search-replace.searchPattern = The quick brown ([a-z]+) jumped over the lazy ([a-z]+)
a1.sources.avroSrc.interceptors.search-replace.replaceString = The hungry $2 ate the careless $1
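The two search_replace examples can be checked with a small Python sketch. Flume runs on Java's regex engine, where the replacement string refers to groups as $1, $2; Python's re.sub expects \g<1>, \g<2>, so the hypothetical helper java_replacement_to_python rewrites plain $n references first (it does not handle escaped dollar signs or other backslashes in the replacement).

```python
import re


def java_replacement_to_python(repl: str) -> str:
    """Turn Java-style $n group references into Python's \\g<n> syntax."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", repl)


def search_replace(body: str, search_pattern: str, replace_string: str) -> str:
    """Sketch of search_replace: replace every match of the pattern in the body."""
    return re.sub(search_pattern, java_replacement_to_python(replace_string), body)


print(repr(search_replace("abc_123 rest of line", r"^[A-Za-z0-9_]+", "")))
# ' rest of line'
print(search_replace("The quick brown fox jumped over the lazy dog",
                     r"The quick brown ([a-z]+) jumped over the lazy ([a-z]+)",
                     "The hungry $2 ate the careless $1"))
# The hungry dog ate the careless fox
```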


  Regex Filtering Interceptor

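Uses a regular expression to filter events: the events whose body matches are either dropped (excluded) or kept (included).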
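A minimal Python sketch of the filtering rule, assuming the two options regex and excludeEvents (mirrored here as regex and exclude_events, default False, i.e. keep matching events). The pattern is searched anywhere in the body, like Java's Matcher.find(). Note that in a Flume properties file backslashes are doubled (\\d) because the file format consumes one level of escaping, while the Python raw strings below use single backslashes. The sample lines are hypothetical.

```python
import re
from typing import List


def regex_filter(bodies: List[str], regex: str = ".*",
                 exclude_events: bool = False) -> List[str]:
    """Sketch of regex_filter: keep (or, with exclude_events, drop) matching bodies."""
    pattern = re.compile(regex)
    kept = []
    for body in bodies:
        # search() looks for the pattern anywhere in the body
        matched = pattern.search(body) is not None
        if matched != exclude_events:
            kept.append(body)
    return kept


lines = ['{"level":"info"}', "1:2:3", "plain text"]
print(regex_filter(lines, r"\{.*\}"))                       # ['{"level":"info"}']
print(regex_filter(lines, r"\{.*\}", exclude_events=True))  # ['1:2:3', 'plain text']
```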



  Regex Extractor Interceptor


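Uses a regular expression to add headers to an event: each header key is named by a serializer, and its value is the text captured by the corresponding regex group.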


  Example 1:

If the Flume event body contained 1:2:3.4foobar5 and the following configuration was used:

a1.sources.r1.interceptors.i1.regex = (\\d):(\\d):(\\d)
a1.sources.r1.interceptors.i1.serializers = s1 s2 s3
a1.sources.r1.interceptors.i1.serializers.s1.name = one
a1.sources.r1.interceptors.i1.serializers.s2.name = two
a1.sources.r1.interceptors.i1.serializers.s3.name = three

The extracted event will contain the same body, but the following headers will have been added: one=>1, two=>2, three=>3.
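Example 1 can be reproduced with a short Python sketch. The hypothetical regex_extract function pairs regex group i with the i-th serializer name and returns the headers that would be added; only the first match is used, and the body itself is not changed.

```python
import re
from typing import Dict, List


def regex_extract(body: str, regex: str, names: List[str]) -> Dict[str, str]:
    """Sketch of regex_extractor with default serializers: group i -> header names[i]."""
    match = re.search(regex, body)  # first match only
    if match is None:
        return {}  # no match: the event passes through without new headers
    # zip() stops at the shorter list, so in this sketch unnamed groups are ignored
    return dict(zip(names, match.groups()))


headers = regex_extract("1:2:3.4foobar5", r"(\d):(\d):(\d)", ["one", "two", "three"])
print(headers)  # {'one': '1', 'two': '2', 'three': '3'}
```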

  Example 2:

If the Flume event body contained 2012-10-18 18:47:57,614 some log line and the following configuration was used:

a1.sources.r1.interceptors.i1.regex = ^(?:\\n)?(\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d)
a1.sources.r1.interceptors.i1.serializers = s1
a1.sources.r1.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
a1.sources.r1.interceptors.i1.serializers.s1.name = timestamp
a1.sources.r1.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm
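Example 2 stores the captured date as epoch milliseconds in a timestamp header. The Python sketch below imitates that; the pattern yyyy-MM-dd HH:mm is a Java date pattern, translated here by hand to the strptime format %Y-%m-%d %H:%M. As far as I know the Java serializer parses in the agent's default time zone; this sketch assumes UTC instead, so an agent running in another zone would produce a different number. The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Dict

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_serialize(value: str, fmt: str = "%Y-%m-%d %H:%M",
                     tz: timezone = timezone.utc) -> str:
    """Parse a date string and return epoch milliseconds as a header value."""
    parsed = datetime.strptime(value, fmt).replace(tzinfo=tz)
    return str((parsed - EPOCH) // timedelta(milliseconds=1))


def extract_timestamp(body: str) -> Dict[str, str]:
    # Same regex as the configuration above, after properties-file unescaping
    match = re.search(r"^(?:\n)?(\d\d\d\d-\d\d-\d\d\s\d\d:\d\d)", body)
    if match is None:
        return {}
    return {"timestamp": millis_serialize(match.group(1))}


print(extract_timestamp("2012-10-18 18:47:57,614 some log line"))
# {'timestamp': '1350586020000'}  (18:47 read as UTC)
```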

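The examples above are the demos from the official documentation.

Here I briefly walk through regex_filter, the interceptor I use most often, by adding interceptors to the source configuration.

The configuration below defines two regexes as examples: i1 matches a JSON string (anything enclosed in braces), and i2 matches logs in a d:d:d format such as 1:2:3.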

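# Only the interceptors listed on this line are active; set it to i1 to try the JSON filter instead
a1.sources.source1.interceptors=i2
# i1: keep only events whose body contains {...}, e.g. a JSON string
a1.sources.source1.interceptors.i1.type=regex_filter
a1.sources.source1.interceptors.i1.regex=\\{.*\\}
# i2: keep only events whose body contains a d:d:d sequence, e.g. 1:2:3
a1.sources.source1.interceptors.i2.type=regex_filter
a1.sources.source1.interceptors.i2.regex = (\\d):(\\d):(\\d)
# The serializers lines are regex_extractor settings; regex_filter ignores them
a1.sources.source1.interceptors.i2.serializers = s1 s2 s3
a1.sources.source1.interceptors.i2.serializers.s1.name = one
a1.sources.source1.interceptors.i2.serializers.s2.name = two
a1.sources.source1.interceptors.i2.serializers.s3.name = three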
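To see which events survive each setting of the interceptors line, here is a small Python sketch of an interceptor chain: the names listed on that line are applied in order, and an event dropped by any filter never reaches the channel. The sample bodies are hypothetical, not the actual output of WriteLog, and make_regex_filter and run_chain are illustrative helpers rather than Flume API.

```python
import re
from typing import Callable, Dict, List, Optional

Interceptor = Callable[[str], Optional[str]]


def make_regex_filter(regex: str) -> Interceptor:
    """regex_filter with excludeEvents=false: keep matching bodies, drop the rest."""
    pattern = re.compile(regex)
    return lambda body: body if pattern.search(body) else None


# The two filters defined in the configuration (regexes after unescaping)
DEFINED: Dict[str, Interceptor] = {
    "i1": make_regex_filter(r"\{.*\}"),
    "i2": make_regex_filter(r"(\d):(\d):(\d)"),
}

SAMPLE = ['{"user":"flume","action":"login"}', "1:2:3", "some plain log line"]


def run_chain(active: str, bodies: List[str]) -> List[str]:
    """Apply the interceptors named in `active` (e.g. "i2" or "i1 i2") in order."""
    chain = [DEFINED[name] for name in active.split()]
    out = []
    for body in bodies:
        current: Optional[str] = body
        for interceptor in chain:
            current = interceptor(current)
            if current is None:
                break  # dropped by this filter
        if current is not None:
            out.append(current)
    return out


print(run_chain("i2", SAMPLE))  # ['1:2:3']
```

With "i1 i2" both filters run in sequence, so only an event matching both regexes would get through.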


For the rest of the configuration, see my previous post: http://blog.csdn.net/linlinv3/article/details/50053333

Add the interceptors to the configuration from that post.

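Without interceptors, everything WriteLog writes is collected.


With i1 active (interceptors=i1), only the JSON string is collected.


With i2 active (as in the configuration above), only the 1:2:3 line is collected.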

