Flume troubleshooting notes 3: a simple example using a Flume interceptor and a channel selector
Source: internet · Editor: 程序博客网 · Date: 2024/05/21 09:22
Flume configuration file example:
# Flume source: syslog TCP source
producer.sources = syslogSource
# For each one of the sources, the type is defined
producer.sources.syslogSource.type = syslogtcp
producer.sources.syslogSource.bind = localhost
producer.sources.syslogSource.port = 5496
# The official docs do not say this is still required when a selector is used,
# but omitting it causes an error
producer.sources.syslogSource.channels = cbaidu csina cgoogle

# Regex-extractor interceptor (regex_extractor): extracts information from the
# event body into a header; combined with the channel selector below, this
# enables event routing
# the interceptor: domain name
producer.sources.syslogSource.interceptors = domainname
producer.sources.syslogSource.interceptors.domainname.type = regex_extractor
producer.sources.syslogSource.interceptors.domainname.regex = YM:(\\w+)
# Extract the domain name from the body and add it to the event as a new
# header: key domain_name, value the matched domain
# only add one header
producer.sources.syslogSource.interceptors.domainname.serializers = s1
producer.sources.syslogSource.interceptors.domainname.serializers.s1.name = domain_name
producer.sources.syslogSource.interceptors.domainname.serializers.s1.type = default

# Multiplexing channel selector: routes each event to a different channel
# based on the value of the specified header
# selector
producer.sources.syslogSource.selector.type = multiplexing
# which header to inspect
producer.sources.syslogSource.selector.header = domain_name
# different header values route to different channels
producer.sources.syslogSource.selector.mapping.baidu = cbaidu
producer.sources.syslogSource.selector.mapping.sina = csina
producer.sources.syslogSource.selector.mapping.google = cgoogle
producer.sources.syslogSource.selector.default = cbaidu

# Each channel's type is defined.
# all 3 channels are memory channels
producer.channels = cbaidu csina cgoogle
producer.channels.cbaidu.type = memory
producer.channels.csina.type = memory
producer.channels.cgoogle.type = memory
producer.channels.cbaidu.capacity = 1000
producer.channels.csina.capacity = 1000
producer.channels.cgoogle.capacity = 1000

# Each sink's type must be defined
# 3 different sinks: two Kafka sinks and one file_roll sink
producer.sinks = sbaidu ssina sgoogle
producer.sinks.sbaidu.type = org.apache.flume.sink.kafka.KafkaSink
producer.sinks.sbaidu.channel = cbaidu
producer.sinks.sbaidu.topic = baidu
producer.sinks.sbaidu.brokerList = localhost:9797
producer.sinks.sbaidu.requiredAcks = 1
producer.sinks.sbaidu.batchSize = 20
producer.sinks.sbaidu.metadata.broker.list = localhost:9092
producer.sinks.sbaidu.producer.type = sync
producer.sinks.sbaidu.serializer.class = kafka.serializer.DefaultEncoder

producer.sinks.ssina.type = file_roll
producer.sinks.ssina.channel = csina
producer.sinks.ssina.sink.directory = /usr/local/flume/result
producer.sinks.ssina.sink.rollInterval = 0
producer.sinks.ssina.sink.serializer = avro_event

producer.sinks.sgoogle.type = org.apache.flume.sink.kafka.KafkaSink
producer.sinks.sgoogle.channel = cgoogle
producer.sinks.sgoogle.topic = google
producer.sinks.sgoogle.brokerList = localhost:9898
producer.sinks.sgoogle.requiredAcks = 1
producer.sinks.sgoogle.batchSize = 20
producer.sinks.sgoogle.metadata.broker.list = localhost:9092
producer.sinks.sgoogle.producer.type = sync
producer.sinks.sgoogle.serializer.class = kafka.serializer.DefaultEncoder
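The interceptor-plus-selector pipeline above can be sketched in plain Python. This is only an illustration of the routing logic, not Flume code; the regex, mapping, and default channel mirror the configuration:

```python
import re

# Mirrors the regex_extractor interceptor: YM:(\w+)
DOMAIN_RE = re.compile(r"YM:(\w+)")

# Mirrors the multiplexing selector's mapping and default channel
MAPPING = {"baidu": "cbaidu", "sina": "csina", "google": "cgoogle"}
DEFAULT_CHANNEL = "cbaidu"

def route(body: str) -> str:
    """Extract the domain_name header from the event body and return the target channel."""
    match = DOMAIN_RE.search(body)
    # The interceptor adds the header only when the regex matches
    headers = {"domain_name": match.group(1)} if match else {}
    # The multiplexing selector falls back to the default channel on no match
    return MAPPING.get(headers.get("domain_name"), DEFAULT_CHANNEL)

print(route("YM:sina YDIP:1.2.3.4"))   # csina
print(route("YM:google YDIP:1.2.3.4")) # cgoogle
print(route("no YM field here"))       # cbaidu (default)
```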
Summary: one Flume source (syslogtcp), 3 memory channels, and 3 sinks (one file_roll, two Kafka sinks).
The interceptor extracts the domain-name field from the event body into a new header (domain_name), and the selector routes each message to a different sink based on that header.
Example event: YM:baidu YDIP:63.12.79.4
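To feed the example event into the syslogtcp source for testing, one option is a small Python sender (a sketch; the `<13>` priority prefix and the newline terminator are assumptions about what the syslog source will accept, and host/port must match the configuration):

```python
import socket

def send_syslog_event(body: str, host: str = "localhost", port: int = 5496) -> None:
    """Send one newline-terminated syslog message to the Flume syslogtcp source."""
    # <13> is a syslog priority header (facility "user", severity "notice");
    # the syslogtcp source parses it off, leaving the rest as the event body
    message = f"<13>{body}\n".encode("utf-8")
    with socket.create_connection((host, port)) as conn:
        conn.sendall(message)

if __name__ == "__main__":
    send_syslog_event("YM:baidu YDIP:63.12.79.4")
```

With the agent running, the event should pick up the header domain_name=baidu and land on the cbaidu channel, i.e. the "baidu" Kafka topic.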