Flume Configuration File Overview
Source: Internet. Editor: 程序博客网. Time: 2024/05/21 08:53
Flume Overview
Alias Conventions
Component Summary
Sources
- avro source
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
- Thrift source
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = thrift
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
- Exec Source
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
- JMS Source
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = jms
a1.sources.r1.channels = c1
a1.sources.r1.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
a1.sources.r1.connectionFactory = GenericConnectionFactory
a1.sources.r1.providerURL = tcp://mqserver:61616
a1.sources.r1.destinationName = BUSINESS_DATA
a1.sources.r1.destinationType = QUEUE
- Spooling Directory Source
a1.channels = ch-1
a1.sources = src-1
a1.sources.src-1.type = spooldir
a1.sources.src-1.channels = ch-1
a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
a1.sources.src-1.fileHeader = true
- Kafka Source
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.channels = channel1
tier1.sources.source1.batchSize = 5000
tier1.sources.source1.batchDurationMillis = 2000
tier1.sources.source1.kafka.bootstrap.servers = localhost:9092
tier1.sources.source1.kafka.topics = test1, test2
tier1.sources.source1.kafka.consumer.group.id = custom.g.id
or
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.channels = channel1
tier1.sources.source1.kafka.bootstrap.servers = localhost:9092
tier1.sources.source1.kafka.topics.regex = ^topic[0-9]$
# the default kafka.consumer.group.id=flume is used
- NetCat UDP Source
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = netcatudp
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
- Sequence Generator Source
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.channels = c1
- Syslog TCP Source
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1
- Custom Source
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1
Sinks
- HDFS Sink
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# an event with timestamp 11:54:34 AM, June 12, 2012 will cause the hdfs path to become /flume/events/2012-06-12/1150/00.
- Hive Sink
a1.channels = c1
a1.channels.c1.type = memory
a1.sinks = k1
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
a1.sinks.k1.hive.database = logsdb
a1.sinks.k1.hive.table = weblogs
a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
a1.sinks.k1.useLocalTimeStamp = false
a1.sinks.k1.round = true
a1.sinks.k1.roundValue = 10
a1.sinks.k1.roundUnit = minute
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = "\t"
a1.sinks.k1.serializer.serdeSeparator = '\t'
a1.sinks.k1.serializer.fieldnames =id,,msg
- Logger Sink
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
- Avro Sink
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 10.10.10.10
a1.sinks.k1.port = 4545
- Thrift Sink
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = thrift
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 10.10.10.10
a1.sinks.k1.port = 4545
- HBaseSink
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hbase
a1.sinks.k1.table = foo_table
a1.sinks.k1.columnFamily = bar_cf
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.k1.channel = c1
- KafkaSink
a1.sinks.k1.channel = c1
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = mytopic
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy
- HTTP Sink
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = http
a1.sinks.k1.channel = c1
a1.sinks.k1.endpoint = http://localhost:8080/someuri
a1.sinks.k1.connectTimeout = 2000
a1.sinks.k1.requestTimeout = 2000
a1.sinks.k1.acceptHeader = application/json
a1.sinks.k1.contentTypeHeader = application/json
a1.sinks.k1.defaultBackoff = true
a1.sinks.k1.defaultRollback = true
a1.sinks.k1.defaultIncrementMetrics = false
a1.sinks.k1.backoff.4XX = false
a1.sinks.k1.rollback.4XX = false
a1.sinks.k1.incrementMetrics.4XX = true
a1.sinks.k1.backoff.200 = false
a1.sinks.k1.rollback.200 = false
a1.sinks.k1.incrementMetrics.200 = true
- Custom Sink
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = org.example.MySink
a1.sinks.k1.channel = c1
Flume Channels
Channels are the repositories where events are staged on an agent. A source adds events to a channel and a sink removes them.
- Memory Channel
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
- JDBC Channel
a1.channels = c1
a1.channels.c1.type = jdbc
- Kafka Channel
a1.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.channel1.kafka.bootstrap.servers = kafka-1:9092,kafka-2:9092,kafka-3:9092
a1.channels.channel1.kafka.topic = channel1
a1.channels.channel1.kafka.consumer.group.id = flume-consumer
- File Channel
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
a1.channels.c1.dataDirs = /mnt/flume/data
Flume Channel Selectors
If the selector type is not specified, it defaults to “replicating”.
- Replicating Channel Selector (default)
a1.sources = r1
a1.channels = c1 c2 c3
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2 c3
a1.sources.r1.selector.optional = c3
Tip: In the above configuration, c3 is an optional channel. Failure to write to c3 is simply ignored. Since c1 and c2 are not marked optional, failure to write to those channels will cause the transaction to fail.
- Multiplexing Channel Selector
a1.sources = r1
a1.channels = c1 c2 c3 c4
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2 c3
a1.sources.r1.selector.default = c4
- Custom Channel Selector
A custom channel selector is your own implementation of the ChannelSelector interface. A custom channel selector’s class and its dependencies must be included in the agent’s classpath when starting the Flume agent. The type of the custom channel selector is its FQCN.
a1.sources = r1
a1.channels = c1
a1.sources.r1.selector.type = org.example.MyChannelSelector
Flume Sink Processors
Sink groups allow users to group multiple sinks into one entity. Sink processors can be used to provide load balancing across all sinks in the group, or to fail over from one sink to another in case of a temporary failure.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
- Default Sink Processor
The default sink processor accepts only a single sink. Users are not required to create a processor (sink group) for a single sink; instead they can follow the source - channel - sink pattern described in the Flume user guide.
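The single-sink pattern can be sketched as follows. This is a minimal illustration, not a recommended setup: the agent name a1 and the component names r1, c1, and k1 are placeholders, and the seq source, memory channel, and logger sink are chosen only because they need no external services. No a1.sinkgroups entry appears, so the sink runs under the default sink processor.

```properties
# One source, one channel, one sink; no sink group is declared
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = seq
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```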
- Failover Sink Processor
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000
- Load balancing Sink Processor
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = random
- Custom Sink Processor
Custom sink processors are not supported at the moment.
Event Serializers
The file_roll sink and the hdfs sink both support the EventSerializer interface. Details of the EventSerializers that ship with Flume are provided below.
- Body Text Serializer
a1.sinks = k1
a1.sinks.k1.type = file_roll
a1.sinks.k1.channel = c1
a1.sinks.k1.sink.directory = /var/log/flume
a1.sinks.k1.sink.serializer = text
a1.sinks.k1.sink.serializer.appendNewline = false
- “Flume Event” Avro Event Serializer
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.serializer = avro_event
a1.sinks.k1.serializer.compressionCodec = snappy
- Avro Event Serializer
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
a1.sinks.k1.serializer.compressionCodec = snappy
a1.sinks.k1.serializer.schemaURL = hdfs://namenode/path/to/schema.avsc
Flume Interceptors
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder
a1.sources.r1.interceptors.i1.preserveExisting = false
a1.sources.r1.interceptors.i1.hostHeader = hostname
a1.sources.r1.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
a1.sinks.k1.filePrefix = FlumeData.%{CollectorHost}.%Y-%m-%d
a1.sinks.k1.channel = c1
- Timestamp Interceptor
This interceptor inserts into the event headers the time, in milliseconds, at which it processes the event. The header key is timestamp (or as specified by the header property) and its value is that timestamp. The interceptor can preserve an existing timestamp if it is already present in the configuration.
a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
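As a hedged illustration of the preserve-existing behavior mentioned above, the fragment below sets preserveExisting on the same interceptor. The names a1, r1, and i1 are the placeholders used in the preceding example. With this setting, an event that already carries a timestamp header should keep its value instead of having it overwritten.

```properties
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
# Keep a timestamp header that is already present on the event (default is false)
a1.sources.r1.interceptors.i1.preserveExisting = true
```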
- Host Interceptor
This interceptor inserts the hostname or IP address of the host that this agent is running on. It inserts a header with key host or a configured key whose value is the hostname or IP address of the host, based on configuration.
a1.sources = r1
a1.channels = c1
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host
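The choice between hostname and IP address, and the header key, are both configurable. The sketch below is one possible combination, assuming the same placeholder names a1, r1, and i1; the header key hostname is chosen only for illustration.

```properties
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host
# Write the hostname rather than the IP address (useIP defaults to true)
a1.sources.r1.interceptors.i1.useIP = false
# Store the value under the header key "hostname" instead of the default "host"
a1.sources.r1.interceptors.i1.hostHeader = hostname
```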
- Static Interceptor
a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = datacenter
a1.sources.r1.interceptors.i1.value = NEW_YORK
- Remove Header Interceptor
This interceptor manipulates Flume event headers, by removing one or many headers. It can remove a statically defined header, headers based on a regular expression or headers in a list. If none of these is defined, or if no header matches the criteria, the Flume events are not modified.
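A minimal sketch of the regular-expression variant follows. The agent and component names are placeholders, and the tmp_ header prefix is a hypothetical naming scheme invented for this example. The withName property (a single header name) or the fromList property (a separated list of names) could be used in place of matching.

```properties
a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = remove_header
# Remove every header whose name matches this regular expression
a1.sources.r1.interceptors.i1.matching = tmp_.*
```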
- UUID Interceptor
This interceptor sets a universally unique identifier on all events that are intercepted. An example UUID is b5755073-77a9-43c1-8fad-b7a586fc1b97, which represents a 128-bit value.
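The fragment below sketches how this interceptor might be attached to a source. The names a1, r1, and i1 are placeholders and the header name eventId is an arbitrary choice for illustration. The interceptor is referenced by its fully qualified builder class, which ships in the morphline Solr sink module, so that module is assumed to be on the agent's classpath.

```properties
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
# Header that receives the generated UUID (default is "id")
a1.sources.r1.interceptors.i1.headerName = eventId
# Do not overwrite a UUID header that is already set (this is the default)
a1.sources.r1.interceptors.i1.preserveExisting = true
```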
- Regex Filtering Interceptor
This interceptor filters events selectively by interpreting the event body as text and matching the text against a configured regular expression. The supplied regular expression can be used to include events or exclude events.
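As a sketch of the exclude mode, the fragment below drops events whose body starts with DEBUG. The pattern and the names a1, r1, and i1 are illustrative assumptions, not part of the original example set.

```properties
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
# Match event bodies that start with "DEBUG" (illustrative pattern)
a1.sources.r1.interceptors.i1.regex = ^DEBUG
# true drops matching events; false (the default) keeps only matching events
a1.sources.r1.interceptors.i1.excludeEvents = true
```

Setting excludeEvents back to false would invert the behavior and keep only the DEBUG events.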
- Regex Extractor Interceptor
If the Flume event body contained 1:2:3.4foobar5 and the following configuration was used:
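a1.sources.r1.interceptors.i1.type = regex_extractor
a1.sources.r1.interceptors.i1.regex = (\\d):(\\d):(\\d)
a1.sources.r1.interceptors.i1.serializers = s1 s2 s3
a1.sources.r1.interceptors.i1.serializers.s1.name = one
a1.sources.r1.interceptors.i1.serializers.s2.name = two
a1.sources.r1.interceptors.i1.serializers.s3.name = three
The extracted event will contain the same body, with the following headers added: one=>1, two=>2, three=>3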