Flume Configuration File Overview


Flume Overview

Alias Conventions

Alias Name   Alias Type
a            agent
c            channel
r            source
k            sink
g            sink group
i            interceptor
y            key
h            host
s            serializer

Component Summary

Component Interface                       Type Alias           Implementation Class
org.apache.flume.Source                   avro                 org.apache.flume.source.AvroSource
org.apache.flume.Source                   netcat               org.apache.flume.source.NetcatSource
org.apache.flume.Source                   seq                  org.apache.flume.source.SequenceGeneratorSource
org.apache.flume.Source                   exec                 org.apache.flume.source.ExecSource
org.apache.flume.Source                   syslogtcp            org.apache.flume.source.SyslogTcpSource
org.apache.flume.Source                   multiport_syslogtcp  org.apache.flume.source.MultiportSyslogTCPSource
org.apache.flume.Source                   syslogudp            org.apache.flume.source.SyslogUDPSource
org.apache.flume.Source                   spooldir             org.apache.flume.source.SpoolDirectorySource
org.apache.flume.Source                   http                 org.apache.flume.source.http.HTTPSource
org.apache.flume.Source                   thrift               org.apache.flume.source.ThriftSource
org.apache.flume.Source                   jms                  org.apache.flume.source.jms.JMSSource
org.apache.flume.Source                   custom               org.example.MySource
org.apache.flume.Channel                  memory               org.apache.flume.channel.MemoryChannel
org.apache.flume.Channel                  jdbc                 org.apache.flume.channel.jdbc.JdbcChannel
org.apache.flume.Channel                  file                 org.apache.flume.channel.file.FileChannel
org.apache.flume.Channel                  custom-class         org.example.MyChannel
org.apache.flume.Sink                     null                 org.apache.flume.sink.NullSink
org.apache.flume.Sink                     logger               org.apache.flume.sink.LoggerSink
org.apache.flume.Sink                     avro                 org.apache.flume.sink.AvroSink
org.apache.flume.Sink                     hdfs                 org.apache.flume.sink.hdfs.HDFSEventSink
org.apache.flume.Sink                     hbase                org.apache.flume.sink.hbase.HBaseSink
org.apache.flume.Sink                     asynchbase           org.apache.flume.sink.hbase.AsyncHBaseSink
org.apache.flume.Sink                     elasticsearch        org.apache.flume.sink.elasticsearch.ElasticSearchSink
org.apache.flume.Sink                     file_roll            org.apache.flume.sink.RollingFileSink
org.apache.flume.Sink                     irc                  org.apache.flume.sink.irc.IRCSink
org.apache.flume.Sink                     thrift               org.apache.flume.sink.ThriftSink
org.apache.flume.Sink                     custom-class         org.example.MySink
org.apache.flume.interceptor.Interceptor  timestamp            org.apache.flume.interceptor.TimestampInterceptor$Builder
org.apache.flume.interceptor.Interceptor  host                 org.apache.flume.interceptor.HostInterceptor$Builder
org.apache.flume.interceptor.Interceptor  static               org.apache.flume.interceptor.StaticInterceptor$Builder
org.apache.flume.interceptor.Interceptor  regex_filter         org.apache.flume.interceptor.RegexFilteringInterceptor$Builder
org.apache.flume.interceptor.Interceptor  regex_extractor      org.apache.flume.interceptor.RegexExtractorInterceptor$Builder
org.apache.flume.ChannelSelector          replicating          org.apache.flume.channel.ReplicatingChannelSelector
org.apache.flume.ChannelSelector          multiplexing         org.apache.flume.channel.MultiplexingChannelSelector
org.apache.flume.ChannelSelector          custom-class         org.example.MyChannelSelector
org.apache.flume.SinkProcessor            default              org.apache.flume.sink.DefaultSinkProcessor
org.apache.flume.SinkProcessor            failover             org.apache.flume.sink.FailoverSinkProcessor
org.apache.flume.SinkProcessor            load_balance         org.apache.flume.sink.LoadBalancingSinkProcessor
org.apache.flume.SinkProcessor            custom-class         org.example.MySinkProcessor

Sources

  • avro source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = avro
    a1.sources.r1.channels = c1
    a1.sources.r1.bind = 0.0.0.0
    a1.sources.r1.port = 4141
  • Thrift source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = thrift
    a1.sources.r1.channels = c1
    a1.sources.r1.bind = 0.0.0.0
    a1.sources.r1.port = 4141
  • Exec Source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/secure
    a1.sources.r1.channels = c1
  • JMS Source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = jms
    a1.sources.r1.channels = c1
    a1.sources.r1.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
    a1.sources.r1.connectionFactory = GenericConnectionFactory
    a1.sources.r1.providerURL = tcp://mqserver:61616
    a1.sources.r1.destinationName = BUSINESS_DATA
    a1.sources.r1.destinationType = QUEUE
  • Spooling Directory Source
    a1.channels = ch-1
    a1.sources = src-1
    a1.sources.src-1.type = spooldir
    a1.sources.src-1.channels = ch-1
    a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
    a1.sources.src-1.fileHeader = true
  • Kafka Source
    tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
    tier1.sources.source1.channels = channel1
    tier1.sources.source1.batchSize = 5000
    tier1.sources.source1.batchDurationMillis = 2000
    tier1.sources.source1.kafka.bootstrap.servers = localhost:9092
    tier1.sources.source1.kafka.topics = test1, test2
    tier1.sources.source1.kafka.consumer.group.id = custom.g.id
or
    tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
    tier1.sources.source1.channels = channel1
    tier1.sources.source1.kafka.bootstrap.servers = localhost:9092
    tier1.sources.source1.kafka.topics.regex = ^topic[0-9]$
    # the default kafka.consumer.group.id=flume is used
  • NetCat UDP Source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = netcatudp
    a1.sources.r1.bind = 0.0.0.0
    a1.sources.r1.port = 6666
    a1.sources.r1.channels = c1
  • Sequence Generator Source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = seq
    a1.sources.r1.channels = c1
  • Syslog TCP Source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = syslogtcp
    a1.sources.r1.port = 5140
    a1.sources.r1.host = localhost
    a1.sources.r1.channels = c1
  • Custom Source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = org.example.MySource
    a1.sources.r1.channels = c1

Sinks

  • HDFS Sink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
    a1.sinks.k1.hdfs.filePrefix = events-
    a1.sinks.k1.hdfs.round = true
    a1.sinks.k1.hdfs.roundValue = 10
    a1.sinks.k1.hdfs.roundUnit = minute
    # an event with timestamp 11:54:34 AM, June 12, 2012 will cause the hdfs path to become /flume/events/2012-06-12/1150/00
  • Hive Sink
    a1.channels = c1
    a1.channels.c1.type = memory
    a1.sinks = k1
    a1.sinks.k1.type = hive
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
    a1.sinks.k1.hive.database = logsdb
    a1.sinks.k1.hive.table = weblogs
    a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
    a1.sinks.k1.useLocalTimeStamp = false
    a1.sinks.k1.round = true
    a1.sinks.k1.roundValue = 10
    a1.sinks.k1.roundUnit = minute
    a1.sinks.k1.serializer = DELIMITED
    a1.sinks.k1.serializer.delimiter = "\t"
    a1.sinks.k1.serializer.serdeSeparator = '\t'
    a1.sinks.k1.serializer.fieldnames = id,,msg
  • Logger Sink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = logger
    a1.sinks.k1.channel = c1
  • Avro Sink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = avro
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hostname = 10.10.10.10
    a1.sinks.k1.port = 4545
  • Thrift Sink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = thrift
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hostname = 10.10.10.10
    a1.sinks.k1.port = 4545
  • HBaseSink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = hbase
    a1.sinks.k1.table = foo_table
    a1.sinks.k1.columnFamily = bar_cf
    a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
    a1.sinks.k1.channel = c1
  • KafkaSink
    a1.sinks.k1.channel = c1
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.kafka.topic = mytopic
    a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
    a1.sinks.k1.kafka.flumeBatchSize = 20
    a1.sinks.k1.kafka.producer.acks = 1
    a1.sinks.k1.kafka.producer.linger.ms = 1
    a1.sinks.k1.kafka.producer.compression.type = snappy
  • HTTP Sink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = http
    a1.sinks.k1.channel = c1
    a1.sinks.k1.endpoint = http://localhost:8080/someuri
    a1.sinks.k1.connectTimeout = 2000
    a1.sinks.k1.requestTimeout = 2000
    a1.sinks.k1.acceptHeader = application/json
    a1.sinks.k1.contentTypeHeader = application/json
    a1.sinks.k1.defaultBackoff = true
    a1.sinks.k1.defaultRollback = true
    a1.sinks.k1.defaultIncrementMetrics = false
    a1.sinks.k1.backoff.4XX = false
    a1.sinks.k1.rollback.4XX = false
    a1.sinks.k1.incrementMetrics.4XX = true
    a1.sinks.k1.backoff.200 = false
    a1.sinks.k1.rollback.200 = false
    a1.sinks.k1.incrementMetrics.200 = true
  • Custom Sink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = org.example.MySink
    a1.sinks.k1.channel = c1

Flume Channels

Channels are the repositories where events are staged on an agent. Sources add events to a channel, and sinks remove them.

  • Memory Channel
    a1.channels = c1
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000
    a1.channels.c1.transactionCapacity = 10000
    a1.channels.c1.byteCapacityBufferPercentage = 20
    a1.channels.c1.byteCapacity = 800000
  • JDBC Channel
    a1.channels = c1
    a1.channels.c1.type = jdbc
  • Kafka Channel
    a1.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
    a1.channels.channel1.kafka.bootstrap.servers = kafka-1:9092,kafka-2:9092,kafka-3:9092
    a1.channels.channel1.kafka.topic = channel1
    a1.channels.channel1.kafka.consumer.group.id = flume-consumer
  • File Channel
    a1.channels = c1
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
    a1.channels.c1.dataDirs = /mnt/flume/data

Flume Channel Selectors

If the selector type is not specified, it defaults to “replicating”.

  • Replicating Channel Selector (default)
    a1.sources = r1
    a1.channels = c1 c2 c3
    a1.sources.r1.selector.type = replicating
    a1.sources.r1.channels = c1 c2 c3
    a1.sources.r1.selector.optional = c3
Note: in the above configuration, c3 is an optional channel. Failure to write to c3 is simply ignored. Since c1 and c2 are not marked optional, failure to write to those channels will cause the transaction to fail.
  • Multiplexing Channel Selector
    a1.sources = r1
    a1.channels = c1 c2 c3 c4
    a1.sources.r1.selector.type = multiplexing
    a1.sources.r1.selector.header = state
    a1.sources.r1.selector.mapping.CZ = c1
    a1.sources.r1.selector.mapping.US = c2 c3
    a1.sources.r1.selector.default = c4
  • Custom Channel Selector

A custom channel selector is your own implementation of the ChannelSelector interface. A custom channel selector’s class and its dependencies must be included in the agent’s classpath when starting the Flume agent. The type of the custom channel selector is its FQCN.

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.selector.type = org.example.MyChannelSelector

Flume Sink Processors

Sink groups allow users to group multiple sinks into one entity. Sink processors can be used to provide load-balancing capabilities over all sinks inside the group, or to achieve failover from one sink to another in case of temporary failure.

    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = load_balance
  • Default Sink Processor

The default sink processor accepts only a single sink. The user is not forced to create a processor (sink group) for a single sink; instead, the user can follow the source - channel - sink pattern explained earlier in this guide.
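
As a minimal sketch of that pattern (the component names here are illustrative), a single logger sink wired directly to a channel needs no sinkgroups declaration at all:

    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type = seq
    a1.sources.r1.channels = c1
    a1.sinks.k1.type = logger
    a1.sinks.k1.channel = c1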

  • Failover Sink Processor
    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = failover
    a1.sinkgroups.g1.processor.priority.k1 = 5
    a1.sinkgroups.g1.processor.priority.k2 = 10
    a1.sinkgroups.g1.processor.maxpenalty = 10000
  • Load balancing Sink Processor
    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = load_balance
    a1.sinkgroups.g1.processor.backoff = true
    a1.sinkgroups.g1.processor.selector = random
  • Custom Sink Processor

Custom sink processors are not supported at the moment.

Event Serializers

The file_roll sink and the hdfs sink both support the EventSerializer interface. Details of the EventSerializers that ship with Flume are provided below.

  • Body Text Serializer
    a1.sinks = k1
    a1.sinks.k1.type = file_roll
    a1.sinks.k1.channel = c1
    a1.sinks.k1.sink.directory = /var/log/flume
    a1.sinks.k1.sink.serializer = text
    a1.sinks.k1.sink.serializer.appendNewline = false
  • “Flume Event” Avro Event Serializer
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
    a1.sinks.k1.serializer = avro_event
    a1.sinks.k1.serializer.compressionCodec = snappy
  • Avro Event Serializer
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
    a1.sinks.k1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
    a1.sinks.k1.serializer.compressionCodec = snappy
    a1.sinks.k1.serializer.schemaURL = hdfs://namenode/path/to/schema.avsc

Flume Interceptors

    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    a1.sources.r1.interceptors = i1 i2
    a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder
    a1.sources.r1.interceptors.i1.preserveExisting = false
    a1.sources.r1.interceptors.i1.hostHeader = hostname
    a1.sources.r1.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
    a1.sinks.k1.filePrefix = FlumeData.%{CollectorHost}.%Y-%m-%d
    a1.sinks.k1.channel = c1
  • Timestamp Interceptor

This interceptor inserts into the event headers the time in milliseconds at which it processes the event, under the key timestamp (or a key specified by the header property). It can preserve an existing timestamp if one is already present in the event headers.

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.channels = c1
    a1.sources.r1.type = seq
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = timestamp
  • Host Interceptor

This interceptor inserts the hostname or IP address of the host that the agent is running on. Based on configuration, the value is written under the key host or a custom key.

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = host
  • Static Interceptor
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.channels = c1
    a1.sources.r1.type = seq
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = static
    a1.sources.r1.interceptors.i1.key = datacenter
    a1.sources.r1.interceptors.i1.value = NEW_YORK
  • Remove Header Interceptor

This interceptor manipulates Flume event headers by removing one or many headers. It can remove a statically defined header, headers matching a regular expression, or headers in a list. If none of these is defined, or if no header matches the criteria, the Flume events are not modified.
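
A minimal sketch, assuming Flume 1.8+ where this interceptor's alias is remove_header and withName names the single header to remove (the header name here is illustrative):

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = remove_header
    # remove a single, statically named header
    a1.sources.r1.interceptors.i1.withName = datacenter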

  • UUID Interceptor

This interceptor sets a universally unique identifier on all events that are intercepted. An example UUID is b5755073-77a9-43c1-8fad-b7a586fc1b97, which represents a 128-bit value.
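
A minimal sketch; the type is believed to be the builder FQCN org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder, with headerName and preserveExisting as its documented properties:

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
    # header to store the UUID in (default: id)
    a1.sources.r1.interceptors.i1.headerName = id
    # keep an existing UUID header rather than overwriting it
    a1.sources.r1.interceptors.i1.preserveExisting = true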

  • Regex Filtering Interceptor

This interceptor filters events selectively by interpreting the event body as text and matching the text against a configured regular expression. The supplied regular expression can be used to include events or exclude events.
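
For example, a filter that drops events whose body starts with DEBUG might look like the sketch below (the pattern itself is illustrative; excludeEvents = true excludes matching events, while the default false passes only matching events through):

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = regex_filter
    a1.sources.r1.interceptors.i1.regex = ^DEBUG.*
    a1.sources.r1.interceptors.i1.excludeEvents = true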

  • Regex Extractor Interceptor

If the Flume event body contained 1:2:3.4foobar5 and the following configuration was used:

    a1.sources.r1.interceptors.i1.type = regex_extractor
    a1.sources.r1.interceptors.i1.regex = (\\d):(\\d):(\\d)
    a1.sources.r1.interceptors.i1.serializers = s1 s2 s3
    a1.sources.r1.interceptors.i1.serializers.s1.name = one
    a1.sources.r1.interceptors.i1.serializers.s2.name = two
    a1.sources.r1.interceptors.i1.serializers.s3.name = three

then the extracted event would contain the same body, with the headers one=1, two=2, and three=3 added.