Flume Configuration File Overview


Flume Overview

Alias Conventions

Alias Name   Alias Type
a            agent
c            channel
r            source
k            sink
g            sink group
i            interceptor
y            key
h            host
s            serializer

Component Summary

Component Interface                       Type Alias           Implementation Class
org.apache.flume.Source                   avro                 org.apache.flume.source.AvroSource
org.apache.flume.Source                   netcat               org.apache.flume.source.NetcatSource
org.apache.flume.Source                   seq                  org.apache.flume.source.SequenceGeneratorSource
org.apache.flume.Source                   exec                 org.apache.flume.source.ExecSource
org.apache.flume.Source                   syslogtcp            org.apache.flume.source.SyslogTcpSource
org.apache.flume.Source                   multiport_syslogtcp  org.apache.flume.source.MultiportSyslogTCPSource
org.apache.flume.Source                   syslogudp            org.apache.flume.source.SyslogUDPSource
org.apache.flume.Source                   spooldir             org.apache.flume.source.SpoolDirectorySource
org.apache.flume.Source                   http                 org.apache.flume.source.http.HTTPSource
org.apache.flume.Source                   thrift               org.apache.flume.source.ThriftSource
org.apache.flume.Source                   jms                  org.apache.flume.source.jms.JMSSource
org.apache.flume.Source                   custom               org.example.MySource
org.apache.flume.Channel                  memory               org.apache.flume.channel.MemoryChannel
org.apache.flume.Channel                  jdbc                 org.apache.flume.channel.jdbc.JdbcChannel
org.apache.flume.Channel                  file                 org.apache.flume.channel.file.FileChannel
org.apache.flume.Channel                  custom-class         org.example.MyChannel
org.apache.flume.Sink                     null                 org.apache.flume.sink.NullSink
org.apache.flume.Sink                     logger               org.apache.flume.sink.LoggerSink
org.apache.flume.Sink                     avro                 org.apache.flume.sink.AvroSink
org.apache.flume.Sink                     hdfs                 org.apache.flume.sink.hdfs.HDFSEventSink
org.apache.flume.Sink                     hbase                org.apache.flume.sink.hbase.HBaseSink
org.apache.flume.Sink                     asynchbase           org.apache.flume.sink.hbase.AsyncHBaseSink
org.apache.flume.Sink                     elasticsearch        org.apache.flume.sink.elasticsearch.ElasticSearchSink
org.apache.flume.Sink                     file_roll            org.apache.flume.sink.RollingFileSink
org.apache.flume.Sink                     irc                  org.apache.flume.sink.irc.IRCSink
org.apache.flume.Sink                     thrift               org.apache.flume.sink.ThriftSink
org.apache.flume.Sink                     custom-class         org.example.MySink
org.apache.flume.interceptor.Interceptor  timestamp            org.apache.flume.interceptor.TimestampInterceptor$Builder
org.apache.flume.interceptor.Interceptor  host                 org.apache.flume.interceptor.HostInterceptor$Builder
org.apache.flume.interceptor.Interceptor  static               org.apache.flume.interceptor.StaticInterceptor$Builder
org.apache.flume.interceptor.Interceptor  regex_filter         org.apache.flume.interceptor.RegexFilteringInterceptor$Builder
org.apache.flume.interceptor.Interceptor  regex_extractor      org.apache.flume.interceptor.RegexExtractorInterceptor$Builder
org.apache.flume.ChannelSelector          replicating          org.apache.flume.channel.ReplicatingChannelSelector
org.apache.flume.ChannelSelector          multiplexing         org.apache.flume.channel.MultiplexingChannelSelector
org.apache.flume.ChannelSelector          custom-class         org.example.MyChannelSelector
org.apache.flume.SinkProcessor            default              org.apache.flume.sink.DefaultSinkProcessor
org.apache.flume.SinkProcessor            failover             org.apache.flume.sink.FailoverSinkProcessor
org.apache.flume.SinkProcessor            load_balance         org.apache.flume.sink.LoadBalancingSinkProcessor
org.apache.flume.SinkProcessor            custom-class         org.example.MySinkProcessor

Sources

  • avro source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = avro
    a1.sources.r1.channels = c1
    a1.sources.r1.bind = 0.0.0.0
    a1.sources.r1.port = 4141
  • Thrift source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = thrift
    a1.sources.r1.channels = c1
    a1.sources.r1.bind = 0.0.0.0
    a1.sources.r1.port = 4141
  • Exec Source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/secure
    a1.sources.r1.channels = c1
  • JMS Source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = jms
    a1.sources.r1.channels = c1
    a1.sources.r1.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
    a1.sources.r1.connectionFactory = GenericConnectionFactory
    a1.sources.r1.providerURL = tcp://mqserver:61616
    a1.sources.r1.destinationName = BUSINESS_DATA
    a1.sources.r1.destinationType = QUEUE
  • Spooling Directory Source
    a1.channels = ch-1
    a1.sources = src-1
    a1.sources.src-1.type = spooldir
    a1.sources.src-1.channels = ch-1
    a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
    a1.sources.src-1.fileHeader = true
  • Kafka Source
    tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
    tier1.sources.source1.channels = channel1
    tier1.sources.source1.batchSize = 5000
    tier1.sources.source1.batchDurationMillis = 2000
    tier1.sources.source1.kafka.bootstrap.servers = localhost:9092
    tier1.sources.source1.kafka.topics = test1, test2
    tier1.sources.source1.kafka.consumer.group.id = custom.g.id
or
    tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
    tier1.sources.source1.channels = channel1
    tier1.sources.source1.kafka.bootstrap.servers = localhost:9092
    tier1.sources.source1.kafka.topics.regex = ^topic[0-9]$
    # the default kafka.consumer.group.id=flume is used
  • NetCat UDP Source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = netcatudp
    a1.sources.r1.bind = 0.0.0.0
    a1.sources.r1.port = 6666
    a1.sources.r1.channels = c1
  • Sequence Generator Source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = seq
    a1.sources.r1.channels = c1
  • Syslog TCP Source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = syslogtcp
    a1.sources.r1.port = 5140
    a1.sources.r1.host = localhost
    a1.sources.r1.channels = c1
  • Custom Source
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = org.example.MySource
    a1.sources.r1.channels = c1

Sinks

  • HDFS Sink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
    a1.sinks.k1.hdfs.filePrefix = events-
    a1.sinks.k1.hdfs.round = true
    a1.sinks.k1.hdfs.roundValue = 10
    a1.sinks.k1.hdfs.roundUnit = minute
    # an event with timestamp 11:54:34 AM, June 12, 2012 will cause the hdfs path to become /flume/events/2012-06-12/1150/00
  • Hive Sink
    a1.channels = c1
    a1.channels.c1.type = memory
    a1.sinks = k1
    a1.sinks.k1.type = hive
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
    a1.sinks.k1.hive.database = logsdb
    a1.sinks.k1.hive.table = weblogs
    a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
    a1.sinks.k1.useLocalTimeStamp = false
    a1.sinks.k1.round = true
    a1.sinks.k1.roundValue = 10
    a1.sinks.k1.roundUnit = minute
    a1.sinks.k1.serializer = DELIMITED
    a1.sinks.k1.serializer.delimiter = "\t"
    a1.sinks.k1.serializer.serdeSeparator = '\t'
    a1.sinks.k1.serializer.fieldnames = id,,msg
  • Logger Sink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = logger
    a1.sinks.k1.channel = c1
  • Avro Sink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = avro
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hostname = 10.10.10.10
    a1.sinks.k1.port = 4545
  • Thrift Sink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = thrift
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hostname = 10.10.10.10
    a1.sinks.k1.port = 4545
  • HBaseSink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = hbase
    a1.sinks.k1.table = foo_table
    a1.sinks.k1.columnFamily = bar_cf
    a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
    a1.sinks.k1.channel = c1
  • KafkaSink
    a1.sinks.k1.channel = c1
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.kafka.topic = mytopic
    a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
    a1.sinks.k1.kafka.flumeBatchSize = 20
    a1.sinks.k1.kafka.producer.acks = 1
    a1.sinks.k1.kafka.producer.linger.ms = 1
    a1.sinks.k1.kafka.producer.compression.type = snappy
  • HTTP Sink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = http
    a1.sinks.k1.channel = c1
    a1.sinks.k1.endpoint = http://localhost:8080/someuri
    a1.sinks.k1.connectTimeout = 2000
    a1.sinks.k1.requestTimeout = 2000
    a1.sinks.k1.acceptHeader = application/json
    a1.sinks.k1.contentTypeHeader = application/json
    a1.sinks.k1.defaultBackoff = true
    a1.sinks.k1.defaultRollback = true
    a1.sinks.k1.defaultIncrementMetrics = false
    a1.sinks.k1.backoff.4XX = false
    a1.sinks.k1.rollback.4XX = false
    a1.sinks.k1.incrementMetrics.4XX = true
    a1.sinks.k1.backoff.200 = false
    a1.sinks.k1.rollback.200 = false
    a1.sinks.k1.incrementMetrics.200 = true
  • Custom Sink
    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = org.example.MySink
    a1.sinks.k1.channel = c1

Flume Channels

Channels are the repositories where events are staged on an agent. Sources add events to a channel, and sinks remove them.

  • Memory Channel
    a1.channels = c1
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000
    a1.channels.c1.transactionCapacity = 10000
    a1.channels.c1.byteCapacityBufferPercentage = 20
    a1.channels.c1.byteCapacity = 800000
  • JDBC Channel
    a1.channels = c1
    a1.channels.c1.type = jdbc
  • Kafka Channel
    a1.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
    a1.channels.channel1.kafka.bootstrap.servers = kafka-1:9092,kafka-2:9092,kafka-3:9092
    a1.channels.channel1.kafka.topic = channel1
    a1.channels.channel1.kafka.consumer.group.id = flume-consumer
  • File Channel
    a1.channels = c1
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
    a1.channels.c1.dataDirs = /mnt/flume/data

Flume Channel Selectors

If the selector type is not specified, it defaults to “replicating”.

  • Replicating Channel Selector (default)
    a1.sources = r1
    a1.channels = c1 c2 c3
    a1.sources.r1.selector.type = replicating
    a1.sources.r1.channels = c1 c2 c3
    a1.sources.r1.selector.optional = c3
Note: in the above configuration, c3 is an optional channel. Failure to write to c3 is simply ignored. Since c1 and c2 are not marked optional, failure to write to those channels will cause the transaction to fail.
  • Multiplexing Channel Selector
    a1.sources = r1
    a1.channels = c1 c2 c3 c4
    a1.sources.r1.selector.type = multiplexing
    a1.sources.r1.selector.header = state
    a1.sources.r1.selector.mapping.CZ = c1
    a1.sources.r1.selector.mapping.US = c2 c3
    a1.sources.r1.selector.default = c4
  • Custom Channel Selector

A custom channel selector is your own implementation of the ChannelSelector interface. A custom channel selector’s class and its dependencies must be included in the agent’s classpath when starting the Flume agent. The type of the custom channel selector is its FQCN.

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.selector.type = org.example.MyChannelSelector

Flume Sink Processors

Sink groups allow users to group multiple sinks into one entity. Sink processors can be used to provide load-balancing capabilities over all sinks inside the group, or to achieve failover from one sink to another in case of temporary failure.

    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = load_balance
  • Default Sink Processor

The default sink processor accepts only a single sink. The user is not forced to create a processor (sink group) for a single sink; instead, the user can follow the source - channel - sink pattern explained earlier in this guide.
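
As a minimal sketch of that pattern (the component names here are illustrative), a single logger sink wired directly to a channel needs no sinkgroups declaration at all:

    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type = seq
    a1.sources.r1.channels = c1
    a1.sinks.k1.type = logger
    a1.sinks.k1.channel = c1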

  • Failover Sink Processor
    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = failover
    a1.sinkgroups.g1.processor.priority.k1 = 5
    a1.sinkgroups.g1.processor.priority.k2 = 10
    a1.sinkgroups.g1.processor.maxpenalty = 10000
  • Load balancing Sink Processor
    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = load_balance
    a1.sinkgroups.g1.processor.backoff = true
    a1.sinkgroups.g1.processor.selector = random
  • Custom Sink Processor

Custom sink processors are not supported at the moment.

Event Serializers

The file_roll sink and the hdfs sink both support the EventSerializer interface. Details of the EventSerializers that ship with Flume are provided below.

  • Body Text Serializer
    a1.sinks = k1
    a1.sinks.k1.type = file_roll
    a1.sinks.k1.channel = c1
    a1.sinks.k1.sink.directory = /var/log/flume
    a1.sinks.k1.sink.serializer = text
    a1.sinks.k1.sink.serializer.appendNewline = false
  • “Flume Event” Avro Event Serializer
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
    a1.sinks.k1.serializer = avro_event
    a1.sinks.k1.serializer.compressionCodec = snappy
  • Avro Event Serializer
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
    a1.sinks.k1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
    a1.sinks.k1.serializer.compressionCodec = snappy
    a1.sinks.k1.serializer.schemaURL = hdfs://namenode/path/to/schema.avsc

Flume Interceptors

    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    a1.sources.r1.interceptors = i1 i2
    a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder
    a1.sources.r1.interceptors.i1.preserveExisting = false
    a1.sources.r1.interceptors.i1.hostHeader = hostname
    a1.sources.r1.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
    a1.sinks.k1.filePrefix = FlumeData.%{CollectorHost}.%Y-%m-%d
    a1.sinks.k1.channel = c1
  • Timestamp Interceptor

This interceptor inserts into the event headers the time in milliseconds at which it processes the event, under the key timestamp (or a key specified by the header property). It can preserve an existing timestamp if one is already present in the event headers.

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.channels = c1
    a1.sources.r1.type = seq
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = timestamp
  • Host Interceptor

This interceptor inserts the hostname or IP address of the host that the agent is running on. Based on configuration, the value is written under the key host or a custom key.

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = host
  • Static Interceptor
    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.channels = c1
    a1.sources.r1.type = seq
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = static
    a1.sources.r1.interceptors.i1.key = datacenter
    a1.sources.r1.interceptors.i1.value = NEW_YORK
  • Remove Header Interceptor

This interceptor manipulates Flume event headers by removing one or many headers. It can remove a statically defined header, headers matching a regular expression, or headers in a list. If none of these is defined, or if no header matches the criteria, the Flume events are not modified.
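
A minimal sketch, assuming Flume 1.8+ where this interceptor's alias is remove_header and withName names the single header to remove (the header name here is illustrative):

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = remove_header
    # remove a single, statically named header
    a1.sources.r1.interceptors.i1.withName = datacenter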

  • UUID Interceptor

This interceptor sets a universally unique identifier on all events that are intercepted. An example UUID is b5755073-77a9-43c1-8fad-b7a586fc1b97, which represents a 128-bit value.
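
A minimal sketch; the type is believed to be the builder FQCN org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder, with headerName and preserveExisting as its documented properties:

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
    # header to store the UUID in (default: id)
    a1.sources.r1.interceptors.i1.headerName = id
    # keep an existing UUID header rather than overwriting it
    a1.sources.r1.interceptors.i1.preserveExisting = true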

  • Regex Filtering Interceptor

This interceptor filters events selectively by interpreting the event body as text and matching the text against a configured regular expression. The supplied regular expression can be used to include events or exclude events.
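
For example, a filter that drops events whose body starts with DEBUG might look like the sketch below (the pattern itself is illustrative; excludeEvents = true excludes matching events, while the default false passes only matching events through):

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = regex_filter
    a1.sources.r1.interceptors.i1.regex = ^DEBUG.*
    a1.sources.r1.interceptors.i1.excludeEvents = true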

  • Regex Extractor Interceptor

If the Flume event body contained 1:2:3.4foobar5 and the following configuration was used:

    a1.sources.r1.interceptors.i1.type = regex_extractor
    a1.sources.r1.interceptors.i1.regex = (\\d):(\\d):(\\d)
    a1.sources.r1.interceptors.i1.serializers = s1 s2 s3
    a1.sources.r1.interceptors.i1.serializers.s1.name = one
    a1.sources.r1.interceptors.i1.serializers.s2.name = two
    a1.sources.r1.interceptors.i1.serializers.s3.name = three

then the extracted event would contain the same body, with the headers one=1, two=2, and three=3 added.