How to set up Flume NG with an example configuration

Source: Internet. Editor: 程序博客网. Time: 2024/05/18 02:15

1. Check out the source

For those that prefer subversion:

$ svn checkout https://svn.apache.org/repos/asf/incubator/flume/trunk/

If you're more of a git person:

$ git clone git://git.apache.org/flume.git
$ cd flume
$ git checkout trunk

Note: The git repo is a read-only mirror of the subversion repo.

2. Compile the project

# Build the code and run the tests
$ mvn package
# ...or build the code without running the tests
$ mvn package -DskipTests

This produces two types of packages in flume-ng-dist/target. They are:

  • flume-ng-dist-1.2.0-incubating-SNAPSHOT-dist.tar.gz - A binary distribution of Flume, ready to run.
  • flume-ng-dist-1.2.0-incubating-SNAPSHOT-src.tar.gz - A source-only distribution of Flume.

If you're a user and you just want to run Flume, you probably want the -dist version. Copy one out, decompress it, and you're ready to go.

$ cp flume-ng-dist/target/flume-ng-dist-1.2.0-incubating-SNAPSHOT-dist.tar.gz .
$ tar -zxvf flume-ng-dist-1.2.0-incubating-SNAPSHOT-dist.tar.gz
$ cd flume-1.2.0-incubating-SNAPSHOT

3. Create your own properties file based on the working template (or create one from scratch)

$ cp conf/flume-conf.properties.template conf/flume.conf

4. (Optional) Create your flume-env.sh file based on the template (or create one from scratch). The flume-ng executable looks for and sources a file named "flume-env.sh" in the conf directory specified by the --conf/-c command-line option. One use case for flume-env.sh is to specify debugging or profiling options via JAVA_OPTS when developing your own custom Flume NG components such as sources and sinks.

$ cp conf/flume-env.sh.template conf/flume-env.sh
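As a rough illustration of the JAVA_OPTS use case, the sketch below writes a sample flume-env.sh and sources it the way a launcher script would. The scratch directory, the heap sizes and the debugger port 5005 are assumptions chosen for the example, not values taken from the Flume template.

```shell
# Write a sample flume-env.sh into a scratch conf directory (hypothetical path).
conf_dir="$(mktemp -d)"
cat > "$conf_dir/flume-env.sh" <<'EOF'
# Heap limits plus a JDWP listener so a debugger can attach on port 5005.
# The sizes and the port are example values only.
JAVA_OPTS="-Xms100m -Xmx200m -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
EOF

# Source the file, as a launcher script would, and show what it set.
. "$conf_dir/flume-env.sh"
echo "JAVA_OPTS=$JAVA_OPTS"
```

With suspend=n the JVM starts without waiting for a debugger; switching to suspend=y would make it pause until one attaches, which can help when debugging component startup.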

5. Configure 

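The Flume NG configuration format is <identifier>.type.subtype.parameter.config, where <identifier> is the name of the agent that you later pass on the command line at startup. Taking the configuration file below as an example:

<identifier> --> syslog-agent

type --> sources, sinks, channels

subtype --> Syslog, HDFS-LAB, MemoryChannel-1 (user-chosen component names)

parameter --> type, port, channels, channel, hdfs (note the difference: a source uses the plural "channels" because it can feed several channels, while a sink uses the singular "channel" because it drains exactly one)

config --> path, file.Prefix, file.rollInterval, file.Type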

syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = HDFS-LAB
syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140
syslog-agent.sources.Syslog.channels = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.type = hdfs
syslog-agent.sinks.HDFS-LAB.hdfs.path = hdfs://NN.URI:PORT/flumetest/'%{host}'
syslog-agent.sinks.HDFS-LAB.hdfs.file.Prefix = syslogfiles
syslog-agent.sinks.HDFS-LAB.hdfs.file.rollInterval = 60
syslog-agent.sinks.HDFS-LAB.hdfs.file.Type = SequenceFile
syslog-agent.channels.MemoryChannel-1.type = memory
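Because sources and sinks refer to channels by name, a typo in a channel name is an easy mistake to make. The following is a minimal sketch of a sanity check, not part of Flume itself: it writes a trimmed copy of the syslog-agent configuration to a scratch file and verifies that every channel referenced by a source or sink is declared. The helper name get_prop and the scratch file are made up for this example, and the parsing assumes simple "key = value" lines.

```shell
# Write a trimmed copy of the syslog-agent config to a scratch file.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = HDFS-LAB
syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140
syslog-agent.sources.Syslog.channels = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.type = hdfs
syslog-agent.channels.MemoryChannel-1.type = memory
EOF

# get_prop FILE KEY: print the value of the line whose key is exactly KEY.
# (Hypothetical helper; assumes simple "key = value" lines.)
get_prop() {
  awk -F' *= *' -v key="$2" '$1 == key { print $2 }' "$1"
}

agent="syslog-agent"
declared="$(get_prop "$conf" "$agent.channels")"

# Collect the channels referenced by sources (.channels) and sinks (.channel),
# then confirm each one appears in the agent's declared channel list.
status=ok
for ref in $(awk -F' *= *' -v a="$agent" \
    'index($1, a ".sources.") == 1 && $1 ~ /\.channels$/ { print $2 }
     index($1, a ".sinks.") == 1 && $1 ~ /\.channel$/ { print $2 }' "$conf"); do
  case " $declared " in
    *" $ref "*) ;;
    *) echo "undeclared channel: $ref"; status=bad ;;
  esac
done
echo "channel wiring: $status"
```

For the configuration above this prints "channel wiring: ok"; renaming the channel in only one place would make the check report the undeclared name.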

Sample config that runs on a pseudo-distributed Hadoop setup:

agent.channels = c1
agent.sources = r1
agent.sinks = k1
#
agent.channels.c1.type = MEMORY
#
agent.sources.r1.channels = c1
agent.sources.r1.type = SEQ
#
agent.sinks.k1.channel = c1
agent.sinks.k1.type = LOGGER
#
# The HDFS settings below reuse the sink name k1, so they override the LOGGER type above.
agent.sinks.k1.channel = c1
agent.sinks.k1.type = HDFS
# Path format: hdfs://<host>:<port>/<path>
agent.sinks.k1.hdfs.path = hdfs://localhost/tmp/data/flume
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.codeC = SnappyCodec


6. Run Flume NG

After you've configured Flume NG (see above), you can run it with the bin/flume-ng executable. This script accepts a number of arguments and modes.

Issue a flume-ng command to start agent2 (which puts data on HDFS). This example uses the "node" command, while the examples further down use "agent":

              flume-ng node -c ../conf/ -f ../conf/agent2.properties -n agent2

-c  The global configuration directory.

-f  The configuration file for the Flume NG node (mandatory).

-n  The name of the Flume NG node (mandatory).

This starts a node with Apache Avro as the source, memory as the channel, and HDFS as the sink.


Start the flow
Flume NG starts a single flow per process. Start it with:
bin/flume-ng agent -n YOUR_IDENTIFIER -f YOUR_CONFIGFILE
For example:
$ bin/flume-ng agent -n syslog-agent -f conf/syslog-agent.cnf
$ bin/flume-ng agent --conf conf/ -f conf/flume.conf -n agent1
$ flume-ng agent -n agent -f /usr/lib/flume-ng/conf/flume-conf.properties
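The name passed with -n has to match the <identifier> prefix used in the configuration file. As a hedged sketch of how a wrapper script might guard against a mismatch, the function below (build_flume_cmd, a made-up name) only prints the command line if the file defines sources for that agent. It does not run Flume, and the scratch config is invented for the example.

```shell
# build_flume_cmd CONF_DIR CONF_FILE AGENT_NAME
# Prints a flume-ng command line, or fails when the config file has no
# "<agent>.sources" entry. Hypothetical helper; it never starts Flume.
build_flume_cmd() {
  conf_dir="$1"; conf_file="$2"; agent="$3"
  if ! awk -F' *= *' -v k="$agent.sources" \
      '$1 == k { found = 1 } END { exit !found }' "$conf_file"; then
    echo "no sources defined for agent '$agent' in $conf_file" >&2
    return 1
  fi
  echo "bin/flume-ng agent --conf $conf_dir -f $conf_file -n $agent"
}

# Scratch config that defines a single agent named agent1.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
agent1.sources = r1
agent1.channels = c1
agent1.sinks = k1
EOF

# Matching name: the command line is printed.
build_flume_cmd conf "$conf" agent1

# Mismatched name: an error goes to stderr and the function returns non-zero.
build_flume_cmd conf "$conf" agent2 || echo "refused to build command for agent2"
```

The printed command can then be run by hand, or first passed through flume-ng's --dryrun option listed in the reference below.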


For reference:

flume-ng global options

Option                  Description
--conf,-c <conf>        Use configs in <conf> directory
--classpath,-C <cp>     Append to the classpath
--dryrun,-d             Do not actually start Flume, just print the command
-Dproperty=value        Sets a JDK system property value

flume-ng agent options

When given the agent command, a Flume NG agent will be started with a given configuration file (required).

Option                  Description
--conf-file,-f <file>   Indicates which configuration file you want to run with (required)
--name,-n <agentname>   Indicates the name of agent on which we're running (required)




