Flume NG Source Code Analysis (1): The Configuration Module Based on a Static Properties File

Source: Internet  Editor: 程序博客网  Date: 2024/06/03 21:42

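Log collection is an essential service at Internet companies. Flume NG, a top-level Apache project, is an open-source implementation of a distributed log collection service; it is highly extensible and integrates cleanly with many other open-source components. After searching around I found plenty of articles introducing Flume NG, but none that analyze its source code in depth, so I plan to write a series on the Flume NG source. Let's start with the basic configuration module.

Flume NG supports two configuration modes. One is static configuration based on a properties file, which is loaded only once. The other is dynamic configuration built on the publish/subscribe model of Guava's EventBus, which can reload a modified configuration at runtime. This article covers the static, properties-file-based mode.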


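Below is a typical flume-conf.properties configuration:

1. producer is the name of the agent; one agent corresponds to one Flume NG process

2. producer.sources lists the log sources this agent monitors; multiple sources can be configured

3. producer.channels and producer.sinks specify the channel and the sink; these concepts are explained later

4. producer.sources.sX.XXX specifies how each source obtains its logs. For collecting from a local log file, the exec source runs tail -F to follow the end of the file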

    producer.sources = s1 s2 s3
    producer.channels = c
    producer.sinks = r

    producer.sources.s1.type = exec
    producer.sources.s1.channels = c
    producer.sources.s1.command = tail -F /data/logs/s1.log

    producer.sources.s2.type = exec
    producer.sources.s2.channels = c
    producer.sources.s2.command = tail -F /data/logs/s2.log

    producer.sources.s3.type = exec
    producer.sources.s3.channels = c
    producer.sources.s3.command = tail -F /data/logs/s3.log

    producer.sinks.r.type = org.apache.flume.plugins.KafkaSink
    producer.sinks.r.metadata.broker.list=server1:9092,server2:9092,server3:9092
    producer.sinks.r.zk.connect=server1:2181,server2:2181,server3:2181,server4:2181,server5:2181
    producer.sinks.r.partition.key=0
    producer.sinks.r.partitioner.class=org.apache.flume.plugins.SinglePartition
    producer.sinks.r.serializer.class=kafka.serializer.StringEncoder
    producer.sinks.r.request.required.acks=0
    producer.sinks.r.max.message.size=1000000
    producer.sinks.r.producer.type=sync
    producer.sinks.r.custom.encoding=UTF-8
    producer.sinks.r.custom.topic.name=topic.xxx

    #Specify the channel the sink should use
    producer.sinks.r.channel = c

    # Each channel's type is defined.
    producer.channels.c.type = memory
    producer.channels.c.capacity = 1000

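Next, how the agent name producer and the configuration file are specified. Below is the Flume NG startup command: -f gives the path of the configuration file, and -n gives the agent name, which is the prefix of every entry in flume-conf.properties.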

/flume-ng  agent -c conf -f ../conf/flume-conf.properties -n producer -Dflume.root.logger=INFO,console > flume-ng.log 2>&1 &
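Because the name passed with -n is the prefix of every key, selecting one agent's settings amounts to prefix filtering. The following is a minimal sketch of that idea using only java.util.Properties; the class AgentPrefixDemo and its methods are hypothetical names for illustration and are not part of Flume.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of Flume: shows how the agent name acts as a key prefix.
class AgentPrefixDemo {

    // Returns the settings of one agent with the "<agentName>." prefix stripped.
    static Map<String, String> settingsFor(Properties props, String agentName) {
        String prefix = agentName + ".";
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.put(key.substring(prefix.length()), props.getProperty(key).trim());
            }
        }
        return result;
    }

    // Parses properties-format text held in a string.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(
            "producer.sources = s1 s2 s3\n"
          + "producer.channels = c\n"
          + "producer.sources.s1.type = exec\n"
          + "other.sources = x\n");
        // Only the keys that belong to the agent named "producer" are kept.
        System.out.println(settingsFor(props, "producer"));
    }
}
```

Keys belonging to any other agent in the same file are simply ignored, which is why several agents can share one properties file.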

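Now let's look at how Flume NG reads the command-line arguments and turns flume-conf.properties into its internal data structures.

org.apache.flume.node.Application is the startup class of Flume NG. Its main method does the following:

1. It uses the command-line parsing provided by commons-cli.jar to read the -n/--name, -f/--conf-file and --no-reload-conf options into variables

2. It locates the configuration file given by -f. Unless --no-reload-conf is passed (reload is true in the code), the configuration is reloaded at runtime: main creates an EventBus to publish and subscribe to configuration-change events, and a PollingPropertiesFileConfigurationProvider that polls the properties file and reloads it when it has been modified

3. --no-reload-conf is a flag that takes no value. When it is present (reload is false in the code), the configuration is static and loaded only once at startup, so a PropertiesFileConfigurationProvider that reads the properties file is all that is needed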

    public static void main(String[] args) {
      try {
        Options options = new Options();

        Option option = new Option("n", "name", true, "the name of this agent");
        option.setRequired(true);
        options.addOption(option);

        option = new Option("f", "conf-file", true, "specify a conf file");
        option.setRequired(true);
        options.addOption(option);

        option = new Option(null, "no-reload-conf", false, "do not reload " +
          "conf file if changed");
        options.addOption(option);

        option = new Option("h", "help", false, "display help text");
        options.addOption(option);

        CommandLineParser parser = new GnuParser();
        CommandLine commandLine = parser.parse(options, args);

        File configurationFile = new File(commandLine.getOptionValue('f'));
        String agentName = commandLine.getOptionValue('n');
        boolean reload = !commandLine.hasOption("no-reload-conf");

        if (commandLine.hasOption('h')) {
          new HelpFormatter().printHelp("flume-ng agent", options, true);
          return;
        }

        /*
         * The following is to ensure that by default the agent
         * will fail on startup if the file does not exist.
         */
        if (!configurationFile.exists()) {
          // If command line invocation, then need to fail fast
          if (System.getProperty(Constants.SYSPROP_CALLED_FROM_SERVICE) == null) {
            String path = configurationFile.getPath();
            try {
              path = configurationFile.getCanonicalPath();
            } catch (IOException ex) {
              logger.error("Failed to read canonical path for file: " + path, ex);
            }
            throw new ParseException(
                "The specified configuration file does not exist: " + path);
          }
        }

        List<LifecycleAware> components = Lists.newArrayList();
        Application application;
        if(reload) {
          EventBus eventBus = new EventBus(agentName + "-event-bus");
          PollingPropertiesFileConfigurationProvider configurationProvider =
              new PollingPropertiesFileConfigurationProvider(agentName,
                  configurationFile, eventBus, 30);
          components.add(configurationProvider);
          application = new Application(components);
          eventBus.register(application);
        } else {
          PropertiesFileConfigurationProvider configurationProvider =
              new PropertiesFileConfigurationProvider(agentName,
                  configurationFile);
          application = new Application();
          application.handleConfigurationEvent(configurationProvider.getConfiguration());
        }
        application.start();

        final Application appReference = application;
        Runtime.getRuntime().addShutdownHook(new Thread("agent-shutdown-hook") {
          @Override
          public void run() {
            appReference.stop();
          }
        });

      } catch (Exception e) {
        logger.error("A fatal error occurred while running. Exception follows.",
            e);
      }
    }
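In the reload branch of main, the polling provider receives the configuration file, an EventBus and the value 30, presumably a polling interval. The general technique behind such a provider can be sketched as follows: check the file's last-modified timestamp on a schedule and fire a callback when it moves forward. This is a simplified illustration under my own assumptions, not Flume's implementation; PollingFileWatcher, changed and the onChange callback are hypothetical names, and the callback stands in for posting an event to the EventBus.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of change detection by polling; not Flume's actual class.
class PollingFileWatcher implements Runnable {
    private final File file;
    private final Runnable onChange;
    private long lastSeen = 0L; // newest modification time observed so far

    PollingFileWatcher(File file, Runnable onChange) {
        this.file = file;
        this.onChange = onChange;
    }

    // Returns true, and remembers the timestamp, when it is newer than the last one seen.
    boolean changed(long modified) {
        if (modified > lastSeen) {
            lastSeen = modified;
            return true;
        }
        return false;
    }

    @Override
    public void run() {
        // File.lastModified() returns 0 for a missing file, so nothing fires in that case.
        if (changed(file.lastModified())) {
            onChange.run(); // a real provider would reload and publish the configuration here
        }
    }

    public static void main(String[] args) throws Exception {
        File conf = File.createTempFile("flume-conf", ".properties");
        conf.deleteOnExit();
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // Poll once per second in this demo; the first check counts as the initial load.
        executor.scheduleWithFixedDelay(
            new PollingFileWatcher(conf, () -> System.out.println("configuration changed, reloading")),
            0, 1, TimeUnit.SECONDS);
        Thread.sleep(2500);
        executor.shutdownNow();
    }
}
```

Separating the pure comparison (changed) from the file access keeps the detection logic easy to test without touching the file system.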

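The interfaces and classes involved in Flume NG configuration are structured as follows:

1. The top-level interface ConfigurationProvider defines the method MaterializedConfiguration getConfiguration()

2. The MaterializedConfiguration interface represents the materialized configuration, that is, the settings defined in flume-conf.properties instantiated as concrete objects. SimpleMaterializedConfiguration implements it and holds the configuration data structures used at runtime

3. AbstractConfigurationProvider implements ConfigurationProvider and declares the abstract method abstract FlumeConfiguration getFlumeConfiguration()

4. FlumeConfiguration, AgentConfiguration, SourceConfiguration, ChannelConfiguration and SinkConfiguration help parse flume-conf.properties and store the fields defined in the configuration

5. PropertiesFileConfigurationProvider reads the configuration from the file given by -f/--conf-file, and reads it only once

6. PollingPropertiesFileConfigurationProvider reads the configuration file by polling and supports changing the configuration dynamically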
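The hierarchy above follows the template-method pattern: the abstract provider owns the shared step that turns raw settings into runtime objects, and subclasses only decide where the raw settings come from. The sketch below mirrors that shape with heavily simplified stand-in types nested in a hypothetical ProviderHierarchySketch class. None of these are Flume's real classes, and the "materialized" result here is reduced to a list of channel names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for illustration only; Flume's real types carry far more.
class ProviderHierarchySketch {

    interface ConfigurationProvider {
        MaterializedConfiguration getConfiguration();
    }

    // The "materialized" result: here just the channel names that were declared.
    static class MaterializedConfiguration {
        final List<String> channelNames = new ArrayList<>();
    }

    abstract static class AbstractProvider implements ConfigurationProvider {
        private final String agentName;

        AbstractProvider(String agentName) {
            this.agentName = agentName;
        }

        // Subclasses only decide where the raw key/value pairs come from.
        protected abstract Map<String, String> getRawConfiguration();

        // Shared step: turn the raw pairs of this agent into runtime objects.
        @Override
        public MaterializedConfiguration getConfiguration() {
            MaterializedConfiguration conf = new MaterializedConfiguration();
            String channels = getRawConfiguration().get(agentName + ".channels");
            if (channels != null && !channels.trim().isEmpty()) {
                conf.channelNames.addAll(Arrays.asList(channels.trim().split("\\s+")));
            }
            return conf;
        }
    }

    // A source that never changes, analogous in spirit to reading a file once.
    static class StaticProvider extends AbstractProvider {
        private final Map<String, String> raw;

        StaticProvider(String agentName, Map<String, String> raw) {
            super(agentName);
            this.raw = new HashMap<>(raw);
        }

        @Override
        protected Map<String, String> getRawConfiguration() {
            return raw;
        }
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("producer.channels", "c1 c2");
        System.out.println(new StaticProvider("producer", raw).getConfiguration().channelNames);
    }
}
```

A polling variant would reuse getConfiguration unchanged and differ only in when and how often the raw settings are re-read.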



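The implementation of PropertiesFileConfigurationProvider is simple:

1. Its getFlumeConfiguration method reads the properties file and converts it into a FlumeConfiguration object

2. The getConfiguration method of the parent class AbstractConfigurationProvider builds the MaterializedConfiguration instance, that is, it creates the Channel, SourceRunner and SinkRunner objects used at runtime, reading each object's fields from the FlumeConfiguration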

    public FlumeConfiguration getFlumeConfiguration() {
      BufferedReader reader = null;
      try {
        reader = new BufferedReader(new FileReader(file));
        Properties properties = new Properties();
        properties.load(reader);
        return new FlumeConfiguration(toMap(properties));
      } catch (IOException ex) {
        LOGGER.error("Unable to load file:" + file
            + " (I/O failure) - Exception follows.", ex);
      } finally {
        if (reader != null) {
          try {
            reader.close();
          } catch (IOException ex) {
            LOGGER.warn(
                "Unable to close file reader for file: " + file, ex);
          }
        }
      }
      return new FlumeConfiguration(new HashMap<String, String>());
    }

    public MaterializedConfiguration getConfiguration() {
      MaterializedConfiguration conf = new SimpleMaterializedConfiguration();
      FlumeConfiguration fconfig = getFlumeConfiguration();
      AgentConfiguration agentConf = fconfig.getConfigurationFor(getAgentName());
      if (agentConf != null) {
        Map<String, ChannelComponent> channelComponentMap = Maps.newHashMap();
        Map<String, SourceRunner> sourceRunnerMap = Maps.newHashMap();
        Map<String, SinkRunner> sinkRunnerMap = Maps.newHashMap();
        try {
          loadChannels(agentConf, channelComponentMap);
          loadSources(agentConf, channelComponentMap, sourceRunnerMap);
          loadSinks(agentConf, channelComponentMap, sinkRunnerMap);
          Set<String> channelNames =
              new HashSet<String>(channelComponentMap.keySet());
          for(String channelName : channelNames) {
            ChannelComponent channelComponent = channelComponentMap.
                get(channelName);
            if(channelComponent.components.isEmpty()) {
              LOGGER.warn(String.format("Channel %s has no components connected" +
                  " and has been removed.", channelName));
              channelComponentMap.remove(channelName);
              Map<String, Channel> nameChannelMap = channelCache.
                  get(channelComponent.channel.getClass());
              if(nameChannelMap != null) {
                nameChannelMap.remove(channelName);
              }
            } else {
              LOGGER.info(String.format("Channel %s connected to %s",
                  channelName, channelComponent.components.toString()));
              conf.addChannel(channelName, channelComponent.channel);
            }
          }
          for(Map.Entry<String, SourceRunner> entry : sourceRunnerMap.entrySet()) {
            conf.addSourceRunner(entry.getKey(), entry.getValue());
          }
          for(Map.Entry<String, SinkRunner> entry : sinkRunnerMap.entrySet()) {
            conf.addSinkRunner(entry.getKey(), entry.getValue());
          }
        } catch (InstantiationException ex) {
          LOGGER.error("Failed to instantiate component", ex);
        } finally {
          channelComponentMap.clear();
          sourceRunnerMap.clear();
          sinkRunnerMap.clear();
        }
      } else {
        LOGGER.warn("No configuration found for this host:{}", getAgentName());
      }
      return conf;
    }
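getFlumeConfiguration calls a toMap helper that the excerpt does not show. The sketch below gives one plausible shape for it, offered as an assumption and not as Flume's code, together with a try-with-resources variant of the loading logic that replaces the explicit finally block. PropertiesToMapSketch and load are hypothetical names; the fallback to an empty map on I/O failure mirrors the behavior of the method above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch; the real toMap helper in Flume may differ in detail.
class PropertiesToMapSketch {

    // Assumed shape of toMap: copy every string property into a plain map.
    static Map<String, String> toMap(Properties properties) {
        Map<String, String> result = new HashMap<>();
        for (String name : properties.stringPropertyNames()) {
            result.put(name, properties.getProperty(name));
        }
        return result;
    }

    // try-with-resources closes the reader automatically, even when load() throws.
    static Map<String, String> load(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            Properties properties = new Properties();
            properties.load(reader);
            return toMap(properties);
        } catch (IOException ex) {
            System.err.println("Unable to load configuration: " + ex);
            return new HashMap<>(); // same fallback as above: an empty configuration
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = load(new StringReader(
            "producer.channels.c.type = memory\n"
          + "producer.channels.c.capacity = 1000\n"));
        System.out.println(map.get("producer.channels.c.capacity"));
    }
}
```

Returning an empty configuration on failure means a static-mode agent started with an unreadable file comes up with no components instead of crashing, which is worth knowing when debugging a seemingly idle agent.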
