Flume 1.5.2 日志中出现agent-shutdown-hook ,然后结束进程

来源:互联网 发布:渠道网络 编辑:程序博客网 时间:2024/06/07 03:53

    最近在使用Flume收集日志,发现每隔几个小时就flume进程就会挂掉,但是日志中又没有Error 和Exception,进程结束前的一些日志如下:

09 Apr 2015 14:47:21,160 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 1109 Apr 2015 14:47:21,162 INFO  [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83)  - Configuration provider stopping09 Apr 2015 14:47:21,177 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.channel.file.Log.roll:932)  - Roll start /maichuang/flume/channel/data09 Apr 2015 14:47:21,177 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.channel.file.LogFile$Writer.<init>:214)  - Opened /maichuang/flume/channel/data/log-1509 Apr 2015 14:47:21,178 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.channel.file.Log.roll:948)  - Roll end09 Apr 2015 14:47:21,210 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.channel.file.Log.roll:932)  - Roll start /maichuang/flume/channel/data09 Apr 2015 14:47:21,211 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.channel.file.LogFile$Writer.<init>:214)  - Opened /maichuang/flume/channel/data/log-1609 Apr 2015 14:47:21,211 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.channel.file.Log.roll:948)  - Roll end09 Apr 2015 14:47:21,211 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.org.apache.flume.ChannelException: Commit failed due to IO error [channel=channel1]        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:605)        at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:466)        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)        at java.lang.Thread.run(Thread.java:745)Caused by: java.nio.channels.ClosedByInterruptException        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380)        at org.apache.flume.channel.file.LogFileV3.writeDelimitedTo(LogFileV3.java:148)        at org.apache.flume.channel.file.LogFileV3$Writer.<init>(LogFileV3.java:209)        at org.apache.flume.channel.file.LogFileFactory.getWriter(LogFileFactory.java:77)        at org.apache.flume.channel.file.Log.roll(Log.java:935)        at org.apache.flume.channel.file.Log.roll(Log.java:904)        at org.apache.flume.channel.file.Log.rollback(Log.java:719)        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:603)        ... 5 more09 Apr 2015 14:47:21,212 INFO  [agent-shutdown-hook] (org.apache.flume.sink.hdfs.HDFSEventSink.stop:499)  - Closing hdfs://master:9000/flume/2015_04_09-1446.log09 Apr 2015 14:47:21,213 INFO  [agent-shutdown-hook] (org.apache.flume.sink.hdfs.BucketWriter.close:409)  - Closing hdfs://master:9000/flume/_2015_04_09-1446.log.1428562035496.tmp09 Apr 2015 14:47:21,214 INFO  [hdfs-master-call-runner-9] (org.apache.flume.sink.hdfs.BucketWriter$3.call:339)  - Close tries incremented
    虽然我不太懂java,但是我还是硬着头皮去源码里面查看
<pre name="code" class="java">org.apache.flume.lifecycle.LifecycleSupervisor

对应的源码,感觉应该不是这里的问题。后来,感觉应该找  agent-shutdown-hook 相关的信息,果然找到了:

http://www.codatlas.com/github.com/apache/flume/HEAD/flume-ng-node/src/main/java/org/apache/flume/node/Application.java?line=345

public static void main(String[] args) {    try {      boolean isZkConfigured = false;      Options options = new Options();      Option option = new Option("n", "name", true, "the name of this agent");      option.setRequired(true);      options.addOption(option);      option = new Option("f", "conf-file", true,          "specify a config file (required if -z missing)");      option.setRequired(false);      options.addOption(option);      option = new Option(null, "no-reload-conf", false,          "do not reload config file if changed");      options.addOption(option);      // Options for Zookeeper      option = new Option("z", "zkConnString", true,          "specify the ZooKeeper connection to use (required if -f missing)");      option.setRequired(false);      options.addOption(option);      option = new Option("p", "zkBasePath", true,          "specify the base path in ZooKeeper for agent configs");      option.setRequired(false);      options.addOption(option);      option = new Option("h", "help", false, "display help text");      options.addOption(option);      CommandLineParser parser = new GnuParser();      CommandLine commandLine = parser.parse(options, args);      if (commandLine.hasOption('h')) {        new HelpFormatter().printHelp("flume-ng agent", options, true);        return;      }      String agentName = commandLine.getOptionValue('n');      boolean reload = !commandLine.hasOption("no-reload-conf");      if (commandLine.hasOption('z') || commandLine.hasOption("zkConnString")) {        isZkConfigured = true;      }      Application application = null;      if (isZkConfigured) {        // get options        String zkConnectionStr = commandLine.getOptionValue('z');        String baseZkPath = commandLine.getOptionValue('p');        if (reload) {          EventBus eventBus = new EventBus(agentName + "-event-bus");          List<LifecycleAware> components = Lists.newArrayList();          PollingZooKeeperConfigurationProvider zookeeperConfigurationProvider =            new PollingZooKeeperConfigurationProvider(              agentName, zkConnectionStr, baseZkPath, eventBus);          components.add(zookeeperConfigurationProvider);          application = new Application(components);          eventBus.register(application);        } else {          StaticZooKeeperConfigurationProvider zookeeperConfigurationProvider =            new StaticZooKeeperConfigurationProvider(              agentName, zkConnectionStr, baseZkPath);          application = new Application();          application.handleConfigurationEvent(zookeeperConfigurationProvider            .getConfiguration());        }      } else {        File configurationFile = new File(commandLine.getOptionValue('f'));        /*         * The following is to ensure that by default the agent will fail on         * startup if the file does not exist.         */        if (!configurationFile.exists()) {          // If command line invocation, then need to fail fast          if (System.getProperty(Constants.SYSPROP_CALLED_FROM_SERVICE) ==            null) {            String path = configurationFile.getPath();            try {              path = configurationFile.getCanonicalPath();            } catch (IOException ex) {              logger.error("Failed to read canonical path for file: " + path,                ex);            }            throw new ParseException(              "The specified configuration file does not exist: " + path);          }        }        List<LifecycleAware> components = Lists.newArrayList();        if (reload) {          EventBus eventBus = new EventBus(agentName + "-event-bus");          PollingPropertiesFileConfigurationProvider configurationProvider =            new PollingPropertiesFileConfigurationProvider(              agentName, configurationFile, eventBus, 30);          components.add(configurationProvider);          application = new Application(components);          eventBus.register(application);        } else {          PropertiesFileConfigurationProvider configurationProvider =            new PropertiesFileConfigurationProvider(              agentName, configurationFile);          application = new Application();          application.handleConfigurationEvent(configurationProvider            .getConfiguration());        }      }      application.start();      final Application appReference = application;      <strong><span style="color:#ff0000;">Runtime.getRuntime().addShutdownHook(new Thread("agent-shutdown-hook") {        @Override        public void run() {          appReference.stop();        }      });</span></strong>    } catch (Exception e) {      logger.error("A fatal error occurred while running. Exception follows.",          e);    }  }

    上面红色代码部分应该就是结束进程的代码所在。

    于是,我又查找 Runtime.getRuntime().addShutdownHook,在

http://www.codatlas.com/github.com/openjdk-mirror/jdk7u-jdk/master/src/share/classes/java/lang/Runtime.java?keyword=getRuntime&line=209

中找到了:

  public void addShutdownHook(Thread hook) {        SecurityManager sm = System.getSecurityManager();        if (sm != null) {            sm.checkPermission(new RuntimePermission("shutdownHooks"));        }        ApplicationShutdownHooks.add(hook);    }
http://www.codatlas.com/github.com/openjdk-mirror/jdk7u-jdk/HEAD/src/share/classes/java/lang/ApplicationShutdownHooks.java?line=83中也有相关的代码:

static synchronized void add(Thread hook) {        if(hooks == null)            throw new IllegalStateException("Shutdown in progress");        if (hook.isAlive())            throw new IllegalArgumentException("Hook already running");        if (hooks.containsKey(hook))            throw new IllegalArgumentException("Hook previously registered");        hooks.put(hook, hook);    }
另外在Application.java源码中,有这么一段:

public synchronized void stop() {    supervisor.stop();    if(monitorServer != null) {      monitorServer.stop();    }  }

然后我又去看 supervisor.stop()对应的源码,可从Application.java 中

public Application(List<LifecycleAware> components) {    this.components = components;    supervisor = new LifecycleSupervisor();  }

可知:supervisor = new LifecycleSupervisor()

supervisor 是  LifecycleSupervisor的一个实例,故又转而看LifecycleSupervisor所在的源码:

http://www.codatlas.com/github.com/apache/flume/HEAD/flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java?line=51

Application.java代码中的

 application.start();<span style="color:#ff0000;">//启动application之后再执行如下代码??这样不就是一直会执行stop()了么?这中间的逻辑我还没搞明白,期待高手帮忙解答。</span>      final Application appReference = application;      Runtime.getRuntime().addShutdownHook(new Thread("agent-shutdown-hook") {        @Override        public void run() {          appReference.stop();        }      });
如上面的红色字体,目前还没搞懂,望高手留言指点迷津,或发我邮箱daiyongtao0211@qq.com   ,谢谢!

0 0
原创粉丝点击