Flume source code analysis, part 3: starting, stopping and monitoring Flume components
Source: Internet  Editor: 程序博客网  Time: 2024/05/22 09:42
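Picking up from the previous post: Application's start() and handleConfigurationEvent(MaterializedConfiguration conf). handleConfigurationEvent is invoked through the eventBus at startup, and again whenever the configuration file changes if Flume has been set to reload it dynamically.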
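handleConfigurationEvent first stops every component and then starts every component again. Flume's so-called dynamic reloading is therefore not truly dynamic; it is better described as an automatic restart. The code (in org.apache.flume.node.Application):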
    @Subscribe
    public synchronized void handleConfigurationEvent(MaterializedConfiguration conf) {
      stopAllComponents();
      startAllComponents(conf);
    }
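The "restart, not hot reload" point can be illustrated with a minimal JDK-only sketch. It is not Flume's code: Flume's Application is registered with a Guava EventBus (hence @Subscribe), while here a hypothetical ConfigBus stands in for it, and a hypothetical ReloadingApp only records component names. As in handleConfigurationEvent, every reload stops everything that is running before starting the new set, so components whose configuration did not change are bounced too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an event bus: hands each posted configuration to every subscriber.
class ConfigBus {
    private final List<Consumer<List<String>>> subscribers = new ArrayList<>();

    void register(Consumer<List<String>> subscriber) {
        subscribers.add(subscriber);
    }

    void post(List<String> componentNames) {
        for (Consumer<List<String>> subscriber : subscribers) {
            subscriber.accept(componentNames);
        }
    }
}

// Hypothetical application that reloads the way handleConfigurationEvent does.
class ReloadingApp {
    final List<String> log = new ArrayList<>();
    private List<String> running = new ArrayList<>();

    // synchronized, like the original, so two reloads cannot interleave.
    synchronized void handleConfigurationEvent(List<String> newComponents) {
        for (String name : running) {
            log.add("stop " + name);   // like stopAllComponents(): everything goes down
        }
        running = new ArrayList<>(newComponents);
        for (String name : running) {
            log.add("start " + name);  // like startAllComponents(conf): the new set comes up
        }
    }
}
```

Registering `app::handleConfigurationEvent` and posting `[c1, k1]` and then `[c1, k1, r1]` stops and restarts c1 and k1 even though nothing about them changed.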
Now let's look at stopAllComponents, which shuts down all components:
    private void stopAllComponents() {
      if (this.materializedConfiguration != null) {
        logger.info("Shutting down configuration: {}", this.materializedConfiguration);
        for (Entry<String, SourceRunner> entry :
            this.materializedConfiguration.getSourceRunners().entrySet()) {
          try {
            logger.info("Stopping Source " + entry.getKey());
            supervisor.unsupervise(entry.getValue());
          } catch (Exception e) {
            logger.error("Error while stopping {}", entry.getValue(), e);
          }
        }
        for (Entry<String, SinkRunner> entry :
            this.materializedConfiguration.getSinkRunners().entrySet()) {
          try {
            logger.info("Stopping Sink " + entry.getKey());
            supervisor.unsupervise(entry.getValue());
          } catch (Exception e) {
            logger.error("Error while stopping {}", entry.getValue(), e);
          }
        }
        for (Entry<String, Channel> entry :
            this.materializedConfiguration.getChannels().entrySet()) {
          try {
            logger.info("Stopping Channel " + entry.getKey());
            supervisor.unsupervise(entry.getValue());
          } catch (Exception e) {
            logger.error("Error while stopping {}", entry.getValue(), e);
          }
        }
      }
      if (monitorServer != null) {
        monitorServer.stop();
      }
    }
As the code shows, Flume stops components in the order source->sink->channel.
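The shutdown loop can be reduced to the following sketch (plain JDK; OrderedShutdown is a hypothetical name and each component's stop logic is modeled as a Runnable). It keeps two properties of stopAllComponents: groups are stopped in a fixed order (sources, then sinks, then channels), and every stop sits in its own try/catch, so one failing component does not leave the rest running. A plausible reason for the order is that stopping sources first stops new events from entering, while channels go last because sources and sinks both use them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class OrderedShutdown {
    // groups must already be ordered: sources first, then sinks, then channels.
    // Returns the names of the components that stopped cleanly.
    static List<String> stopAll(List<Map<String, Runnable>> groups, List<String> errors) {
        List<String> stopped = new ArrayList<>();
        for (Map<String, Runnable> group : groups) {
            for (Map.Entry<String, Runnable> entry : group.entrySet()) {
                try {
                    entry.getValue().run();  // the component's stop logic
                    stopped.add(entry.getKey());
                } catch (RuntimeException e) {
                    // Record and carry on, like the per-component try/catch in stopAllComponents.
                    errors.add(entry.getKey() + ": " + e.getMessage());
                }
            }
        }
        return stopped;
    }
}
```

With a sink whose stop throws, the channel after it is still stopped and the failure is only recorded.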
Each component is stopped by calling supervisor.unsupervise(entry.getValue()). Here is unsupervise:
    public synchronized void unsupervise(LifecycleAware lifecycleAware) {
      Preconditions.checkState(supervisedProcesses.containsKey(lifecycleAware),
          "Unaware of " + lifecycleAware + " - can not unsupervise");
      logger.debug("Unsupervising service:{}", lifecycleAware);
      synchronized (lifecycleAware) {
        Supervisoree supervisoree = supervisedProcesses.get(lifecycleAware);
        supervisoree.status.discard = true;
        this.setDesiredState(lifecycleAware, LifecycleState.STOP);
        logger.info("Stopping component: {}", lifecycleAware);
        lifecycleAware.stop();
      }
      supervisedProcesses.remove(lifecycleAware);
      // We need to do this because a reconfiguration simply unsupervises old
      // components and supervises new ones.
      monitorFutures.get(lifecycleAware).cancel(false);
      // purges are expensive, so it is done only once every 2 hours.
      needToPurge = true;
      monitorFutures.remove(lifecycleAware);
    }
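The interplay between the discard flag and cancel(false) is easy to miss. cancel(false) does not interrupt a monitoring pass that has already started (for example, one blocked waiting for the component's lock while unsupervise holds it), so unsupervise also sets discard = true under that lock; the pass then sees the flag and returns without restarting the component. The sketch below is a heavily simplified, hypothetical MiniSupervisor (not Flume's LifecycleSupervisor) showing only this behavior. Its supervise returns the monitoring Runnable so a test can run a "late" pass by hand, and the one-hour schedule is there purely to keep the demo deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical component with a simple running flag.
class Comp {
    boolean running;
    void start() { running = true; }
    void stop() { running = false; }
}

// Hypothetical, heavily simplified supervisor (not Flume's LifecycleSupervisor).
class MiniSupervisor {
    static class Entry {
        boolean discard;            // read and written under synchronized (comp)
        boolean wantRunning = true; // desired state: true = START, false = STOP
    }

    final ScheduledThreadPoolExecutor monitorService =
        new ScheduledThreadPoolExecutor(1, r -> {
            Thread t = new Thread(r, "mini-monitor");
            t.setDaemon(true); // do not keep the JVM alive
            return t;
        });
    final Map<Comp, Entry> supervised = new HashMap<>();
    final Map<Comp, ScheduledFuture<?>> futures = new HashMap<>();

    // One monitoring pass: return early if discarded, otherwise restart if needed.
    private Runnable monitor(Comp comp, Entry entry) {
        return () -> {
            synchronized (comp) {
                if (entry.discard) {
                    return; // unsupervise already ran: never resurrect the component
                }
                if (entry.wantRunning && !comp.running) {
                    comp.start();
                }
            }
        };
    }

    synchronized Runnable supervise(Comp comp) {
        if (supervised.containsKey(comp)) {
            throw new IllegalStateException("Refusing to supervise twice");
        }
        Entry entry = new Entry();
        Runnable check = monitor(comp, entry);
        supervised.put(comp, entry);
        // 1-hour delay keeps the demo deterministic; Flume uses 0 s initial delay and 3 s between runs.
        futures.put(comp, monitorService.scheduleWithFixedDelay(check, 1, 1, TimeUnit.HOURS));
        return check;
    }

    synchronized void unsupervise(Comp comp) {
        Entry entry = supervised.get(comp);
        if (entry == null) {
            throw new IllegalStateException("Unaware of " + comp);
        }
        synchronized (comp) {
            entry.discard = true; // any pass that runs after this becomes a no-op
            entry.wantRunning = false;
            comp.stop();
        }
        supervised.remove(comp);
        // cancel(false): a pass that has already started is not interrupted.
        futures.remove(comp).cancel(false);
    }
}
```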
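unsupervise thus marks the component as discarded, sets its desired state to STOP, calls its stop(), and then removes the component and its scheduled monitoring task from the supervisor's in-memory maps (cancelling the task first). lifecycleAware.stop() runs the concrete component's own stop logic. LifecycleAware is a top-level interface that defines a component's start, stop and current state; Flume's core components, such as sources, sinks and channels, all implement it: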
    public interface LifecycleAware {
      public void start();
      public void stop();
      public LifecycleState getLifecycleState();
    }
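A small, self-contained sketch of the polymorphism this interface enables: the caller only knows LifecycleAware, and each implementation supplies its own start/stop. LifecycleState and LifecycleAware here are local stand-ins that mirror the shape of Flume's types; BaseComponent, DemoChannel, DemoSink and Lifecycles are hypothetical.

```java
import java.util.List;

// Local stand-ins mirroring the shape of Flume's LifecycleState / LifecycleAware.
enum LifecycleState { IDLE, START, STOP, ERROR }

interface LifecycleAware {
    void start();
    void stop();
    LifecycleState getLifecycleState();
}

// Hypothetical base class: tracks state and delegates the real work to subclasses.
abstract class BaseComponent implements LifecycleAware {
    private volatile LifecycleState state = LifecycleState.IDLE;

    @Override
    public void start() {
        doStart();
        state = LifecycleState.START;
    }

    @Override
    public void stop() {
        doStop();
        state = LifecycleState.STOP;
    }

    @Override
    public LifecycleState getLifecycleState() {
        return state;
    }

    protected abstract void doStart();
    protected abstract void doStop();
}

// Two hypothetical components with different start/stop behavior.
class DemoChannel extends BaseComponent {
    final StringBuilder trace = new StringBuilder();
    @Override protected void doStart() { trace.append("channel-open;"); }
    @Override protected void doStop() { trace.append("channel-close;"); }
}

class DemoSink extends BaseComponent {
    final StringBuilder trace = new StringBuilder();
    @Override protected void doStart() { trace.append("sink-poll-begin;"); }
    @Override protected void doStop() { trace.append("sink-poll-end;"); }
}

class Lifecycles {
    // The caller sees only LifecycleAware; dynamic dispatch runs each component's own start().
    static void startAll(List<? extends LifecycleAware> components) {
        for (LifecycleAware c : components) {
            c.start();
        }
    }
}
```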
This interface gives Flume polymorphism: each component runs its own start and stop. That covers stopping; next comes the other method, startAllComponents:
    private void startAllComponents(MaterializedConfiguration materializedConfiguration) {
      logger.info("Starting new configuration:{}", materializedConfiguration);
      // Initialize the materializedConfiguration field from the configuration just read
      this.materializedConfiguration = materializedConfiguration;
      // Start the channels first and wait until they are up, then the sinks, and finally the sources
      // Read the channels from materializedConfiguration
      for (Entry<String, Channel> entry :
          materializedConfiguration.getChannels().entrySet()) {
        try {
          logger.info("Starting Channel " + entry.getKey());
          supervisor.supervise(entry.getValue(),
              new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
        } catch (Exception e) {
          logger.error("Error while starting {}", entry.getValue(), e);
        }
      }
      /*
       * Wait for all channels to start.
       */
      for (Channel ch : materializedConfiguration.getChannels().values()) {
        while (ch.getLifecycleState() != LifecycleState.START
            && !supervisor.isComponentInErrorState(ch)) {
          try {
            logger.info("Waiting for channel: " + ch.getName() +
                " to start. Sleeping for 500 ms");
            Thread.sleep(500);
          } catch (InterruptedException e) {
            logger.error("Interrupted while waiting for channel to start.", e);
            Throwables.propagate(e);
          }
        }
      }
      for (Entry<String, SinkRunner> entry :
          materializedConfiguration.getSinkRunners().entrySet()) {
        try {
          logger.info("Starting Sink " + entry.getKey());
          supervisor.supervise(entry.getValue(),
              new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
        } catch (Exception e) {
          logger.error("Error while starting {}", entry.getValue(), e);
        }
      }
      for (Entry<String, SourceRunner> entry :
          materializedConfiguration.getSourceRunners().entrySet()) {
        try {
          logger.info("Starting Source " + entry.getKey());
          supervisor.supervise(entry.getValue(),
              new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
        } catch (Exception e) {
          logger.error("Error while starting {}", entry.getValue(), e);
        }
      }
      this.loadMonitoring();
    }
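The ordering logic of startAllComponents, including the wait-for-channels loop, can be sketched without Flume as follows. Component, State, OrderedStartup and FakeComponent are hypothetical; inErrorState plays the role of supervisor.isComponentInErrorState and pollMillis that of the 500 ms sleep. One deliberate difference: the sketch restores the thread's interrupt flag before rethrowing, which the Throwables.propagate call in the original does not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

enum State { IDLE, START, STOP, ERROR }

// Hypothetical minimal view of a component for this sketch.
interface Component {
    String name();
    void start();
    State state();
}

class OrderedStartup {
    // Channels first, then block until each channel is START (or flagged as errored),
    // then sinks, then sources. Returns the order in which start() was called.
    static List<String> startAll(List<? extends Component> channels,
                                 List<? extends Component> sinks,
                                 List<? extends Component> sources,
                                 Predicate<Component> inErrorState,
                                 long pollMillis) {
        List<String> order = new ArrayList<>();
        for (Component c : channels) {
            c.start();
            order.add(c.name());
        }
        for (Component ch : channels) {
            while (ch.state() != State.START && !inErrorState.test(ch)) {
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                    throw new IllegalStateException("Interrupted waiting for " + ch.name(), e);
                }
            }
        }
        for (Component s : sinks) {
            s.start();
            order.add(s.name());
        }
        for (Component s : sources) {
            s.start();
            order.add(s.name());
        }
        return order;
    }
}

// Test double: after start(), state() returns IDLE `pollsUntilStarted` times, then START.
class FakeComponent implements Component {
    private final String name;
    private int pollsLeft;
    private boolean started;
    int polls; // how many times state() was called

    FakeComponent(String name, int pollsUntilStarted) {
        this.name = name;
        this.pollsLeft = pollsUntilStarted;
    }

    @Override public String name() { return name; }
    @Override public void start() { started = true; }

    @Override
    public State state() {
        polls++;
        if (!started) {
            return State.IDLE;
        }
        if (pollsLeft > 0) {
            pollsLeft--;
            return State.IDLE;
        }
        return State.START;
    }
}
```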
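Unlike the stop order, startAllComponents starts the channels first and waits until they have started, then starts the sinks, and finally the sources, so that a channel is already running by the time a sink or source uses it. All three kinds of component are started by calling supervisor.supervise: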
    // supervise() places a component under supervision
    public synchronized void supervise(LifecycleAware lifecycleAware,
        SupervisorPolicy policy, LifecycleState desiredState) {
      if (this.monitorService.isShutdown()
          || this.monitorService.isTerminated()
          || this.monitorService.isTerminating()) {
        throw new FlumeException("Supervise called on " + lifecycleAware + " "
            + "after shutdown has been initiated. " + lifecycleAware + " will not"
            + " be started");
      }
      // Refuse a component that is already supervised (checkState throws IllegalStateException)
      Preconditions.checkState(!supervisedProcesses.containsKey(lifecycleAware),
          "Refusing to supervise " + lifecycleAware + " more than once");
      if (logger.isDebugEnabled()) {
        logger.debug("Supervising service:{} policy:{} desiredState:{}",
            new Object[] { lifecycleAware, policy, desiredState });
      }
      // Record the supervision status
      Supervisoree process = new Supervisoree();
      process.status = new Status();
      process.policy = policy;
      process.status.desiredState = desiredState;
      process.status.error = false;
      // MonitorRunnable is a Runnable that periodically checks the component's state and
      // corrects it, e.g. starts a component that should be START but has died
      MonitorRunnable monitorRunnable = new MonitorRunnable();
      monitorRunnable.lifecycleAware = lifecycleAware; // the component being monitored
      monitorRunnable.supervisoree = process;          // its supervision status
      monitorRunnable.monitorService = monitorService; // the monitoring thread pool
      // Register the component in the map of supervised components
      supervisedProcesses.put(lifecycleAware, process);
      // Schedule the monitor: first run immediately, then 3 seconds after each run finishes
      ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
          monitorRunnable, 0, 3, TimeUnit.SECONDS);
      // Remember each component's future so that unsupervise() can cancel it
      monitorFutures.put(lifecycleAware, future);
    }
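The scheduling part of supervise can be tried in isolation with a plain ScheduledThreadPoolExecutor. SketchSupervisor below is hypothetical and omits the status bookkeeping; it keeps the two guards (no supervising after shutdown, no supervising the same component twice) and the fixed-delay schedule. isShutdown() alone suffices for the first guard, because it also returns true while the executor is terminating and after it has terminated. The delay is a parameter so a test does not have to wait 3 seconds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical, stripped-down supervise(): only the guards and the scheduling.
class SketchSupervisor {
    final ScheduledThreadPoolExecutor monitorService =
        new ScheduledThreadPoolExecutor(1, r -> {
            Thread t = new Thread(r, "sketch-monitor");
            t.setDaemon(true); // do not keep the JVM alive
            return t;
        });
    final Map<Object, ScheduledFuture<?>> monitorFutures = new HashMap<>();

    synchronized void supervise(Object component, Runnable monitorTask, long delayMillis) {
        // isShutdown() is also true while terminating and after termination.
        if (monitorService.isShutdown()) {
            throw new IllegalStateException("Supervise called on " + component + " after shutdown");
        }
        if (monitorFutures.containsKey(component)) {
            throw new IllegalStateException("Refusing to supervise " + component + " more than once");
        }
        // Initial delay 0: the first check runs at once; afterwards delayMillis elapses between
        // the end of one run and the start of the next (fixed delay, not fixed rate).
        ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
            monitorTask, 0, delayMillis, TimeUnit.MILLISECONDS);
        monitorFutures.put(component, future);
    }
}
```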
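Note the line ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(monitorRunnable, 0, 3, TimeUnit.SECONDS); in supervise. It schedules a periodic task: monitorRunnable runs immediately and then again 3 seconds after each run finishes (a fixed delay, not a fixed rate). Here is its run method: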
    @Override
    public void run() {
      logger.debug("checking process:{} supervisoree:{}", lifecycleAware, supervisoree);
      long now = System.currentTimeMillis();
      try {
        if (supervisoree.status.firstSeen == null) {
          logger.debug("first time seeing {}", lifecycleAware);
          // First run: record the current time (System.currentTimeMillis()) as firstSeen
          supervisoree.status.firstSeen = now;
        }
        supervisoree.status.lastSeen = now;
        synchronized (lifecycleAware) {
          // Skip components that are discarded (unsupervised) or in the error state
          if (supervisoree.status.discard) {
            // Unsupervise has already been called on this.
            logger.info("Component has already been stopped {}", lifecycleAware);
            return;
          } else if (supervisoree.status.error) {
            logger.info("Component {} is in error state, and Flume will not"
                + "attempt to change its state", lifecycleAware);
            return;
          }
          supervisoree.status.lastSeenState = lifecycleAware.getLifecycleState();
          // If the current state differs from the desired one (e.g. desired START but
          // currently STOP), drive the component toward the desired state.
          // In practice the desired state is START or STOP; if the states match, nothing is done.
          if (!lifecycleAware.getLifecycleState().equals(supervisoree.status.desiredState)) {
            logger.debug("Want to transition {} from {} to {} (failures:{})",
                new Object[] { lifecycleAware, supervisoree.status.lastSeenState,
                    supervisoree.status.desiredState, supervisoree.status.failures });
            switch (supervisoree.status.desiredState) {
              // Should be START but is not: call the component's start()
              case START:
                try {
                  lifecycleAware.start();
                } catch (Throwable e) {
                  logger.error("Unable to start " + lifecycleAware + " - Exception follows.", e);
                  if (e instanceof Error) {
                    // This component can never recover, shut it
                    // down.
                    supervisoree.status.desiredState = LifecycleState.STOP;
                    try {
                      lifecycleAware.stop();
                      logger.warn("Component {} stopped, since it could not be"
                          + "successfully started due to missing dependencies",
                          lifecycleAware);
                    } catch (Throwable e1) {
                      logger.error("Unsuccessful attempt to "
                          + "shutdown component: {} due to missing dependencies."
                          + " Please shutdown the agent"
                          + "or disable this component, or the agent will be"
                          + "in an undefined state.", e1);
                      supervisoree.status.error = true;
                      if (e1 instanceof Error) {
                        throw (Error) e1;
                      }
                      // Set the state to stop, so that the
                      // conf poller can
                      // proceed.
                    }
                  }
                  supervisoree.status.failures++;
                }
                break;
              case STOP:
                // Should be STOP but is not: call the component's stop()
                try {
                  lifecycleAware.stop();
                } catch (Throwable e) {
                  logger.error("Unable to stop " + lifecycleAware + " - Exception follows.", e);
                  if (e instanceof Error) {
                    throw (Error) e;
                  }
                  supervisoree.status.failures++;
                }
                break;
              default:
                logger.warn("I refuse to acknowledge {} as a desired state",
                    supervisoree.status.desiredState);
            }
            if (!supervisoree.policy.isValid(lifecycleAware, supervisoree.status)) {
              logger.error("Policy {} of {} has been violated - supervisor should exit!",
                  supervisoree.policy, lifecycleAware);
            }
          }
        }
      } catch (Throwable t) {
        logger.error("Unexpected error", t);
      }
      logger.debug("Status check complete");
    }
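The decision logic of one monitoring pass can be condensed into the following sketch. It is a simplification, not Flume's code: logging, the firstSeen/lastSeen bookkeeping, the outer catch-all and the policy check are left out, and LState, Managed, Status, Reconciler and Flaky are hypothetical names. A RuntimeException from start() only increments failures, so the next pass retries, whereas an Error such as NoClassDefFoundError (a missing dependency) flips the desired state to STOP.

```java
// Local stand-ins (hypothetical), loosely mirroring Flume's types.
enum LState { IDLE, START, STOP, ERROR }

interface Managed {
    void start();
    void stop();
    LState getLifecycleState();
}

// The fields of the supervisor's status record that this pass uses.
class Status {
    LState desiredState = LState.START;
    int failures;
    boolean discard;
    boolean error;
}

class Reconciler {
    // One monitoring pass, simplified.
    static void checkOnce(Managed c, Status s) {
        synchronized (c) {
            if (s.discard || s.error) {
                return; // unsupervised or given up on: leave it alone
            }
            if (c.getLifecycleState() == s.desiredState) {
                return; // already in the desired state
            }
            switch (s.desiredState) {
                case START:
                    try {
                        c.start();
                    } catch (Throwable e) {
                        if (e instanceof Error) {
                            // Unrecoverable (e.g. missing class): stop wanting it started.
                            s.desiredState = LState.STOP;
                            try {
                                c.stop();
                            } catch (Throwable e1) {
                                s.error = true; // could not even stop it
                            }
                        }
                        s.failures++;
                    }
                    break;
                case STOP:
                    try {
                        c.stop();
                    } catch (RuntimeException e) { // Errors propagate, as in the original
                        s.failures++;
                    }
                    break;
                default:
                    break; // IDLE/ERROR are not used as desired states here
            }
        }
    }
}

// Test double: start() throws for the first `startFailures` calls, then succeeds.
class Flaky implements Managed {
    private int startFailuresLeft;
    private LState state = LState.IDLE;

    Flaky(int startFailures) {
        this.startFailuresLeft = startFailures;
    }

    @Override
    public void start() {
        if (startFailuresLeft > 0) {
            startFailuresLeft--;
            throw new IllegalStateException("dependency not ready");
        }
        state = LState.START;
    }

    @Override public void stop() { state = LState.STOP; }
    @Override public LState getLifecycleState() { return state; }
}
```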
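In short, MonitorRunnable watches each component's state and, whenever it differs from the desired state, corrects it, for example by restarting the component. A separate task, Purger, periodically clears cancelled monitoring tasks out of the scheduler's work queue: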
    private class Purger implements Runnable {
      @Override
      public void run() {
        if (needToPurge) {
          // Remove cancelled java.util.concurrent.Future objects from the work queue (frees queue space).
          // ScheduledFuture.cancel() leaves the task in the queue; ThreadPoolExecutor.purge() removes it.
          monitorService.purge();
          needToPurge = false;
        }
      }
    }
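Why a separate Purger at all? With a ScheduledThreadPoolExecutor's default settings, cancelling a ScheduledFuture does not remove the task from the work queue; it stays there until its next trigger time or until purge() is called. The sketch below (a hypothetical PurgeDemo, JDK only) measures the queue size after cancel(false) and again after purge().

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PurgeDemo {
    // Returns {queue size after cancel(false), queue size after purge()}.
    static int[] run() {
        ScheduledThreadPoolExecutor monitorService = new ScheduledThreadPoolExecutor(1, r -> {
            Thread t = new Thread(r, "purge-demo");
            t.setDaemon(true);
            return t;
        });
        try {
            // A periodic task that will not fire during the demo (1-hour initial delay).
            ScheduledFuture<?> future =
                monitorService.scheduleWithFixedDelay(() -> { }, 1, 1, TimeUnit.HOURS);
            // As in unsupervise: cancel without interrupting. With the default
            // remove-on-cancel policy (false) the task stays in the work queue.
            future.cancel(false);
            int afterCancel = monitorService.getQueue().size();
            // What Purger does when needToPurge is set: drop cancelled Futures from the queue.
            monitorService.purge();
            int afterPurge = monitorService.getQueue().size();
            return new int[] { afterCancel, afterPurge };
        } finally {
            monitorService.shutdownNow();
        }
    }
}
```

PurgeDemo.run() returns {1, 0}. An alternative would be setRemoveOnCancelPolicy(true) (available since Java 7), which removes each task at cancel time; Flume instead only sets needToPurge and lets the Purger do the queue scan occasionally, in line with the "purges are expensive" comment in unsupervise.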
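The start, stop and state-monitoring logic analyzed above lives in the LifecycleSupervisor class of the org.apache.flume.lifecycle package. This concludes the analysis of how Flume components are stopped, started and monitored.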