Flume Source Code Analysis, Part 3: Starting, Stopping, and Monitoring Flume Components

Source: Internet  Editor: 程序博客网  Date: 2024/05/22 09:42
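Continuing from the previous part, which covered Application's start() and handleConfigurationEvent(MaterializedConfiguration conf): handleConfigurationEvent is invoked through the EventBus at startup, and again whenever the configuration file changes while dynamic configuration reloading is enabled. The method first stops all components and then starts them all again, so Flume's so-called dynamic reloading is not truly dynamic; it is closer to an automatic restart. The code (org.apache.flume.node.Application) is as follows: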

@Subscribe
public synchronized void handleConfigurationEvent(MaterializedConfiguration conf) {
  stopAllComponents();
  startAllComponents(conf);
}

Step into stopAllComponents, which shuts down every component:

private void stopAllComponents() {
  if (this.materializedConfiguration != null) {
    logger.info("Shutting down configuration: {}", this.materializedConfiguration);
    for (Entry<String, SourceRunner> entry :
        this.materializedConfiguration.getSourceRunners().entrySet()) {
      try {
        logger.info("Stopping Source " + entry.getKey());
        supervisor.unsupervise(entry.getValue());
      } catch (Exception e) {
        logger.error("Error while stopping {}", entry.getValue(), e);
      }
    }

    for (Entry<String, SinkRunner> entry :
        this.materializedConfiguration.getSinkRunners().entrySet()) {
      try {
        logger.info("Stopping Sink " + entry.getKey());
        supervisor.unsupervise(entry.getValue());
      } catch (Exception e) {
        logger.error("Error while stopping {}", entry.getValue(), e);
      }
    }

    for (Entry<String, Channel> entry :
        this.materializedConfiguration.getChannels().entrySet()) {
      try {
        logger.info("Stopping Channel " + entry.getKey());
        supervisor.unsupervise(entry.getValue());
      } catch (Exception e) {
        logger.error("Error while stopping {}", entry.getValue(), e);
      }
    }
  }
  if (monitorServer != null) {
    monitorServer.stop();
  }
}

As the code shows, Flume stops its components in the order source -> sink -> channel.

Each of the three loops calls supervisor.unsupervise(entry.getValue()) to stop the component. Here is the unsupervise method:

public synchronized void unsupervise(LifecycleAware lifecycleAware) {
  Preconditions.checkState(supervisedProcesses.containsKey(lifecycleAware),
      "Unaware of " + lifecycleAware + " - can not unsupervise");

  logger.debug("Unsupervising service:{}", lifecycleAware);

  synchronized (lifecycleAware) {
    Supervisoree supervisoree = supervisedProcesses.get(lifecycleAware);
    supervisoree.status.discard = true;
    this.setDesiredState(lifecycleAware, LifecycleState.STOP);
    logger.info("Stopping component: {}", lifecycleAware);
    lifecycleAware.stop();
  }
  supervisedProcesses.remove(lifecycleAware);
  // We need to do this because a reconfiguration simply unsupervises old
  // components and supervises new ones.
  monitorFutures.get(lifecycleAware).cancel(false);
  // purges are expensive, so it is done only once every 2 hours.
  needToPurge = true;
  monitorFutures.remove(lifecycleAware);
}

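This method marks the component as discarded, stops it, and removes both the component and its monitoring task from the supervisor's in-memory maps (supervisedProcesses and monitorFutures). The call lifecycleAware.stop() runs the stop logic of the concrete component. LifecycleAware is a top-level interface that defines how a component is started, how it is stopped, and how its current state is queried; Flume's key components, such as sources, sinks, and channels, all implement it: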

public interface LifecycleAware {
  public void start();
  public void stop();
  public LifecycleState getLifecycleState();
}
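To make the role of this interface concrete, here is a minimal sketch of a component that implements it. The enum and interface are local stand-ins that mirror the declarations above so the example compiles without Flume on the classpath; DemoChannel and LifecycleDemo are hypothetical names and are not Flume classes.

```java
import java.util.Arrays;
import java.util.List;

// Local stand-in for Flume's LifecycleState (assumption: simplified copy for illustration).
enum LifecycleState { IDLE, START, STOP, ERROR }

// Local stand-in mirroring the interface shown above.
interface LifecycleAware {
    void start();
    void stop();
    LifecycleState getLifecycleState();
}

// Hypothetical component: it only tracks its own state.
class DemoChannel implements LifecycleAware {
    private volatile LifecycleState state = LifecycleState.IDLE;

    @Override public void start() { state = LifecycleState.START; }
    @Override public void stop() { state = LifecycleState.STOP; }
    @Override public LifecycleState getLifecycleState() { return state; }
}

class LifecycleDemo {
    // The caller only sees LifecycleAware; each component runs its own start().
    static void startAll(List<LifecycleAware> components) {
        for (LifecycleAware component : components) {
            component.start();
        }
    }

    // Likewise for stop().
    static void stopAll(List<LifecycleAware> components) {
        for (LifecycleAware component : components) {
            component.stop();
        }
    }

    public static void main(String[] args) {
        DemoChannel channel = new DemoChannel();
        List<LifecycleAware> components = Arrays.<LifecycleAware>asList(channel);
        startAll(components);
        System.out.println(channel.getLifecycleState());
        stopAll(components);
        System.out.println(channel.getLifecycleState());
    }
}
```

The supervisor never needs to know which concrete type it is holding, which is why the same unsupervise call works for sources, sinks, and channels alike.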

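Through this interface the supervisor works polymorphically: each component runs its own start and stop implementation. That concludes the shutdown path. Next is the other method, startAllComponents: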

private void startAllComponents(MaterializedConfiguration materializedConfiguration) {
  logger.info("Starting new configuration:{}", materializedConfiguration);

  // Keep the configuration that was just read as the current one
  this.materializedConfiguration = materializedConfiguration;

  // Start the channels first, wait until they are up, then start the sinks,
  // and finally the sources.
  // Read the channels from materializedConfiguration.
  for (Entry<String, Channel> entry :
      materializedConfiguration.getChannels().entrySet()) {
    try {
      logger.info("Starting Channel " + entry.getKey());
      supervisor.supervise(entry.getValue(),
          new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
    } catch (Exception e) {
      logger.error("Error while starting {}", entry.getValue(), e);
    }
  }

  /*
   * Wait for all channels to start.
   */
  for (Channel ch : materializedConfiguration.getChannels().values()) {
    while (ch.getLifecycleState() != LifecycleState.START
        && !supervisor.isComponentInErrorState(ch)) {
      try {
        logger.info("Waiting for channel: " + ch.getName()
            + " to start. Sleeping for 500 ms");
        Thread.sleep(500);
      } catch (InterruptedException e) {
        logger.error("Interrupted while waiting for channel to start.", e);
        Throwables.propagate(e);
      }
    }
  }

  for (Entry<String, SinkRunner> entry :
      materializedConfiguration.getSinkRunners().entrySet()) {
    try {
      logger.info("Starting Sink " + entry.getKey());
      supervisor.supervise(entry.getValue(),
          new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
    } catch (Exception e) {
      logger.error("Error while starting {}", entry.getValue(), e);
    }
  }

  for (Entry<String, SourceRunner> entry :
      materializedConfiguration.getSourceRunners().entrySet()) {
    try {
      logger.info("Starting Source " + entry.getKey());
      supervisor.supervise(entry.getValue(),
          new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
    } catch (Exception e) {
      logger.error("Error while starting {}", entry.getValue(), e);
    }
  }

  this.loadMonitoring();
}
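The stop and start sequences in Application can be summarized in a small sketch that only records the order of calls. Everything here is hypothetical (the class name RestartOrderSketch and the component names r1, k1, c1); it does not use any Flume API and only illustrates the ordering: stop source, sink, channel, then start channel, sink, source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: records the order in which components would be stopped and started.
class RestartOrderSketch {
    final List<String> sources = Arrays.asList("r1");
    final List<String> sinks = Arrays.asList("k1");
    final List<String> channels = Arrays.asList("c1");
    final List<String> calls = new ArrayList<>();

    private void each(String action, List<String> names) {
        for (String name : names) {
            calls.add(action + " " + name);
        }
    }

    // Producers go first, so nothing new enters a channel while it is shutting down.
    void stopAllComponents() {
        each("stop", sources);
        each("stop", sinks);
        each("stop", channels);
    }

    // Channels go first, so sinks and sources find them ready.
    void startAllComponents() {
        each("start", channels);
        each("start", sinks);
        each("start", sources);
    }

    // A configuration change is handled as a full stop followed by a full start.
    void handleConfigurationEvent() {
        stopAllComponents();
        startAllComponents();
    }

    public static void main(String[] args) {
        RestartOrderSketch sketch = new RestartOrderSketch();
        sketch.handleConfigurationEvent();
        for (String call : sketch.calls) {
            System.out.println(call);
        }
    }
}
```

A plausible reading of this ordering is that the channel, which sits between source and sink, has to outlive both on shutdown and precede both on startup.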

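So, unlike shutdown, startup begins with the channels, waits until they have all started, then starts the sinks, and finally the sources. All three kinds of component are started through supervisor.supervise: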

// supervise registers a component for monitoring
public synchronized void supervise(LifecycleAware lifecycleAware,
    SupervisorPolicy policy, LifecycleState desiredState) {
  if (this.monitorService.isShutdown()
      || this.monitorService.isTerminated()
      || this.monitorService.isTerminating()) {
    throw new FlumeException("Supervise called on " + lifecycleAware + " "
        + "after shutdown has been initiated. " + lifecycleAware + " will not"
        + " be started");
  }

  // A component may be supervised only once; checkState throws if it is
  // already present in the map of supervised processes
  Preconditions.checkState(!supervisedProcesses.containsKey(lifecycleAware),
      "Refusing to supervise " + lifecycleAware + " more than once");

  if (logger.isDebugEnabled()) {
    logger.debug("Supervising service:{} policy:{} desiredState:{}",
        new Object[] { lifecycleAware, policy, desiredState });
  }

  // Record the status information
  Supervisoree process = new Supervisoree();
  process.status = new Status();

  process.policy = policy;
  process.status.desiredState = desiredState;
  process.status.error = false;

  // MonitorRunnable is a Runnable task that periodically checks the state of
  // the component and corrects it when it is wrong, for example by starting
  // a component that should be in START but has died
  MonitorRunnable monitorRunnable = new MonitorRunnable();
  monitorRunnable.lifecycleAware = lifecycleAware;  // the monitored object
  monitorRunnable.supervisoree = process;           // its monitoring status
  monitorRunnable.monitorService = monitorService;  // the monitoring thread pool

  // Add it to the map of currently supervised processes
  supervisedProcesses.put(lifecycleAware, process);

  // Schedule the MonitorRunnable, which holds the monitored object and its
  // status, to run with a 3-second delay between checks
  ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
      monitorRunnable, 0, 3, TimeUnit.SECONDS);
  // Remember which scheduled task belongs to which LifecycleAware component
  monitorFutures.put(lifecycleAware, future);
}
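The scheduling call at the end of supervise can be tried in isolation. The sketch below is only an illustration using the JDK's ScheduledThreadPoolExecutor: FixedDelayDemo and runChecks are hypothetical names, and the delay is shortened to milliseconds so the demo finishes quickly (Flume uses 3 seconds).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FixedDelayDemo {
    // Schedules a check with no initial delay and a fixed delay between runs,
    // waits until at least `runs` checks have completed, then cancels it.
    // Returns the number of checks observed, or -1 on timeout or interruption.
    static int runChecks(int runs, long delayMillis) {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        AtomicInteger count = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(runs);
        try {
            ScheduledFuture<?> future = pool.scheduleWithFixedDelay(() -> {
                count.incrementAndGet();   // stands in for one state check
                done.countDown();
            }, 0, delayMillis, TimeUnit.MILLISECONDS);

            boolean finished = done.await(10, TimeUnit.SECONDS);
            future.cancel(false);          // same call unsupervise uses
            return finished ? count.get() : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("checks observed: " + runChecks(3, 50));
    }
}
```

With scheduleWithFixedDelay the delay is measured from the end of one run to the start of the next, so a slow check pushes the following ones back instead of piling up.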

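Note the line ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(monitorRunnable, 0, 3, TimeUnit.SECONDS); in supervise. It schedules monitorRunnable on the monitoring thread pool with no initial delay and a fixed 3-second delay between the end of one run and the start of the next. MonitorRunnable is a Runnable task executed by that pool, not a thread of its own. Its run method is: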

@Override
public void run() {
  logger.debug("checking process:{} supervisoree:{}", lifecycleAware, supervisoree);

  long now = System.currentTimeMillis();

  try {
    if (supervisoree.status.firstSeen == null) {
      logger.debug("first time seeing {}", lifecycleAware);
      // On the first run, set firstSeen to the current time
      supervisoree.status.firstSeen = now;
    }

    supervisoree.status.lastSeen = now;
    synchronized (lifecycleAware) {
      // Do nothing for components that are discarded or in the error state
      if (supervisoree.status.discard) {
        // Unsupervise has already been called on this.
        logger.info("Component has already been stopped {}", lifecycleAware);
        return;
      } else if (supervisoree.status.error) {
        logger.info("Component {} is in error state, and Flume will not"
            + "attempt to change its state", lifecycleAware);
        return;
      }

      supervisoree.status.lastSeenState = lifecycleAware.getLifecycleState();

      // If the actual state differs from the desired state, correct it; for
      // example, the desired state is START but the component is stopped, so
      // start it. The desired state is either START or STOP.
      // Otherwise do nothing.
      if (!lifecycleAware.getLifecycleState().equals(
          supervisoree.status.desiredState)) {

        logger.debug("Want to transition {} from {} to {} (failures:{})",
            new Object[] { lifecycleAware, supervisoree.status.lastSeenState,
                supervisoree.status.desiredState,
                supervisoree.status.failures });

        switch (supervisoree.status.desiredState) {
          // Should be in START but is not: call the component's start method
          case START:
            try {
              lifecycleAware.start();
            } catch (Throwable e) {
              logger.error("Unable to start " + lifecycleAware
                  + " - Exception follows.", e);
              if (e instanceof Error) {
                // This component can never recover, shut it
                // down.
                supervisoree.status.desiredState = LifecycleState.STOP;
                try {
                  lifecycleAware.stop();
                  logger.warn("Component {} stopped, since it could not be"
                      + "successfully started due to missing dependencies",
                      lifecycleAware);
                } catch (Throwable e1) {
                  logger.error("Unsuccessful attempt to "
                      + "shutdown component: {} due to missing dependencies."
                      + " Please shutdown the agent"
                      + "or disable this component, or the agent will be"
                      + "in an undefined state.", e1);
                  supervisoree.status.error = true;
                  if (e1 instanceof Error) {
                    throw (Error) e1;
                  }
                  // Set the state to stop, so that the
                  // conf poller can
                  // proceed.
                }
              }
              supervisoree.status.failures++;
            }
            break;
          case STOP:
            // Should be in STOP but is not: call the component's stop method
            try {
              lifecycleAware.stop();
            } catch (Throwable e) {
              logger.error("Unable to stop " + lifecycleAware
                  + " - Exception follows.", e);
              if (e instanceof Error) {
                throw (Error) e;
              }
              supervisoree.status.failures++;
            }
            break;
          default:
            logger.warn("I refuse to acknowledge {} as a desired state",
                supervisoree.status.desiredState);
        }

        if (!supervisoree.policy.isValid(lifecycleAware, supervisoree.status)) {
          logger.error(
              "Policy {} of {} has been violated - supervisor should exit!",
              supervisoree.policy, lifecycleAware);
        }
      }
    }
  } catch (Throwable t) {
    logger.error("Unexpected error", t);
  }
  logger.debug("Status check complete");
}
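The core of run, comparing the actual state with the desired state and retrying on the next pass when a start fails, can be reduced to the following sketch. All names (MonitorPassSketch, FlakyComponent, checkOnce) are hypothetical, and the handling of Error, the policy check, and locking are deliberately left out; it is not Flume's implementation.

```java
// Hypothetical, heavily simplified model of one monitoring pass.
class MonitorPassSketch {
    enum State { IDLE, START, STOP }

    // A component whose first few start attempts fail.
    static class FlakyComponent {
        State state = State.IDLE;
        private int failuresLeft;

        FlakyComponent(int failuresLeft) {
            this.failuresLeft = failuresLeft;
        }

        void start() {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IllegalStateException("not ready yet");
            }
            state = State.START;
        }

        void stop() {
            state = State.STOP;
        }
    }

    // Mirrors the fields of the status object that this sketch needs.
    static class Status {
        State desiredState = State.START;
        int failures;
        boolean discard;
    }

    // One pass: do nothing if discarded or already in the desired state,
    // otherwise try to move the component there and count failures.
    static void checkOnce(FlakyComponent component, Status status) {
        if (status.discard || component.state == status.desiredState) {
            return;
        }
        try {
            switch (status.desiredState) {
                case START:
                    component.start();
                    break;
                case STOP:
                    component.stop();
                    break;
                default:
                    break;
            }
        } catch (RuntimeException e) {
            status.failures++;   // the next scheduled pass will try again
        }
    }

    public static void main(String[] args) {
        FlakyComponent component = new FlakyComponent(2);
        Status status = new Status();
        for (int pass = 0; pass < 5; pass++) {
            checkOnce(component, status);
        }
        System.out.println(component.state + " after " + status.failures + " failures");
    }
}
```

Because the check is idempotent, running it on a fixed schedule is enough to bring a failed component back without any extra retry logic.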

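As the code shows, this task watches the lifecycle state of each component and, when the actual state differs from the desired one, corrects it by calling start or stop, which in effect restarts a component that has died. A second task periodically purges cancelled monitoring tasks that are no longer needed from the executor's queue: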

private class Purger implements Runnable {

  @Override
  public void run() {
    if (needToPurge) {
      // Remove cancelled java.util.concurrent.Future objects from the work
      // queue to free the space they occupy.
      // After ScheduledFuture.cancel has been called, the executor's purge()
      // removes the cancelled tasks from the queue.
      monitorService.purge();
      needToPurge = false;
    }
  }
}
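The effect of purge can be observed directly on a JDK ScheduledThreadPoolExecutor. In this sketch (PurgeDemo and cancelThenPurge are hypothetical names) a task is scheduled far in the future, cancelled, and the queue size is read before and after purge. It assumes the executor's default settings, under which a cancelled task is not removed from the queue at cancellation time.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PurgeDemo {
    // Returns { queue size after cancel, queue size after purge }.
    static int[] cancelThenPurge() {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        try {
            // A task that would only run in an hour, so it stays queued.
            ScheduledFuture<?> future = pool.schedule(() -> { }, 1, TimeUnit.HOURS);

            // Cancelling marks the task as cancelled but leaves it in the queue.
            future.cancel(false);
            int afterCancel = pool.getQueue().size();

            // purge() walks the queue and drops cancelled tasks.
            pool.purge();
            int afterPurge = pool.getQueue().size();

            return new int[] { afterCancel, afterPurge };
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int[] sizes = cancelThenPurge();
        System.out.println("after cancel: " + sizes[0] + ", after purge: " + sizes[1]);
    }
}
```

Since purge has to scan the whole queue, deferring it behind a needToPurge flag, as the Purger above does, avoids paying that cost on every unsupervise call.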

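The startup, shutdown, and state-monitoring logic analyzed above lives in the LifecycleSupervisor class of the org.apache.flume.lifecycle package. This completes the analysis of how Flume components are stopped, started, and monitored.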





