第25课:Spark Streaming的StreamingContext启动及JobScheduler启动源码图解

来源:互联网 发布:json对象是什么意思 编辑:程序博客网 时间:2024/06/05 09:37

本期内容:

1. StreamingContext启动源码图解

2. JobScheduler启动源码图解

 

StreamingContext的start方法对INITIALIZED、ACTIVE、STOPPED等状态分别做不同处理。

 

/** * Start the execution of the streams. * * @throws IllegalStateException if the StreamingContext is already stopped. */def start(): Unit = synchronized {  state match {    case INITIALIZED =>      startSite.set(DStream.getCreationSite())      StreamingContext.ACTIVATION_LOCK.synchronized {        StreamingContext.assertNoOtherContextIsActive()        try {          validate()          // Start the streaming scheduler in a new thread, so that thread local properties          // like call sites and job groups can be reset without affecting those of the          // current thread.          ThreadUtils.runInNewThread("streaming-start") {            sparkContext.setCallSite(startSite.get)            sparkContext.clearJobGroup()            sparkContext.setLocalProperty(SparkContext.SPARK_JOB_INTERRUPT_ON_CANCEL, "false")            scheduler.start()          }          state = StreamingContextState.ACTIVE        } catch {          case NonFatal(e) =>            logError("Error starting the context, marking it as stopped", e)            scheduler.stop(false)            state = StreamingContextState.STOPPED            throw e        }        StreamingContext.setActiveContext(this)      }      shutdownHookRef = ShutdownHookManager.addShutdownHook(        StreamingContext.SHUTDOWN_HOOK_PRIORITY)(stopOnShutdown)      // Registering Streaming Metrics at the start of the StreamingContext      assert(env.metricsSystem != null)      env.metricsSystem.registerSource(streamingSource)      uiTab.foreach(_.attach())      logInfo("StreamingContext started")    case ACTIVE =>      logWarning("StreamingContext has already been started")    case STOPPED =>      throw new IllegalStateException("StreamingContext has already been stopped")  }}

 

 

 

 

JobScheduler中的eventLoop是带有一个消息队列的线程循环器。接收的消息有JobStarted、JobCompleted、ErrorReported等三种。eventLoop用start方法启动。

JobScheduler还把所有InputDStream的rateController放入监听器中。然后启动了listenerBus。

然后启动了receiverTracker、jobGenerator。这两个在JobScheduler中至关重要。

ReceiverTracker中有个适用于RPC的ReceiverTrackerEndpoint类型的消息循环体,用receive方法接收StartAllReceivers、RestartReceiver、CleanupOldBlocks、UpdateReceiverRteLimit等4种消息,用receiveAndREply方法接收RegisterReceiver、AddBlock、DeregisterReceiver、AllReceiverIds、StopAllReceivers等五种消息。消息有自己给自己发的消息,也有远程发来的消息。

shutdownHookManager用于关闭时。env.metricsSystem用于计量,注册的计量信息在StreamingSource中有定义。uiTab用于web界面。针对UI界面,定义有StreamingPage、BatchPage等WebUIPage的子类 ,做定制时也可定义自己的Page。

源码需要在细节上下工夫。深度决定广度。

JobGenerator中也有一个消息循环体。

 

所以,JobScheduler有3个消息循环体。如下图所示

 

0 0
原创粉丝点击