spark 1.6.0 core source code analysis 4: the Worker startup flow
Source: Internet  Editor: 程序博客网  Time: 2024/05/16 07:04
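The Worker's main method is similar to the Master's. It creates a SparkConf, parses the command-line arguments, creates the RpcEnv used for communication with other components and with itself, and constructs the Worker object, which is registered in that RpcEnv as an endpoint.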
```scala
private[deploy] object Worker extends Logging {
  val SYSTEM_NAME = "sparkWorker"
  val ENDPOINT_NAME = "Worker"

  def main(argStrings: Array[String]) {
    SignalLogger.register(log)
    val conf = new SparkConf
    val args = new WorkerArguments(argStrings, conf)
    val rpcEnv = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, args.cores,
      args.memory, args.masters, args.workDir, conf = conf)
    rpcEnv.awaitTermination()
  }

  def startRpcEnvAndEndpoint(
      host: String,
      port: Int,
      webUiPort: Int,
      cores: Int,
      memory: Int,
      masterUrls: Array[String],
      workDir: String,
      workerNumber: Option[Int] = None,
      conf: SparkConf = new SparkConf): RpcEnv = {

    // The LocalSparkCluster runs multiple local sparkWorkerX RPC Environments
    val systemName = SYSTEM_NAME + workerNumber.map(_.toString).getOrElse("")
    val securityMgr = new SecurityManager(conf)
    val rpcEnv = RpcEnv.create(systemName, host, port, conf, securityMgr)
    val masterAddresses = masterUrls.map(RpcAddress.fromSparkURL(_))
    rpcEnv.setupEndpoint(ENDPOINT_NAME, new Worker(rpcEnv, webUiPort, cores, memory,
      masterAddresses, systemName, ENDPOINT_NAME, workDir, conf, securityMgr))
    rpcEnv
  }

  // ... (rest of the object omitted)
}
```
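Two small pieces of logic in startRpcEnvAndEndpoint are easy to miss: the optional worker number appended to the RPC system name, and the conversion of each master URL into an address. The minimal sketch below models both. WorkerNaming, systemName and parseSparkUrl are hypothetical names, and parseSparkUrl is a simplified stand-in for RpcAddress.fromSparkURL that assumes every URL has the form spark://host:port; it is not Spark's implementation.

```scala
import java.net.URI

// Hypothetical helpers: a simplified model of the Worker's naming and
// master-URL parsing, not Spark's actual code.
object WorkerNaming {
  val SystemNamePrefix = "sparkWorker"

  // LocalSparkCluster runs several Workers in one JVM, so each gets a numbered system name
  def systemName(workerNumber: Option[Int]): String =
    SystemNamePrefix + workerNumber.map(_.toString).getOrElse("")

  // Simplified stand-in for RpcAddress.fromSparkURL: accepts only "spark://host:port"
  def parseSparkUrl(url: String): (String, Int) = {
    val uri = new URI(url)
    require(uri.getScheme == "spark" && uri.getHost != null && uri.getPort != -1,
      s"Invalid master URL: $url")
    (uri.getHost, uri.getPort)
  }

  def main(args: Array[String]): Unit = {
    println(systemName(None))                      // sparkWorker
    println(systemName(Some(2)))                   // sparkWorker2
    println(parseSparkUrl("spark://master1:7077")) // (master1,7077)
  }
}
```

A standalone Worker started from the command line has no worker number, so its system name is just sparkWorker; only the in-process LocalSparkCluster needs the numbered variants.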
As with the Master, the endpoint's onStart method then runs, and this is where the Worker registers with the Master.
```scala
override def onStart() {
  assert(!registered)
  logInfo("Starting Spark worker %s:%d with %d cores, %s RAM".format(
    host, port, cores, Utils.megabytesToString(memory)))
  logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}")
  logInfo("Spark home: " + sparkHome)
  createWorkDir() // create the work directory
  // Optionally (configurable) start an external shuffle service, so that shuffle files
  // written by executors remain available after those executors exit
  shuffleService.startIfEnabled()
  webUi = new WorkerWebUI(this, workDir, webUiPort)
  webUi.bind()
  registerWithMaster() // register with the Master
  metricsSystem.registerSource(workerSource)
  metricsSystem.start()
  // Attach the worker metrics servlet handler to the web ui after the metrics system is started.
  metricsSystem.getServletHandlers.foreach(webUi.attachHandler)
}
```
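createWorkDir() is only called by name above, so here is a sketch of what such a step has to guarantee: the directory exists, is a directory, and is writable before anything else uses it. The fallback to <sparkHome>/work when no path is configured is an assumption of this sketch, and WorkDirSketch with its methods is hypothetical.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical sketch of a createWorkDir-style step, not Spark's code.
object WorkDirSketch {
  // Assumption: use the configured path if present, otherwise <sparkHome>/work
  def resolveWorkDir(sparkHome: Path, configured: Option[String]): Path =
    configured.map(p => Paths.get(p)).getOrElse(sparkHome.resolve("work"))

  // Create the directory (and any parents) if needed; fail fast if it is unusable
  def createWorkDir(dir: Path): Path = {
    Files.createDirectories(dir)
    require(Files.isDirectory(dir) && Files.isWritable(dir), s"Cannot use work dir $dir")
    dir
  }

  def main(args: Array[String]): Unit = {
    val home = Files.createTempDirectory("spark-home")
    val dir = createWorkDir(resolveWorkDir(home, None))
    println(dir.getFileName) // work
  }
}
```

Failing early matters because the work directory later holds per-application subdirectories, and it is also what the periodic cleanup shown further down scans.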
```scala
private def registerWithMaster() {
  // onDisconnected may be triggered multiple times, so don't attempt registration
  // if there are outstanding registration attempts scheduled.
  registrationRetryTimer match {
    case None =>
      registered = false
      // Send a RegisterWorker message to every Master's RpcEnv. As covered in the earlier
      // parts, a Master that handles it successfully replies with RegisteredWorker;
      // otherwise it replies with RegisterWorkerFailed.
      registerMasterFutures = tryRegisterAllMasters()
      connectionAttemptCount = 0
      // After a fixed interval this triggers ReregisterWithMaster, which checks whether the
      // Worker is registered yet and, if not, sends the registration again. The registered
      // state is set from the Master's reply.
      registrationRetryTimer = Some(forwordMessageScheduler.scheduleAtFixedRate(
        new Runnable {
          override def run(): Unit = Utils.tryLogNonFatalError {
            Option(self).foreach(_.send(ReregisterWithMaster))
          }
        },
        INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
        INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
        TimeUnit.SECONDS))
    case Some(_) =>
      logInfo("Not spawning another attempt to register with the master, since there is an" +
        " attempt scheduled already.")
  }
}
```
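The retry logic in registerWithMaster() comes down to two rules: never schedule a second retry timer while one already exists, and on each tick re-send the registration only while the Worker is still unregistered. The sketch below models this with a plain ScheduledExecutorService. RegistrationRetry, the acceptOnAttempt parameter (a simulated Master that accepts only the N-th message) and cancelling the timer once registered are assumptions of the sketch, not code from the article.

```scala
import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Hypothetical model of registerWithMaster()'s retry timer, not Spark's code.
class RegistrationRetry(acceptOnAttempt: Int) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  private var retryTimer: Option[ScheduledFuture[_]] = None
  val registered = new AtomicBoolean(false)
  val attempts = new AtomicInteger(0)

  // Stands in for sending RegisterWorker; the simulated Master's reply sets `registered`
  private def sendRegistration(): Unit =
    if (attempts.incrementAndGet() >= acceptOnAttempt) registered.set(true)

  private def cancelRetry(): Unit = synchronized {
    retryTimer.foreach(_.cancel(false))
    retryTimer = None
  }

  def registerWithMaster(intervalMs: Long): Unit = synchronized {
    retryTimer match {
      case None =>
        registered.set(false)
        sendRegistration()
        retryTimer = Some(scheduler.scheduleAtFixedRate(new Runnable {
          // Plays the role of ReregisterWithMaster: stop once registered, otherwise retry
          override def run(): Unit =
            if (registered.get()) cancelRetry() else sendRegistration()
        }, intervalMs, intervalMs, TimeUnit.MILLISECONDS))
      case Some(_) =>
        println("Not spawning another attempt, since one is already scheduled.")
    }
  }

  def shutdown(): Unit = scheduler.shutdownNow()
}

object RetryDemo {
  def main(args: Array[String]): Unit = {
    val r = new RegistrationRetry(acceptOnAttempt = 3)
    r.registerWithMaster(intervalMs = 10)
    r.registerWithMaster(intervalMs = 10) // ignored: a timer is already scheduled
    while (!r.registered.get()) Thread.sleep(5)
    r.shutdown()
    println(s"registered after ${r.attempts.get()} attempts") // registered after 3 attempts
  }
}
```

The Option-typed timer field is what makes repeated calls (for example from several onDisconnected events) harmless: only the first call while no timer exists does any work.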
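Now consider how the Worker handles the Master's RegisteredWorker message. When registering, the Worker does not know which Master is active and which are standbys, so it sends the registration to every configured Master. Although the active and standby Masters all receive it, only the active Master answers with RegisteredWorker, carrying its own endpoint reference (masterRef) and web UI URL. The Worker uses this reply to identify the active Master's endpoint for all further communication.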
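Here is a minimal model of "register with every Master, trust whoever answers RegisteredWorker". The Master class and the single reply type are hypothetical simplifications. In particular, this sketch has a standby Master return no RegisteredWorker at all, which is the only property the paragraph above relies on.

```scala
// Hypothetical model, not Spark's DeployMessages or Master implementation.
object ActiveMasterDemo {
  final case class RegisteredWorker(masterUrl: String)

  final case class Master(url: String, active: Boolean) {
    // Assumption of this model: a standby Master sends no RegisteredWorker reply
    def register(workerId: String): Option[RegisteredWorker] =
      if (active) Some(RegisteredWorker(url)) else None
  }

  // The Worker cannot tell active from standby up front, so it contacts all of
  // them and adopts whichever Master confirms the registration.
  def registerWithAllMasters(masters: Seq[Master], workerId: String): Option[String] =
    masters.flatMap(_.register(workerId)).headOption.map(_.masterUrl)

  def main(args: Array[String]): Unit = {
    val masters = Seq(
      Master("spark://m1:7077", active = false),
      Master("spark://m2:7077", active = true))
    println(registerWithAllMasters(masters, "worker-1")) // Some(spark://m2:7077)
  }
}
```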
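The Worker keeps its connection to the Master alive with heartbeats. After a successful registration it therefore tracks whether the connection is healthy with a connected flag, which is set to connected = true inside the changeMaster method.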
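The connection bookkeeping can be sketched as a small state holder. Only the fact that changeMaster sets connected = true comes from the article. WorkerConnectionState, markDisconnected and the rule that heartbeats are skipped while disconnected are hypothetical and were added for illustration.

```scala
// Hypothetical sketch of the Worker's connection state, not Spark's code.
class WorkerConnectionState {
  @volatile private var activeMasterUrl: Option[String] = None
  @volatile var connected: Boolean = false
  private var heartbeatsSent = 0

  // Mirrors changeMaster(): remember the active Master and mark the link as healthy
  def changeMaster(masterUrl: String): Unit = {
    activeMasterUrl = Some(masterUrl)
    connected = true
  }

  def markDisconnected(): Unit = {
    connected = false
  }

  // Assumption: a heartbeat is only sent while connected; returns its target, if any
  def sendHeartbeat(): Option[String] =
    if (connected) { heartbeatsSent += 1; activeMasterUrl } else None

  def sent: Int = heartbeatsSent
}
```

Keeping the flag separate from the stored Master reference lets the Worker remember which Master it belongs to even while a reconnect is in progress.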
```scala
private def tryRegisterAllMasters(): Array[JFuture[_]] = {
  masterRpcAddresses.map { masterAddress =>
    registerMasterThreadPool.submit(new Runnable {
      override def run(): Unit = {
        try {
          logInfo("Connecting to master " + masterAddress + "...")
          val masterEndpoint =
            rpcEnv.setupEndpointRef(Master.SYSTEM_NAME, masterAddress, Master.ENDPOINT_NAME)
          registerWithMaster(masterEndpoint)
        } catch {
          case ie: InterruptedException => // Cancelled
          case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
        }
      }
    })
  }
}

private def registerWithMaster(masterEndpoint: RpcEndpointRef): Unit = {
  masterEndpoint.ask[RegisterWorkerResponse](RegisterWorker(
    workerId, host, port, self, cores, memory, webUi.boundPort, publicAddress))
    .onComplete {
      // This is a very fast action so we can use "ThreadUtils.sameThread"
      case Success(msg) =>
        Utils.tryLogNonFatalError {
          handleRegisterResponse(msg)
        }
      case Failure(e) =>
        logError(s"Cannot register with master: ${masterEndpoint.address}", e)
        System.exit(1)
    }(ThreadUtils.sameThread)
}
```
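tryRegisterAllMasters() is a fan-out. One task per Master address is submitted to a thread pool, and the returned futures are kept so that pending attempts can be cancelled later. The sketch below reproduces that shape with java.util.concurrent. FanOutDemo and the connect callback (a stand-in for setupEndpointRef plus registerWithMaster) are hypothetical.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, ExecutorService, Executors, Future}
import scala.util.control.NonFatal

// Hypothetical sketch of the tryRegisterAllMasters() fan-out, not Spark's code.
object FanOutDemo {
  def tryRegisterAll(pool: ExecutorService, addresses: Seq[String],
                     connect: String => Unit): Seq[Future[_]] =
    addresses.map { address =>
      pool.submit(new Runnable {
        override def run(): Unit =
          try connect(address)
          catch {
            case _: InterruptedException => // cancelled, e.g. after another Master accepted
            case NonFatal(e) =>
              println(s"Failed to connect to master $address: ${e.getMessage}")
          }
      })
    }

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(2)
    val contacted = new ConcurrentLinkedQueue[String]()
    val futures = tryRegisterAll(pool, Seq("m1:7077", "m2:7077"), a => contacted.add(a))
    futures.foreach(_.get()) // wait for both attempts to finish
    pool.shutdown()
    println(contacted.size)  // 2
  }
}
```

Because each task catches its own failures, one unreachable Master only produces a warning and does not stop the attempts against the others.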
```scala
// Inside handleRegisterResponse(msg), which processes the reply to RegisterWorker
case RegisteredWorker(masterRef, masterWebUiUrl) =>
  logInfo("Successfully registered with master " + masterRef.address.toSparkURL)
  registered = true // registration succeeded
  changeMaster(masterRef, masterWebUiUrl) // remember the active Master's endpoint and web UI URL
  // Only after a successful registration is the heartbeat timer started
  forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = Utils.tryLogNonFatalError {
      // Send a heartbeat every (heartbeat timeout) / 4 milliseconds
      self.send(SendHeartbeat)
    }
  }, 0, HEARTBEAT_MILLIS, TimeUnit.MILLISECONDS)
  if (CLEANUP_ENABLED) {
    logInfo(
      s"Worker cleanup enabled; old application directories will be deleted in: $workDir")
    // Periodically clean up directories under workDir that have not been updated for a long
    // time and whose applications are no longer running
    forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = Utils.tryLogNonFatalError {
        self.send(WorkDirCleanup)
      }
    }, CLEANUP_INTERVAL_MILLIS, CLEANUP_INTERVAL_MILLIS, TimeUnit.MILLISECONDS)
  }
```
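The heartbeat period is (heartbeat timeout) / 4. Assuming the timeout comes from spark.worker.timeout with its 60-second default, that is one heartbeat every 15 s. The sketch computes the interval and runs a fixed-rate timer with zero initial delay, like the one started after RegisteredWorker. HeartbeatDemo is hypothetical, and the demo period is shortened to 10 ms so that it finishes quickly.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

// Hypothetical sketch of the heartbeat timer, not Spark's code.
object HeartbeatDemo {
  // Interval = timeout (seconds) * 1000 / 4, in milliseconds
  def heartbeatMillis(workerTimeoutSeconds: Long): Long = workerTimeoutSeconds * 1000 / 4

  // Schedules heartbeats with no initial delay and reports whether `count`
  // of them arrive within one second
  def heartbeatsArrive(count: Int, periodMs: Long): Boolean = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val beats = new CountDownLatch(count)
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = beats.countDown() // stands in for self.send(SendHeartbeat)
    }, 0, periodMs, TimeUnit.MILLISECONDS)
    try beats.await(1, TimeUnit.SECONDS)
    finally scheduler.shutdownNow()
  }

  def main(args: Array[String]): Unit = {
    println(heartbeatMillis(60))     // 15000
    println(heartbeatsArrive(3, 10)) // true
  }
}
```

Sending four heartbeats per timeout window means a single delayed or lost heartbeat is not enough for the Master to consider the Worker dead.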
If the Worker receives a RegisterWorkerFailed message instead, it exits.
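Here is a sketch of the dispatch on the registration reply. To keep it testable, it returns an exit code instead of calling System.exit(1), and the message classes are simplified stand-ins for Spark's. Exiting only while the Worker is not yet registered is an assumption of this sketch: a failure reported by one Master after another has already accepted the Worker should not bring it down.

```scala
// Hypothetical sketch of a handleRegisterResponse-style dispatch, not Spark's code.
object RegisterResponseDemo {
  sealed trait RegisterWorkerResponse
  final case class RegisteredWorker(masterUrl: String) extends RegisterWorkerResponse
  final case class RegisterWorkerFailed(message: String) extends RegisterWorkerResponse

  final class WorkerState {
    var registered = false
    var master: Option[String] = None
  }

  // Returns Some(exitCode) when the Worker should terminate, None otherwise
  def handleRegisterResponse(state: WorkerState, msg: RegisterWorkerResponse): Option[Int] =
    msg match {
      case RegisteredWorker(url) =>
        state.registered = true
        state.master = Some(url)
        None
      case RegisterWorkerFailed(message) =>
        if (!state.registered) {
          println(s"Worker registration failed: $message")
          Some(1)
        } else None
    }
}
```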
Next, we look at how the Master handles the heartbeats it receives from the Worker.