Receives connections. It has several important methods: getRemoteConnectionManagerId(), processConnectionManagerId(header: MessageChunkHeader), and read().



val messages = new HashMap[Int, BufferMessage]





override def toString = connectionManagerId.host + "_" + connectionManagerId.port + "_" + uniqId







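• C2S: the protocol for requesting blocks (client to server): organized according to the directory structure.

• S2C: the protocol for sending blocks (server to client):

frame-length (4 bytes), block-id-length (4 bytes), block-id, block-data.

frame-length does not include its own length. If block-id-length is negative, the frame carries an error message rather than block data, and the real block ID length is the absolute value of block-id-length.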
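The S2C frame layout can be illustrated with a small round-trip sketch. This is not Spark's encoder: `FrameSketch`, `Frame`, `encode`, and `decode` are hypothetical names, and the sketch assumes that an error frame stores the block ID length as a negative block-id-length field, with the remaining bytes holding the error text.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

object FrameSketch {
  // Decoded frame: Right(data) for a block, Left(message) for an error frame.
  final case class Frame(blockId: String, payload: Either[String, Array[Byte]])

  // Layout: frame-length (4 bytes) | block-id-length (4 bytes) | block-id | block-data
  def encode(blockId: String, payload: Either[String, Array[Byte]]): Array[Byte] = {
    val id = blockId.getBytes(UTF_8)
    val (signedIdLen, body) = payload match {
      case Right(data) => (id.length, data)
      // Assumption: a negative block-id-length marks an error frame.
      case Left(error) => (-id.length, error.getBytes(UTF_8))
    }
    // frame-length counts everything after the frame-length field itself.
    val frameLength = 4 + id.length + body.length
    val buf = ByteBuffer.allocate(4 + frameLength)
    buf.putInt(frameLength).putInt(signedIdLen).put(id).put(body)
    buf.array()
  }

  def decode(bytes: Array[Byte]): Frame = {
    val buf = ByteBuffer.wrap(bytes)
    val frameLength = buf.getInt()
    val signedIdLen = buf.getInt()
    val idLen = math.abs(signedIdLen)
    val id = new Array[Byte](idLen)
    buf.get(id)
    // Whatever remains in the frame after the ID is block data or error text.
    val body = new Array[Byte](frameLength - 4 - idLen)
    buf.get(body)
    val blockId = new String(id, UTF_8)
    if (signedIdLen < 0) Frame(blockId, Left(new String(body, UTF_8)))
    else Frame(blockId, Right(body))
  }
}
```

For a two-byte block ID and one byte of data, frame-length is 4 + 2 + 1 = 7, and the whole frame occupies 11 bytes on the wire.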


override def init(blockDataManager: BlockDataManager): Unit = {
  val rpcHandler = new NettyBlockRpcServer(conf.getAppId, serializer, blockDataManager)
  var serverBootstrap: Option[TransportServerBootstrap] = None
  var clientBootstrap: Option[TransportClientBootstrap] = None
  if (authEnabled) {
    serverBootstrap = Some(new SaslServerBootstrap(transportConf, securityManager))
    clientBootstrap = Some(new SaslClientBootstrap(transportConf, conf.getAppId, securityManager,
      securityManager.isSaslEncryptionEnabled()))
  }
  transportContext = new TransportContext(transportConf, rpcHandler)
  clientFactory = transportContext.createClientFactory(clientBootstrap.toSeq.asJava)
  server = createServer(serverBootstrap.toList)
  appId = conf.getAppId
  logInfo(s"Server created on ${hostName}:${server.getPort}")
}


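BlockServerHandler is the handler that processes requests from clients and writes block data back. Messages should already have passed through a LineBasedFrameDecoder and a StringDecoder, so channelRead0 is called once per line (that is, once per block ID).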
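The "one call per line" behaviour can be pictured with a plain-Scala sketch that splits a raw request into block IDs. It only mimics what the decoders deliver to the handler. `RequestLinesSketch` and `blockIds` are hypothetical names, the sample request string is made up, and no Netty classes are involved.

```scala
object RequestLinesSketch {
  // Splits a raw request into block IDs, one per line.
  // Both "\n" and "\r\n" line endings are accepted; empty lines are skipped.
  def blockIds(request: String): Seq[String] =
    request.split("\r?\n").toSeq.map(_.trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // Each element corresponds to one handler invocation in the real pipeline.
    blockIds("block1\nblock2\nblock3\n").foreach(id => println(s"requested: $id"))
  }
}
```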




override def fetchBlocks(
    host: String,
    port: Int,
    execId: String,
    blockIds: Array[String],
    listener: BlockFetchingListener): Unit = {
  logTrace(s"Fetch blocks from $host:$port (executor id $execId)")
  try {
    val blockFetchStarter = new RetryingBlockFetcher.BlockFetchStarter {
      override def createAndStart(blockIds: Array[String], listener: BlockFetchingListener): Unit = {
        val client = clientFactory.createClient(host, port)
        new OneForOneBlockFetcher(client, appId, execId, blockIds.toArray, listener).start()
      }
    }
    val maxRetries = transportConf.maxIORetries()
    if (maxRetries > 0) {
      // Note this Fetcher will correctly handle maxRetries == 0; we avoid it just in case there's
      // a bug in this code. We should remove the if statement once we're sure of the stability.
      new RetryingBlockFetcher(transportConf, blockFetchStarter, blockIds, listener).start()
    } else {
      blockFetchStarter.createAndStart(blockIds, listener)
    }
  } catch {
    case e: Exception =>
      logError("Exception while beginning fetchBlocks", e)
      blockIds.foreach(listener.onBlockFetchFailure(_, e))
  }
}
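The maxRetries branch above can be pictured with a small, synchronous retry helper. This is a simplified stand-in under stated assumptions, not a description of how RetryingBlockFetcher works internally. `RetrySketch` and `withRetries` are hypothetical names, and the helper retries on any exception without waiting between attempts.

```scala
import scala.util.{Failure, Try}

object RetrySketch {
  // Runs `attempt` once, then up to `maxRetries` more times while it keeps failing.
  // With maxRetries == 0 the attempt runs exactly once, mirroring the direct
  // createAndStart call in the else branch.
  @annotation.tailrec
  def withRetries[T](maxRetries: Int)(attempt: () => T): Try[T] =
    Try(attempt()) match {
      case Failure(_) if maxRetries > 0 => withRetries(maxRetries - 1)(attempt)
      case result => result
    }
}
```

A caller would wrap the fetch in the function argument, for example `RetrySketch.withRetries(3)(() => fetchOnce())`, where `fetchOnce` is a placeholder for whatever performs a single attempt.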


| Property Name | Default | Meaning |
| --- | --- | --- |
| spark.driver.host | (local hostname) | Hostname or IP address for the driver to listen on. This is used for communicating with the executors and the standalone Master. |
| spark.driver.port | (random) | Port for the driver to listen on. This is used for communicating with the executors and the standalone Master. |
| spark.fileserver.port | (random) | Port for the driver's HTTP file server to listen on. |
| spark.broadcast.port | (random) | Port for the driver's HTTP broadcast server to listen on. This is not relevant for torrent broadcast. |
| spark.replClassServer.port | (random) | Port for the driver's HTTP class server to listen on. This is only relevant for the Spark shell. |
| spark.blockManager.port | (random) | Port for all block managers to listen on. These exist on both the driver and the executors. |
| spark.executor.port | (random) | Port for the executor to listen on. This is used for communicating with the driver. |
| spark.port.maxRetries | 16 | Default maximum number of retries when binding to a port before giving up. |
| spark.akka.frameSize | 10 | Maximum message size to allow in "control plane" communication (for serialized tasks and task results), in MB. Increase this if your tasks need to send back large results to the driver (e.g. using collect() on a large dataset). |
| spark.akka.threads | 4 | Number of actor threads to use for communication. Can be useful to increase on large clusters when the driver has a lot of CPU cores. |
| spark.akka.timeout | 100 | Communication timeout between Spark nodes, in seconds. |
| spark.akka.heartbeat.pauses | 6000 | Acceptable heartbeat pause for Akka, in seconds. It can be used to control sensitivity to GC pauses. The default is set to a large value to disable Akka's built-in failure detector; it can be re-enabled if you plan to use this feature (not recommended). Tune this in combination with spark.akka.heartbeat.interval and spark.akka.failure-detector.threshold if you need to. |
| spark.akka.failure-detector.threshold | 300.0 | Maps to Akka's akka.remote.transport-failure-detector.threshold. The default is set to a large value to disable Akka's built-in failure detector; it can be re-enabled if you plan to use this feature (not recommended). Tune this in combination with spark.akka.heartbeat.pauses and spark.akka.heartbeat.interval if you need to. |
| spark.akka.heartbeat.interval | 1000 | Heartbeat interval, in seconds. The default is set to a large value to disable Akka's built-in failure detector; it can be re-enabled if you plan to use this feature (not recommended). A larger interval reduces network overhead, while a smaller value (~1 s) might be more informative for Akka's failure detector. Tune this in combination with spark.akka.heartbeat.pauses and spark.akka.failure-detector.threshold if you need to. The only positive use case for the failure detector is that a sensitive detector can help evict rogue executors quickly. This is usually not the case in practice, because GC pauses and network lags are expected in a real Spark cluster. In addition, enabling it causes many heartbeat exchanges between nodes, which can flood the network. |
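Because several of these ports default to random values, one common use of the settings is to pin them, for example when a firewall sits between the driver and the executors. The sketch below renders a few pinned properties as `spark-defaults.conf` lines (property name, whitespace, value). The port numbers are arbitrary illustrations, and `PortConfigSketch` and `render` are hypothetical helper names rather than part of Spark.

```scala
object PortConfigSketch {
  // Hypothetical port choices; any free ports would do.
  val fixedPorts: Seq[(String, String)] = Seq(
    "spark.driver.port" -> "7001",
    "spark.blockManager.port" -> "7005",
    "spark.port.maxRetries" -> "16"
  )

  // Renders one "name value" line per property, in the given order.
  def render(props: Seq[(String, String)]): String =
    props.map { case (name, value) => s"$name $value" }.mkString("\n")

  def main(args: Array[String]): Unit = println(render(fixedPorts))
}
```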
