Spark Source Code Study (8): Network
Source: Internet  Editor: 程序博客网  Time: 2024/05/16 04:44
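Problem this article addresses:
Gain a deeper understanding of Spark's network module by analyzing its source code.
Network management: because Spark runs as a distributed cluster, both the master and the workers depend on network communication. The network package lives in the core source tree under org.apache.spark.network.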
Connection
Connection is an abstraction with two subclasses, ReceivingConnection and SendingConnection, for the receiving and sending sides of a connection.
ReceivingConnection
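The receiving side of a connection. Its most important methods are getRemoteConnectionManagerId(), processConnectionManagerId(header: MessageChunkHeader), and read().
getRemoteConnectionManagerId(): returns the ConnectionManagerId of the remote end of the connection. This method calls the parent class implementation.
There is also an inner class, Inbox, which is a collection that stores messages. It has one attribute:
val messages = new HashMap[Int, BufferMessage]()
The map is keyed by message ID and holds the messages received on this connection while they are still being assembled from incoming chunks.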
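To make the role of the Inbox map more concrete, here is a minimal sketch of chunk reassembly, assuming the map is keyed by message ID. PendingMessage, SimpleInbox, and addChunk are hypothetical names invented for this illustration; they are not Spark's BufferMessage or Inbox API.

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap}

// Hypothetical stand-in for a partially received message (not Spark's BufferMessage).
final class PendingMessage(val id: Int, val totalSize: Int) {
  private val chunks = ArrayBuffer.empty[Array[Byte]]
  private var received = 0

  // Stores one more chunk of this message.
  def append(chunk: Array[Byte]): Unit = {
    chunks += chunk
    received += chunk.length
  }

  def isComplete: Boolean = received >= totalSize

  // Concatenates all chunks in arrival order.
  def payload: Array[Byte] = chunks.foldLeft(Array.empty[Byte])(_ ++ _)
}

// Hypothetical inbox: messages that are still being assembled, keyed by message ID.
final class SimpleInbox {
  val messages = new HashMap[Int, PendingMessage]()

  // Adds one chunk; returns the full payload once the message is complete.
  def addChunk(messageId: Int, totalSize: Int, chunk: Array[Byte]): Option[Array[Byte]] = {
    val msg = messages.getOrElseUpdate(messageId, new PendingMessage(messageId, totalSize))
    msg.append(chunk)
    if (msg.isComplete) {
      messages -= messageId
      Some(msg.payload)
    } else {
      None
    }
  }
}
```

In this sketch a completed message is removed from the map, so the map only ever contains messages that are still in flight.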
SendingConnection
The sending side of a connection; it is the counterpart of ReceivingConnection.
ConnectionId
Represents the ID of a connection. The ID string is built as follows:
override def toString = connectionManagerId.host + "_" + connectionManagerId.port + "_" + uniqId
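The ID format can be illustrated with a small round-trip sketch, assuming the three parts are joined with an underscore. ManagerId, ConnId, and parse are hypothetical simplified names used only for this example.

```scala
// Hypothetical simplified types; the field names follow the toString shown above.
final case class ManagerId(host: String, port: Int)

final case class ConnId(connectionManagerId: ManagerId, uniqId: Int) {
  // Assumed format: host_port_uniqId
  override def toString: String =
    connectionManagerId.host + "_" + connectionManagerId.port + "_" + uniqId
}

object ConnId {
  // Parses "host_port_uniqId"; returns None if the string is malformed.
  // The last two fields are read from the right, so a host containing "_" still parses.
  def parse(s: String): Option[ConnId] = {
    val parts = s.split("_").map(_.trim)
    if (parts.length < 3) {
      None
    } else {
      val host = parts.dropRight(2).mkString("_")
      try {
        Some(ConnId(ManagerId(host, parts(parts.length - 2).toInt), parts.last.toInt))
      } catch {
        case _: NumberFormatException => None
      }
    }
  }
}
```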
ConnectionManager
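ConnectionManager, as the name suggests, manages connections. It defines the inner class MessageStatus, configuration parameters, a series of thread pools, and so on.
MessageStatus: the status of a message, used to track the state of a message sent over a connection.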
Netty
Server
BlockServer
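BlockServer is a server that serves Spark data blocks. Its protocol has two directions:
- C2S: the protocol for requesting blocks (client to server): one block ID per line.
- S2C: the protocol for sending blocks (server to client):
frame-length (4 bytes), block-id-length (4 bytes), block-id, block-data.
frame-length does not include its own length. If block-id-length is negative, the message is an error message rather than block data. The real length is the absolute value of frame-length.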
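The frame layout above can be sketched with java.nio.ByteBuffer. This is only an illustration under stated assumptions: requests carry one block ID per line, the error case is signalled by a negated block-id-length whose absolute value is the ID length, and the bytes after the ID hold an error message. BlockProtocolSketch and all of its members are hypothetical names, not Spark classes.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

object BlockProtocolSketch {
  sealed trait Frame
  final case class BlockData(blockId: String, data: Array[Byte]) extends Frame
  final case class BlockError(blockId: String, message: String) extends Frame

  // Client to server (assumed): one block ID per line.
  def encodeRequest(blockIds: Seq[String]): String = blockIds.map(_ + "\n").mkString

  // Server to client: frame-length, block-id-length, block-id, payload.
  // frame-length counts everything after its own 4 bytes.
  private def encode(idLengthSign: Int, blockId: String, payload: Array[Byte]): ByteBuffer = {
    val id = blockId.getBytes(UTF_8)
    val frameLength = 4 + id.length + payload.length
    val buf = ByteBuffer.allocate(4 + frameLength)
    buf.putInt(frameLength).putInt(idLengthSign * id.length).put(id).put(payload)
    buf.flip()
    buf
  }

  def encodeBlock(blockId: String, data: Array[Byte]): ByteBuffer = encode(1, blockId, data)

  // Assumption: an error frame negates block-id-length and carries a message as payload.
  def encodeError(blockId: String, message: String): ByteBuffer =
    encode(-1, blockId, message.getBytes(UTF_8))

  // Decodes exactly one frame from the buffer's current position.
  def decode(buf: ByteBuffer): Frame = {
    val frameLength = buf.getInt()
    val rawIdLength = buf.getInt()
    val idLength = math.abs(rawIdLength)
    val id = new Array[Byte](idLength)
    buf.get(id)
    val rest = new Array[Byte](frameLength - 4 - idLength)
    buf.get(rest)
    val blockId = new String(id, UTF_8)
    if (rawIdLength < 0) BlockError(blockId, new String(rest, UTF_8))
    else BlockData(blockId, rest)
  }
}
```

For example, BlockProtocolSketch.encodeBlock("b1", Array[Byte](1, 2, 3)) produces a 13-byte buffer whose first four bytes encode the frame length 9.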
Below is the source of the initialization method init:
override def init(blockDataManager: BlockDataManager): Unit = {
  val rpcHandler = new NettyBlockRpcServer(conf.getAppId, serializer, blockDataManager)
  var serverBootstrap: Option[TransportServerBootstrap] = None
  var clientBootstrap: Option[TransportClientBootstrap] = None
  if (authEnabled) {
    serverBootstrap = Some(new SaslServerBootstrap(transportConf, securityManager))
    clientBootstrap = Some(new SaslClientBootstrap(transportConf, conf.getAppId, securityManager,
      securityManager.isSaslEncryptionEnabled()))
  }
  transportContext = new TransportContext(transportConf, rpcHandler)
  clientFactory = transportContext.createClientFactory(clientBootstrap.toSeq.asJava)
  server = createServer(serverBootstrap.toList)
  appId = conf.getAppId
  logInfo(s"Server created on ${hostName}:${server.getPort}")
}
BlockServerHandler
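BlockServerHandler is the handler that processes requests from clients and writes the requested blocks back. Incoming messages should already have passed through a LineBasedFrameDecoder and a StringDecoder, so channelRead0 is called once per line (that is, once per block ID).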
Client
BlockFetchingClient
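BlockFetchingClient fetches data from org.apache.spark.network.netty.NettyBlockTransferService.
One of its more important methods is fetchBlocks. It asks the remote server for a sequence of blocks and executes the callback. It is asynchronous and returns immediately.
The source is as follows: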
override def fetchBlocks(
    host: String,
    port: Int,
    execId: String,
    blockIds: Array[String],
    listener: BlockFetchingListener): Unit = {
  logTrace(s"Fetch blocks from $host:$port (executor id $execId)")
  try {
    val blockFetchStarter = new RetryingBlockFetcher.BlockFetchStarter {
      override def createAndStart(blockIds: Array[String], listener: BlockFetchingListener) {
        val client = clientFactory.createClient(host, port)
        new OneForOneBlockFetcher(client, appId, execId, blockIds.toArray, listener).start()
      }
    }
    val maxRetries = transportConf.maxIORetries()
    if (maxRetries > 0) {
      // Note this Fetcher will correctly handle maxRetries == 0; we avoid it just in case there's
      // a bug in this code. We should remove the if statement once we're sure of the stability.
      new RetryingBlockFetcher(transportConf, blockFetchStarter, blockIds, listener).start()
    } else {
      blockFetchStarter.createAndStart(blockIds, listener)
    }
  } catch {
    case e: Exception =>
      logError("Exception while beginning fetchBlocks", e)
      blockIds.foreach(listener.onBlockFetchFailure(_, e))
  }
}
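The retry-then-callback pattern in fetchBlocks can be illustrated with a simplified, synchronous sketch. Unlike the real method, it blocks until every block has been tried, and it retries per block rather than per batch. FetchListener, RetrySketch, and fetchOnce are hypothetical names for this example; fetchOnce stands in for creating a client and fetching a single block.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical listener, loosely modeled on the callbacks used above.
trait FetchListener {
  def onSuccess(blockId: String, data: Array[Byte]): Unit
  def onFailure(blockId: String, error: Throwable): Unit
}

object RetrySketch {
  // Tries one block, retrying up to `retriesLeft` more times after a failure.
  @annotation.tailrec
  private def attempt(
      id: String,
      retriesLeft: Int,
      fetchOnce: String => Array[Byte]): Try[Array[Byte]] =
    Try(fetchOnce(id)) match {
      case Failure(_) if retriesLeft > 0 => attempt(id, retriesLeft - 1, fetchOnce)
      case result => result
    }

  // Fetches each block and reports the final outcome through the listener.
  // With maxRetries == 0 every block is tried exactly once.
  def fetchBlocks(blockIds: Seq[String], maxRetries: Int, listener: FetchListener)(
      fetchOnce: String => Array[Byte]): Unit =
    blockIds.foreach { id =>
      attempt(id, maxRetries, fetchOnce) match {
        case Success(data) => listener.onSuccess(id, data)
        case Failure(e) => listener.onFailure(id, e)
      }
    }
}
```

As in the original, a failure is reported to the listener per block ID instead of being thrown to the caller.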
Conf
... spark.akka.heartbeat.interval and spark.akka.failure-detector.threshold if you need to.
spark.akka.failure-detector.threshold (default: 300.0)
This is set to a larger value to disable the failure detector built into Akka. It can be enabled again if you plan to use this feature (not recommended). This maps to Akka's akka.remote.transport-failure-detector.threshold. Tune this in combination with spark.akka.heartbeat.pauses and spark.akka.heartbeat.interval if you need to.
spark.akka.heartbeat.interval (default: 1000)
This is set to a larger value to disable the failure detector built into Akka. It can be enabled again if you plan to use this feature (not recommended). A larger interval value in seconds reduces network overhead, and a smaller value (~1 s) might be more informative for Akka's failure detector. Tune this in combination with spark.akka.heartbeat.pauses and spark.akka.failure-detector.threshold if you need to.
The only positive use case for the failure detector is that a sensitive failure detector can help evict rogue executors quickly. However, this is usually not the case, as GC pauses and network lags are expected in a real Spark cluster. Apart from that, enabling it leads to a lot of heartbeat exchanges between nodes, which floods the network.
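As a small illustration of how these properties might be passed on the command line, the sketch below renders the two defaults listed above as spark-submit style --conf arguments. AkkaConfSketch and toSubmitArgs are hypothetical names for this example, and the values are only the defaults quoted in the text, not tuning advice.

```scala
object AkkaConfSketch {
  // Defaults as listed above; adjust only in combination, as the text advises.
  val settings: Seq[(String, String)] = Seq(
    "spark.akka.failure-detector.threshold" -> "300.0",
    "spark.akka.heartbeat.interval" -> "1000"
  )

  // Renders each setting as a "--conf key=value" argument pair.
  def toSubmitArgs(conf: Seq[(String, String)]): Seq[String] =
    conf.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }
}
```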