Spark Learning 32: The Construction Steps of SparkEnv
1. The code
/**
 * Helper method to create a SparkEnv for a driver or an executor.
 *
 * SparkEnv is constructed in the following order (matching the code below):
 *  1. Create the security manager SecurityManager.
 *  2. Create the RPC environment RpcEnv (the successor of the Akka-based
 *     ActorSystem used by older Spark releases).
 *  3. Create the Serializer and the SerializerManager.
 *  4. Create the broadcast manager BroadcastManager.
 *  5. Create the map-task output tracker MapOutputTracker.
 *  6. Instantiate the ShuffleManager.
 *  7. Create the MemoryManager.
 *  8. Create the block transfer service BlockTransferService.
 *  9. Create the BlockManagerMaster.
 * 10. Create the block manager BlockManager.
 * 11. Create the metrics system MetricsSystem.
 * 12. Create the output commit coordinator OutputCommitCoordinator.
 * 13. Create the SparkEnv instance itself.
 */
private def create(
    conf: SparkConf,
    executorId: String,
    bindAddress: String,
    advertiseAddress: String,
    port: Int,
    isLocal: Boolean,
    numUsableCores: Int,
    ioEncryptionKey: Option[Array[Byte]],
    listenerBus: LiveListenerBus = null,
    mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv = {

  // Is this the driver? The driver uses the reserved executor id
  // SparkContext.DRIVER_IDENTIFIER.
  val isDriver = executorId == SparkContext.DRIVER_IDENTIFIER

  // The listener bus is only used on the driver; attempting to create a driver
  // SparkEnv without one is an error.
  if (isDriver) {
    assert(listenerBus != null, "Attempted to create driver SparkEnv with null listener bus!")
  }

  // ===================== 1. Create the security manager SecurityManager =====================
  // What the SecurityManager does: http://blog.csdn.net/qq_21383435/article/details/78560364
  val securityManager = new SecurityManager(conf, ioEncryptionKey)
  ioEncryptionKey.foreach { _ =>
    // Warn if I/O encryption is enabled but RPC (network) encryption is not.
    if (!securityManager.isEncryptionEnabled()) {
      logWarning("I/O encryption enabled without RPC encryption: keys will be visible on the " +
        "wire.")
    }
  }

  // ===================== 2. Create the RPC environment RpcEnv =====================
  // SparkContext creates the RpcEnv here while initializing SparkEnv.
  // Background on Spark RPC: http://blog.csdn.net/qq_21383435/article/details/78567491
  val systemName = if (isDriver) driverSystemName else executorSystemName
  val rpcEnv = RpcEnv.create(systemName, bindAddress, advertiseAddress, port, conf,
    securityManager, clientMode = !isDriver)

  // Figure out which port RpcEnv actually bound to in case the original port is 0 or occupied.
  // In the non-driver case, the RPC env's address may be null since it may not be listening
  // for incoming connections.
  if (isDriver) {
    conf.set("spark.driver.port", rpcEnv.address.port.toString)
  } else if (rpcEnv.address != null) {
    conf.set("spark.executor.port", rpcEnv.address.port.toString)
    logInfo(s"Setting spark.executor.port to: ${rpcEnv.address.port.toString}")
  }

  // Create an instance of the class with the given name, possibly initializing it with our
  // conf. Look for a constructor taking a SparkConf and a Boolean isDriver, then one taking
  // just a SparkConf, then one taking no arguments.
  def instantiateClass[T](className: String): T = {
    val cls = Utils.classForName(className)
    try {
      cls.getConstructor(classOf[SparkConf], java.lang.Boolean.TYPE)
        .newInstance(conf, new java.lang.Boolean(isDriver))
        .asInstanceOf[T]
    } catch {
      case _: NoSuchMethodException =>
        try {
          cls.getConstructor(classOf[SparkConf]).newInstance(conf).asInstanceOf[T]
        } catch {
          case _: NoSuchMethodException =>
            cls.getConstructor().newInstance().asInstanceOf[T]
        }
    }
  }

  // Create an instance of the class named by the given SparkConf property, falling back to
  // defaultClassName if the property is not set.
  def instantiateClassFromConf[T](propertyName: String, defaultClassName: String): T = {
    instantiateClass[T](conf.get(propertyName, defaultClassName))
  }

  val serializer = instantiateClassFromConf[Serializer](
    "spark.serializer", "org.apache.spark.serializer.JavaSerializer")
  logDebug(s"Using serializer: ${serializer.getClass}")

  // ===================== 3. Create the serializer manager SerializerManager =====================
  // Details: http://blog.csdn.net/qq_21383435/article/details/78581511
  val serializerManager = new SerializerManager(serializer, conf, ioEncryptionKey)

  val closureSerializer = new JavaSerializer(conf)

  // On the driver, register the endpoint with the RpcEnv; on an executor, look up a
  // reference to the driver-side endpoint of the same name instead.
  def registerOrLookupEndpoint(
      name: String, endpointCreator: => RpcEndpoint): RpcEndpointRef = {
    if (isDriver) {
      logInfo("Registering " + name)
      rpcEnv.setupEndpoint(name, endpointCreator)
    } else {
      RpcUtils.makeDriverRef(name, conf, rpcEnv)
    }
  }

  // ===================== 4. Create the broadcast manager BroadcastManager =====================
  // The BroadcastManager stores configuration, serialized RDDs, jobs, ShuffleDependency
  // information and the like locally, and replicates them to other nodes for fault tolerance.
  // See "Spark Learning 34" at http://blog.csdn.net/qq_21383435/article/details/78592022
  val broadcastManager = new BroadcastManager(isDriver, conf, securityManager)

  // ===================== 5. Create the map output tracker MapOutputTracker =====================
  // The MapOutputTracker tracks the output status of map-stage tasks so that reduce-stage
  // tasks can locate the nodes holding the intermediate map output. Every map or reduce task
  // has a unique id (mapId / reduceId). A reduce task may consume the output of many map
  // tasks; fetching those blocks from the map nodes is the shuffle, and each shuffle round
  // has a unique shuffleId.
  // Details: http://blog.csdn.net/qq_21383435/article/details/78603123
  val mapOutputTracker = if (isDriver) {
    new MapOutputTrackerMaster(conf, broadcastManager, isLocal)
  } else {
    new MapOutputTrackerWorker(conf)
  }

  // Have to assign trackerEndpoint after initialization as MapOutputTrackerEndpoint
  // requires the MapOutputTracker itself.
  mapOutputTracker.trackerEndpoint = registerOrLookupEndpoint(MapOutputTracker.ENDPOINT_NAME,
    new MapOutputTrackerMasterEndpoint(
      rpcEnv, mapOutputTracker.asInstanceOf[MapOutputTrackerMaster], conf))

  // Let the user specify short names for shuffle managers.
  val shortShuffleMgrNames = Map(
    "sort" -> classOf[org.apache.spark.shuffle.sort.SortShuffleManager].getName,
    "tungsten-sort" -> classOf[org.apache.spark.shuffle.sort.SortShuffleManager].getName)
  val shuffleMgrName = conf.get("spark.shuffle.manager", "sort")
  val shuffleMgrClass =
    shortShuffleMgrNames.getOrElse(shuffleMgrName.toLowerCase(Locale.ROOT), shuffleMgrName)

  // ===================== 6. Instantiate the ShuffleManager =====================
  // The ShuffleManager manages shuffle operations over local and remote block data.
  // Details: http://blog.csdn.net/qq_21383435/article/details/78634471
  val shuffleManager = instantiateClass[ShuffleManager](shuffleMgrClass)

  // ===================== 7. Create the MemoryManager =====================
  // Depending on spark.memory.useLegacyMode, a different MemoryManager subclass is created:
  //  - false: UnifiedMemoryManager, the new memory management model
  //  - true:  StaticMemoryManager, the legacy (pre-1.6) memory management model
  // Details: http://blog.csdn.net/qq_21383435/article/details/78639582
  val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
  val memoryManager: MemoryManager =
    if (useLegacyMemoryManager) {
      // Legacy path: static memory management.
      new StaticMemoryManager(conf, numUsableCores)
    } else {
      // Otherwise use the unified memory management model. Note the absence of the `new`
      // keyword: this is Scala invoking the apply() method of the UnifiedMemoryManager
      // companion object.
      UnifiedMemoryManager(conf, numUsableCores)
    }

  val blockManagerPort = if (isDriver) {
    conf.get(DRIVER_BLOCK_MANAGER_PORT)
  } else {
    conf.get(BLOCK_MANAGER_PORT)
  }

  // ===================== 8. Create the block transfer service BlockTransferService =====================
  // blockTransferService is a NettyBlockTransferService (older releases could select an
  // NIO-based implementation via spark.shuffle.blockTransferService). It uses Netty's
  // asynchronous, event-driven network framework to serve and fetch block data from
  // remote nodes.
  // Details: http://blog.csdn.net/qq_21383435/article/details/78645708
  val blockTransferService =
    new NettyBlockTransferService(conf, securityManager, bindAddress, advertiseAddress,
      blockManagerPort, numUsableCores)

  // ===================== 9. Create the BlockManagerMaster =====================
  val blockManagerMaster = new BlockManagerMaster(registerOrLookupEndpoint(
    BlockManagerMaster.DRIVER_ENDPOINT_NAME,
    new BlockManagerMasterEndpoint(rpcEnv, isLocal, conf, listenerBus)),
    conf, isDriver)

  // ===================== 10. Create the BlockManager =====================
  // NB: blockManager is not valid until initialize() is called later.
  // The BlockManager manages blocks and forms part of the storage system.
  val blockManager = new BlockManager(executorId, rpcEnv, blockManagerMaster,
    serializerManager, conf, memoryManager, mapOutputTracker, shuffleManager,
    blockTransferService, securityManager, numUsableCores)

  // ===================== 11. Create the metrics system MetricsSystem =====================
  // createMetricsSystem essentially calls new MetricsSystem(instance, conf, securityMgr).
  // Since isDriver = (executorId == SparkContext.DRIVER_IDENTIFIER), a "driver" metrics
  // system is created on the driver and an "executor" metrics system everywhere else.
  // Details: http://blog.csdn.net/qq_21383435/article/details/78659478
  val metricsSystem = if (isDriver) {
    // Don't start metrics system right now for Driver.
    // We need to wait for the task scheduler to give us an app ID.
    // Then we can start the metrics system.
    MetricsSystem.createMetricsSystem("driver", conf, securityManager)
  } else {
    // We need to set the executor ID before the MetricsSystem is created because sources and
    // sinks specified in the metrics configuration file will want to incorporate this
    // executor's ID into the metrics they report.
    conf.set("spark.executor.id", executorId)
    val ms = MetricsSystem.createMetricsSystem("executor", conf, securityManager)
    ms.start()
    ms
  }

  // ===================== 12. Create the output commit coordinator OutputCommitCoordinator =====================
  // Details: http://blog.csdn.net/qq_21383435/article/details/78662063
  val outputCommitCoordinator = mockOutputCommitCoordinator.getOrElse {
    new OutputCommitCoordinator(conf, isDriver)
  }
  val outputCommitCoordinatorRef = registerOrLookupEndpoint("OutputCommitCoordinator",
    new OutputCommitCoordinatorEndpoint(rpcEnv, outputCommitCoordinator))
  outputCommitCoordinator.coordinatorRef = Some(outputCommitCoordinatorRef)

  // ===================== 13. Create the SparkEnv =====================
  // serializer is created reflectively (via Class.forName) from spark.serializer and is
  // org.apache.spark.serializer.JavaSerializer by default; closureSerializer is always a
  // JavaSerializer and is used specifically to serialize Scala closures.
  val envInstance = new SparkEnv(
    executorId,
    rpcEnv,
    serializer,
    closureSerializer,
    serializerManager,
    mapOutputTracker,
    shuffleManager,
    broadcastManager,
    blockManager,
    securityManager,
    metricsSystem,
    memoryManager,
    outputCommitCoordinator,
    conf)

  // Add a reference to the tmp dir created by the driver; we delete this tmp dir when stop()
  // is called, and we only need to do it for the driver. Because the driver may run as a
  // service, if we did not delete this tmp dir when sc is stopped we would accumulate too
  // many tmp dirs.
  if (isDriver) {
    val sparkFilesDir = Utils.createTempDir(Utils.getLocalDir(conf), "userFiles").getAbsolutePath
    envInstance.driverTmpDir = Some(sparkFilesDir)
  }

  envInstance
}
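2. Standalone sketches of the patterns above

The constructor-fallback reflection inside instantiateClass is easy to try outside Spark. Below is a minimal, self-contained sketch; DemoConf, FakeComponent and ReflectDemo are invented stand-ins for SparkConf and a pluggable component, not Spark classes:

class DemoConf(val settings: Map[String, String])

class FakeComponent(conf: DemoConf, isDriver: Boolean) {
  override def toString = s"FakeComponent(isDriver=$isDriver)"
}

object ReflectDemo extends App {
  // Mirrors SparkEnv.create's instantiateClass: probe for the richest constructor first,
  // (conf, isDriver), then (conf), then the no-arg constructor.
  def instantiate[T](className: String, conf: DemoConf, isDriver: Boolean): T = {
    val cls = Class.forName(className)
    try {
      cls.getConstructor(classOf[DemoConf], java.lang.Boolean.TYPE)
        .newInstance(conf, java.lang.Boolean.valueOf(isDriver))
        .asInstanceOf[T]
    } catch {
      case _: NoSuchMethodException =>
        try {
          cls.getConstructor(classOf[DemoConf]).newInstance(conf).asInstanceOf[T]
        } catch {
          case _: NoSuchMethodException =>
            cls.getConstructor().newInstance().asInstanceOf[T]
        }
    }
  }

  println(instantiate[FakeComponent]("FakeComponent", new DemoConf(Map.empty), isDriver = true))
}

Ordering the probes from most to least specific means a pluggable component only has to expose the richest constructor it actually needs.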
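Because the serializer and the shuffle manager are resolved through instantiateClassFromConf and shortShuffleMgrNames, both can be swapped from configuration alone. A usage sketch (the local master and app name are chosen just for the demo; KryoSerializer and the "sort" short name are standard Spark values):

import org.apache.spark.{SparkConf, SparkContext}

object ConfDemo extends App {
  val conf = new SparkConf()
    .setAppName("sparkenv-conf-demo")
    .setMaster("local[2]")
    // Resolved by instantiateClassFromConf("spark.serializer", ...) in SparkEnv.create:
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // Short name resolved through shortShuffleMgrNames to SortShuffleManager:
    .set("spark.shuffle.manager", "sort")

  val sc = new SparkContext(conf)
  println(sc.parallelize(1 to 100).map(_ * 2).reduce(_ + _))
  sc.stop()
}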
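registerOrLookupEndpoint is what lets one create method serve both sides of the cluster: the driver registers each endpoint, while executors merely resolve a reference to it by name. The following schematic models that split in plain Scala; Registry, Endpoint and EndpointDemo are invented stand-ins, not Spark's RpcEnv / RpcEndpoint / RpcEndpointRef types:

// Schematic model of registerOrLookupEndpoint: the driver hosts the endpoint,
// executors only hold a reference to it.
trait Endpoint { def receive(msg: Any): Any }

class Registry {
  private val endpoints = scala.collection.mutable.Map[String, Endpoint]()
  def setup(name: String, e: Endpoint): Endpoint = { endpoints(name) = e; e }
  def lookup(name: String): Endpoint = endpoints(name)
}

object EndpointDemo extends App {
  val driverRegistry = new Registry

  // `creator` is by-name, so it is only evaluated on the driver side,
  // just like the endpointCreator parameter in SparkEnv.create.
  def registerOrLookup(name: String, isDriver: Boolean)(creator: => Endpoint): Endpoint =
    if (isDriver) driverRegistry.setup(name, creator) // driver: create and register
    else driverRegistry.lookup(name)                  // executor: resolve the existing one

  // The driver registers the tracker endpoint...
  registerOrLookup("MapOutputTracker", isDriver = true)(new Endpoint {
    def receive(msg: Any): Any = s"status for $msg"
  })
  // ...and an "executor" later resolves the same endpoint by name.
  val ref = registerOrLookup("MapOutputTracker", isDriver = false)(sys.error("never evaluated"))
  println(ref.receive("shuffle 0"))
}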
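The BroadcastManager created in step 4 is what ultimately backs the user-facing broadcast API. A small usage sketch (again with an illustrative local master):

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastDemo extends App {
  val sc = new SparkContext(new SparkConf().setAppName("bcast-demo").setMaster("local[2]"))

  // sc.broadcast goes through SparkEnv's BroadcastManager; the value is shipped
  // to executors once instead of being serialized into every task closure.
  val lookup = sc.broadcast(Map(1 -> "one", 2 -> "two"))
  val names = sc.parallelize(Seq(1, 2, 1)).map(i => lookup.value.getOrElse(i, "?")).collect()
  println(names.mkString(", "))

  sc.stop()
}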
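On the missing `new` in UnifiedMemoryManager(conf, numUsableCores): Scala rewrites that call to UnifiedMemoryManager.apply(...), which gives the companion object a place to derive constructor arguments before instantiation. A minimal sketch of the same pattern; MiniMemoryManager is invented, and its 300MB / 0.6 constants merely echo Spark's defaults rather than the real implementation:

// Companion-object apply() pattern, as used by UnifiedMemoryManager.
class MiniMemoryManager private (val maxHeapMemory: Long, val numCores: Int) {
  override def toString = s"MiniMemoryManager(max=$maxHeapMemory, cores=$numCores)"
}

object MiniMemoryManager {
  // apply() lets callers write MiniMemoryManager(...) without `new`,
  // computing the constructor arguments first.
  def apply(systemMemory: Long, numCores: Int): MiniMemoryManager = {
    val reserved = 300L * 1024 * 1024                      // echoes Spark's reserved memory
    val usable = math.max(systemMemory - reserved, 0L)
    new MiniMemoryManager((usable * 0.6).toLong, numCores) // echoes spark.memory.fraction
  }
}

object ApplyDemo extends App {
  // No `new`: Scala desugars this to MiniMemoryManager.apply(...)
  println(MiniMemoryManager(4L * 1024 * 1024 * 1024, numCores = 8))
}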
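Finally, once create returns, other Spark code reaches the subsystems built above through the SparkEnv.get accessor. A small sketch of that access pattern, assuming a local SparkContext is already running in the same JVM:

import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

object EnvDemo extends App {
  val sc = new SparkContext(new SparkConf().setAppName("env-demo").setMaster("local[2]"))

  // SparkEnv.get returns the environment that SparkEnv.create built.
  val env = SparkEnv.get
  println(s"executorId     = ${env.executorId}")           // "driver" on the driver
  println(s"serializer     = ${env.serializer.getClass}")  // JavaSerializer by default
  println(s"shuffleManager = ${env.shuffleManager.getClass}")
  println(s"memoryManager  = ${env.memoryManager.getClass}")

  sc.stop()
}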