Spark Learning 32: The Construction Steps of SparkEnv


1. The code

  /**
   * Helper method to create a SparkEnv for a driver or an executor.
   *
   * SparkEnv is constructed roughly in the following steps:
   *   1. Create the security manager SecurityManager
   *   2. Create the RPC environment RpcEnv
   *   3. Create the serializer and the serializer manager SerializerManager
   *   4. Create the broadcast manager BroadcastManager
   *   5. Create the map-task output tracker MapOutputTracker
   *   6. Instantiate the ShuffleManager
   *   7. Create the MemoryManager
   *   8. Create the block transfer service BlockTransferService
   *   9. Create the BlockManagerMaster
   *   10. Create the block manager BlockManager
   *   11. Create the metrics system MetricsSystem
   *   12. Create the output commit coordinator OutputCommitCoordinator
   *   13. Create the SparkEnv instance
   */
  private def create(
      conf: SparkConf,
      executorId: String,
      bindAddress: String,
      advertiseAddress: String,
      port: Int,
      isLocal: Boolean,
      numUsableCores: Int,
      ioEncryptionKey: Option[Array[Byte]],
      listenerBus: LiveListenerBus = null,
      mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv = {

    // True when this SparkEnv is being created for the driver.
    val isDriver = executorId == SparkContext.DRIVER_IDENTIFIER

    // Listener bus is only used on the driver.
    // Attempting to create a driver SparkEnv without a listener bus is an error.
    if (isDriver) {
      assert(listenerBus != null, "Attempted to create driver SparkEnv with null listener bus!")
    }

    // ===================== 1. Create the security manager SecurityManager ======================
    // For what the security manager does, see http://blog.csdn.net/qq_21383435/article/details/78560364
    val securityManager = new SecurityManager(conf, ioEncryptionKey)
    ioEncryptionKey.foreach { _ =>
      // Check whether network encryption should also be enabled.
      if (!securityManager.isEncryptionEnabled()) {
        logWarning("I/O encryption enabled without RPC encryption: keys will be visible on the " +
          "wire.")
      }
    }

    // ===================== 2. Create the RPC environment RpcEnv ======================
    // When SparkContext initializes SparkEnv, the RpcEnv is created with the code below.
    // Background on Spark RPC: http://blog.csdn.net/qq_21383435/article/details/78567491
    val systemName = if (isDriver) driverSystemName else executorSystemName
    val rpcEnv = RpcEnv.create(systemName, bindAddress, advertiseAddress, port, conf,
      securityManager, clientMode = !isDriver)

    // Figure out which port RpcEnv actually bound to in case the original port is 0 or occupied.
    // In the non-driver case, the RPC env's address may be null since it may not be listening
    // for incoming connections.
    if (isDriver) {
      conf.set("spark.driver.port", rpcEnv.address.port.toString)
    } else if (rpcEnv.address != null) {
      conf.set("spark.executor.port", rpcEnv.address.port.toString)
      logInfo(s"Setting spark.executor.port to: ${rpcEnv.address.port.toString}")
    }

    // Create an instance of the class with the given name, possibly initializing it with our conf
    def instantiateClass[T](className: String): T = {
      val cls = Utils.classForName(className)
      // Look for a constructor taking a SparkConf and a boolean isDriver, then one taking just
      // SparkConf, then one taking no arguments
      try {
        cls.getConstructor(classOf[SparkConf], java.lang.Boolean.TYPE)
          .newInstance(conf, new java.lang.Boolean(isDriver))
          .asInstanceOf[T]
      } catch {
        case _: NoSuchMethodException =>
          try {
            cls.getConstructor(classOf[SparkConf]).newInstance(conf).asInstanceOf[T]
          } catch {
            case _: NoSuchMethodException =>
              cls.getConstructor().newInstance().asInstanceOf[T]
          }
      }
    }

    // Create an instance of the class named by the given SparkConf property, or defaultClassName
    // if the property is not set, possibly initializing it with our conf
    def instantiateClassFromConf[T](propertyName: String, defaultClassName: String): T = {
      instantiateClass[T](conf.get(propertyName, defaultClassName))
    }

    val serializer = instantiateClassFromConf[Serializer](
      "spark.serializer", "org.apache.spark.serializer.JavaSerializer")
    logDebug(s"Using serializer: ${serializer.getClass}")

    // ===================== 3. Create the serializer manager SerializerManager ======================
    // See http://blog.csdn.net/qq_21383435/article/details/78581511
    val serializerManager = new SerializerManager(serializer, conf, ioEncryptionKey)

    val closureSerializer = new JavaSerializer(conf)

    // On the driver, register the endpoint with the RpcEnv; on an executor, look up a reference
    // to the endpoint that the driver already registered.
    def registerOrLookupEndpoint(
        name: String, endpointCreator: => RpcEndpoint):
      RpcEndpointRef = {
      if (isDriver) {
        logInfo("Registering " + name)
        rpcEnv.setupEndpoint(name, endpointCreator)
      } else {
        RpcUtils.makeDriverRef(name, conf, rpcEnv)
      }
    }

    // ===================== 4. Create the broadcast manager BroadcastManager ======================
    // BroadcastManager stores configuration, serialized RDDs, job and ShuffleDependency
    // information locally, and can replicate it to other nodes for fault tolerance.
    // See "spark学习-34-Spark的BroadcastManager广播管理":
    // http://blog.csdn.net/qq_21383435/article/details/78592022
    val broadcastManager = new BroadcastManager(isDriver, conf, securityManager)

    // ===================== 5. Create the map-task output tracker MapOutputTracker ======================
    // mapOutputTracker tracks the output status of map-stage tasks so that reduce-stage tasks can
    // locate the intermediate outputs. Each map or reduce task has a unique id (mapId / reduceId).
    // The input of a reduce task may be the output of many map tasks, so the reduce task pulls
    // blocks from the nodes that ran those map tasks; this process is the shuffle, and every
    // shuffle has a unique shuffleId.
    // Details: http://blog.csdn.net/qq_21383435/article/details/78603123
    val mapOutputTracker = if (isDriver) {
      new MapOutputTrackerMaster(conf, broadcastManager, isLocal)
    } else {
      new MapOutputTrackerWorker(conf)
    }

    // Have to assign trackerEndpoint after initialization as MapOutputTrackerEndpoint
    // requires the MapOutputTracker itself
    mapOutputTracker.trackerEndpoint = registerOrLookupEndpoint(MapOutputTracker.ENDPOINT_NAME,
      new MapOutputTrackerMasterEndpoint(
        rpcEnv, mapOutputTracker.asInstanceOf[MapOutputTrackerMaster], conf))

    // Let the user specify short names for shuffle managers
    val shortShuffleMgrNames = Map(
      "sort" -> classOf[org.apache.spark.shuffle.sort.SortShuffleManager].getName,
      "tungsten-sort" -> classOf[org.apache.spark.shuffle.sort.SortShuffleManager].getName)
    val shuffleMgrName = conf.get("spark.shuffle.manager", "sort")
    val shuffleMgrClass =
      shortShuffleMgrNames.getOrElse(shuffleMgrName.toLowerCase(Locale.ROOT), shuffleMgrName)

    // ===================== 6. Instantiate the ShuffleManager ======================
    // ShuffleManager manages shuffle operations over local and remote block data.
    // Details: http://blog.csdn.net/qq_21383435/article/details/78634471
    val shuffleManager = instantiateClass[ShuffleManager](shuffleMgrClass)

    // ===================== 7. Create the MemoryManager ======================
    // Depending on spark.memory.useLegacyMode, a different MemoryManager subclass is created:
    //   false: UnifiedMemoryManager, the newer memory-management implementation
    //   true:  StaticMemoryManager, the legacy (pre-1.6) memory-management implementation
    // Details on MemoryManager: http://blog.csdn.net/qq_21383435/article/details/78639582
    val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
    val memoryManager: MemoryManager =
      if (useLegacyMemoryManager) {
        // Legacy mode: StaticMemoryManager, i.e. static memory management
        new StaticMemoryManager(conf, numUsableCores)
      } else {
        // Otherwise use UnifiedMemoryManager, the unified memory-management model.
        // Note that there is no `new` keyword here: this is a Scala idiom, the instance is
        // created through the apply() method of the UnifiedMemoryManager companion object.
        UnifiedMemoryManager(conf, numUsableCores)
      }

    val blockManagerPort = if (isDriver) {
      conf.get(DRIVER_BLOCK_MANAGER_PORT)
    } else {
      conf.get(BLOCK_MANAGER_PORT)
    }

    // ===================== 8. Create the block transfer service BlockTransferService ======================
    // blockTransferService defaults to NettyBlockTransferService (in older Spark versions the
    // property spark.shuffle.blockTransferService could select NioBlockTransferService instead).
    // It uses Netty, an asynchronous event-driven network framework, to provide the server and
    // client used to fetch blocks from remote nodes.
    // Details: http://blog.csdn.net/qq_21383435/article/details/78645708
    val blockTransferService =
      new NettyBlockTransferService(conf, securityManager, bindAddress, advertiseAddress,
        blockManagerPort, numUsableCores)

    // ===================== 9. Create the BlockManagerMaster ======================
    val blockManagerMaster = new BlockManagerMaster(registerOrLookupEndpoint(
      BlockManagerMaster.DRIVER_ENDPOINT_NAME,
      new BlockManagerMasterEndpoint(rpcEnv, isLocal, conf, listenerBus)),
      conf, isDriver)

    // ===================== 10. Create the block manager BlockManager ======================
    // NB: blockManager is not valid until initialize() is called later.
    // BlockManager manages blocks and is part of the storage system; it only becomes usable
    // after its initialize() method has been called.
    val blockManager = new BlockManager(executorId, rpcEnv, blockManagerMaster,
      serializerManager, conf, memoryManager, mapOutputTracker, shuffleManager,
      blockTransferService, securityManager, numUsableCores)

    // ===================== 11. Create the metrics system MetricsSystem ======================
    // createMetricsSystem mainly calls new MetricsSystem(instance, conf, securityMgr).
    // Because isDriver = executorId == SparkContext.DRIVER_IDENTIFIER (i.e. "driver"), a driver
    // metrics system is created on the driver and an executor metrics system everywhere else.
    // Details: http://blog.csdn.net/qq_21383435/article/details/78659478
    val metricsSystem = if (isDriver) {
      // Don't start metrics system right now for Driver.
      // We need to wait for the task scheduler to give us an app ID.
      // Then we can start the metrics system.
      MetricsSystem.createMetricsSystem("driver", conf, securityManager)
    } else {
      // We need to set the executor ID before the MetricsSystem is created because sources and
      // sinks specified in the metrics configuration file will want to incorporate this executor's
      // ID into the metrics they report.
      conf.set("spark.executor.id", executorId)
      val ms = MetricsSystem.createMetricsSystem("executor", conf, securityManager)
      ms.start()
      ms
    }

    // ===================== 12. Create the output commit coordinator OutputCommitCoordinator ======================
    // Details: http://blog.csdn.net/qq_21383435/article/details/78662063
    val outputCommitCoordinator = mockOutputCommitCoordinator.getOrElse {
      new OutputCommitCoordinator(conf, isDriver)
    }
    val outputCommitCoordinatorRef = registerOrLookupEndpoint("OutputCommitCoordinator",
      new OutputCommitCoordinatorEndpoint(rpcEnv, outputCommitCoordinator))
    outputCommitCoordinator.coordinatorRef = Some(outputCommitCoordinatorRef)

    // ===================== 13. Create the SparkEnv instance ======================
    // serializer was instantiated by reflection from the spark.serializer property
    // (org.apache.spark.serializer.JavaSerializer by default); closureSerializer is a
    // JavaSerializer used specifically to serialize Scala closures.
    val envInstance = new SparkEnv(
      executorId,
      rpcEnv,
      serializer,
      closureSerializer,
      serializerManager,
      mapOutputTracker,
      shuffleManager,
      broadcastManager,
      blockManager,
      securityManager,
      metricsSystem,
      memoryManager,
      outputCommitCoordinator,
      conf)

    // Add a reference to tmp dir created by driver, we will delete this tmp dir when stop() is
    // called, and we only need to do it for driver. Because driver may run as a service, and if we
    // don't delete this tmp dir when sc is stopped, then will create too many tmp dirs.
    if (isDriver) {
      val sparkFilesDir = Utils.createTempDir(Utils.getLocalDir(conf), "userFiles").getAbsolutePath
      envInstance.driverTmpDir = Some(sparkFilesDir)
    }

    envInstance
  }
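2. A few notes and sketches

The MemoryManager branch above points out that UnifiedMemoryManager(conf, numUsableCores) has no `new` keyword. That is the standard Scala companion-object apply() idiom. A minimal, self-contained sketch with made-up names (Quota is not a Spark class):

// Minimal sketch of the companion-object apply() idiom (illustrative names, not Spark code).
class Quota(val owner: String, val maxBytes: Long)

object Quota {
  // apply() acts as a factory, so callers can write Quota("alice") without `new`;
  // UnifiedMemoryManager(conf, numUsableCores) is resolved the same way.
  def apply(owner: String, maxBytes: Long = 1L << 20): Quota = new Quota(owner, maxBytes)
}

val q = Quota("alice")            // desugars to Quota.apply("alice")
println(s"${q.owner} -> ${q.maxBytes} bytes")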
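instantiateClass resolves a class by name and tries three constructor shapes in turn: (SparkConf, Boolean), then (SparkConf), then no-arg. The standalone sketch below reproduces that fallback chain outside of Spark; Config and instantiate are hypothetical stand-ins for SparkConf and the helper above:

// Standalone sketch of the constructor-fallback pattern used by instantiateClass.
// `Config` is a hypothetical stand-in for SparkConf.
class Config

def instantiate[T](className: String, conf: Config, isDriver: Boolean): T = {
  val cls = Class.forName(className)
  try {
    // Prefer a (Config, boolean) constructor, then (Config), then the no-arg constructor.
    cls.getConstructor(classOf[Config], java.lang.Boolean.TYPE)
      .newInstance(conf, java.lang.Boolean.valueOf(isDriver)).asInstanceOf[T]
  } catch {
    case _: NoSuchMethodException =>
      try cls.getConstructor(classOf[Config]).newInstance(conf).asInstanceOf[T]
      catch {
        case _: NoSuchMethodException => cls.getConstructor().newInstance().asInstanceOf[T]
      }
  }
}

// java.util.ArrayList has neither Config-taking constructor, so the call falls through to no-arg.
val list = instantiate[java.util.ArrayList[String]]("java.util.ArrayList", new Config, isDriver = false)
println(list.size()) // 0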
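Several branches in create() are driven purely by configuration keys (spark.serializer, spark.shuffle.manager, spark.memory.useLegacyMode). A user-side illustration of those settings follows; the values are examples only, and spark.memory.useLegacyMode is only meaningful in Spark versions that still ship StaticMemoryManager:

import org.apache.spark.SparkConf

// Example settings that influence the construction path walked through above.
val conf = new SparkConf()
  .setAppName("sparkenv-demo")
  .setMaster("local[2]")
  // Which Serializer instantiateClassFromConf will reflectively create:
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Short name resolved through shortShuffleMgrNames to SortShuffleManager:
  .set("spark.shuffle.manager", "sort")
  // false (default) -> UnifiedMemoryManager, true -> legacy StaticMemoryManager:
  .set("spark.memory.useLegacyMode", "false")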
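Finally, once the driver or an executor installs the returned instance via SparkEnv.set, running code can reach the components built above through SparkEnv.get. A small usage sketch, intended to run inside an already-running Spark application:

import org.apache.spark.SparkEnv

// Inside a running driver or executor, the components created by create() are reachable here.
val env = SparkEnv.get
println(env.executorId)                  // "driver" on the driver
println(env.memoryManager.getClass)      // UnifiedMemoryManager unless legacy mode is enabled
println(env.blockManager.blockManagerId) // this node's BlockManagerId, valid after initialize()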