Spark Learning 55 - Source Code: The Creation of SparkSession


1. First, we create a SparkSession in our own program:

    SparkSession spark = SparkSession.builder()
        .appName("lcc_java_habase_local")
        .master("local[4]")
        .getOrCreate();

2. Let's see what this statement does:

    /**
     * Creates a [[SparkSession.Builder]] for constructing a [[SparkSession]].
     *
     * @since 2.0.0
     */
    def builder(): Builder = new Builder

3. Now look at the Builder class: it is an inner class, class Builder extends Logging, defined inside SparkSession's companion object (object SparkSession). A stripped-down sketch of this pattern follows.
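This is just the standard companion-object/builder pattern. The sketch below is invented for illustration (MiniSession and all of its names are hypothetical, not Spark code) and only shows the shape:

    // Minimal sketch of the pattern: a companion object exposing a nested Builder.
    class MiniSession(val options: Map[String, String])

    object MiniSession {
      // Factory entry point, analogous to SparkSession.builder()
      def builder(): Builder = new Builder

      class Builder {
        private val options = scala.collection.mutable.HashMap[String, String]()
        def config(key: String, value: String): Builder = { options += key -> value; this }
        def getOrCreate(): MiniSession = new MiniSession(options.toMap)
      }
    }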

The Builder first sets up the session's external extension points:

    // Spark's external extension points (analyzer rules, check-analysis rules,
    // optimizer rules, planning strategies, a custom parser, and (external)
    // catalog listeners)
    private[this] val extensions = new SparkSessionExtensions
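For context: getOrCreate() (shown below) populates this object when spark.sql.extensions names a configurator class. A minimal sketch of such a configurator might look like the following (NoopRule and MyExtensions are hypothetical names; assumes Spark 2.2+):

    import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.catalyst.rules.Rule

    // A do-nothing analyzer rule, only to show the wiring.
    case class NoopRule(spark: SparkSession) extends Rule[LogicalPlan] {
      override def apply(plan: LogicalPlan): LogicalPlan = plan
    }

    // The configurator is a SparkSessionExtensions => Unit function. Setting
    // spark.sql.extensions to this class's fully qualified name lets
    // getOrCreate() instantiate it reflectively and apply it.
    class MyExtensions extends (SparkSessionExtensions => Unit) {
      override def apply(extensions: SparkSessionExtensions): Unit = {
        extensions.injectResolutionRule(NoopRule)
      }
    }

You would then register it with .config("spark.sql.extensions", "MyExtensions") on the builder, or the equivalent --conf flag.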

Key method 1: enabling Hive support

    /**
     * Enables Hive support, including connectivity to a persistent Hive metastore, support for
     * Hive serdes, and Hive user-defined functions.
     *
     * @since 2.0.0
     */
    def enableHiveSupport(): Builder = synchronized {
      if (hiveClassesArePresent) {
        config(CATALOG_IMPLEMENTATION.key, "hive")
      } else {
        throw new IllegalArgumentException(
          "Unable to instantiate SparkSession with Hive support because " +
            "Hive classes are not found.")
      }
    }
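In other words, all enableHiveSupport() does is set spark.sql.catalogImplementation to "hive", provided the Hive classes are on the classpath. A typical call site (app name below is just a placeholder) might look like:

    import org.apache.spark.sql.SparkSession

    // Requires spark-hive (and its dependencies) on the classpath; otherwise
    // enableHiveSupport() throws the IllegalArgumentException shown above.
    val spark = SparkSession.builder()
      .appName("hive-demo")   // placeholder name
      .master("local[4]")
      .enableHiveSupport()    // same as .config("spark.sql.catalogImplementation", "hive")
      .getOrCreate()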

Key method 2: creating the SparkSession

    /**
     * Gets an existing [[SparkSession]] or, if there is no existing one, creates a new
     * one based on the options set in this builder.
     *
     * This method first checks whether there is a valid thread-local SparkSession,
     * and if yes, return that one. It then checks whether there is a valid global
     * default SparkSession, and if yes, return that one. If no valid global default
     * SparkSession exists, the method creates a new SparkSession and assigns the
     * newly created SparkSession as the global default.
     *
     * In case an existing SparkSession is returned, the config options specified in
     * this builder will be applied to the existing SparkSession.
     *
     * @since 2.0.0
     */
    def getOrCreate(): SparkSession = synchronized {
      // Get the session from the current thread's active session.
      var session = activeThreadSession.get()
      // If that session exists and its SparkContext has not been stopped, return it directly.
      if ((session ne null) && !session.sparkContext.isStopped) {
        options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
        if (options.nonEmpty) {
          logWarning("Using an existing SparkSession; some configuration may not take effect.")
        }
        return session
      }

      // Global synchronization so we will only set the default session once.
      SparkSession.synchronized {
        // If the current thread does not have an active session, get it from the global session.
        session = defaultSession.get()
        if ((session ne null) && !session.sparkContext.isStopped) {
          options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
          if (options.nonEmpty) {
            logWarning("Using an existing SparkSession; some configuration may not take effect.")
          }
          return session
        }

        // No active nor global default session. Create a new one.
        val sparkContext = userSuppliedContext.getOrElse {
          // Set an app name if not given: default to a random UUID.
          val randomAppName = java.util.UUID.randomUUID().toString
          // Initialize the Spark configuration.
          val sparkConf = new SparkConf()
          options.foreach { case (k, v) => sparkConf.set(k, v) }
          if (!sparkConf.contains("spark.app.name")) {
            sparkConf.setAppName(randomAppName)
          }
          // Delegate to SparkContext's companion object to create (or reuse) the SparkContext.
          val sc = SparkContext.getOrCreate(sparkConf)
          // Maybe this is an existing SparkContext; update its SparkConf, which may be used
          // by the SparkSession.
          options.foreach { case (k, v) => sc.conf.set(k, v) }
          if (!sc.conf.contains("spark.app.name")) {
            sc.conf.setAppName(randomAppName)
          }
          sc
        }

        // Initialize extensions if the user has defined a configurator class.
        val extensionConfOption = sparkContext.conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)
        if (extensionConfOption.isDefined) {
          val extensionConfClassName = extensionConfOption.get
          try {
            val extensionConfClass = Utils.classForName(extensionConfClassName)
            val extensionConf = extensionConfClass.newInstance()
              .asInstanceOf[SparkSessionExtensions => Unit]
            extensionConf(extensions)
          } catch {
            // Ignore the error if we cannot find the class or when the class has the wrong type.
            case e @ (_: ClassCastException |
                      _: ClassNotFoundException |
                      _: NoClassDefFoundError) =>
              logWarning(s"Cannot use $extensionConfClassName to configure session extensions.", e)
          }
        }

        session = new SparkSession(sparkContext, None, None, extensions)
        options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
        defaultSession.set(session)

        // Register a successfully instantiated context to the singleton. This should be at the
        // end of the class definition so that the singleton is updated only if there is no
        // exception in the construction of the instance.
        sparkContext.addSparkListener(new SparkListener {
          override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
            defaultSession.set(null)
            sqlListener.set(null)
          }
        })
      }

      return session
    }
  } // end of class Builder
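The thread-local/global-default lookup has a practical consequence: repeated builder calls in the same JVM return the same session, and any options given to a later builder are copied onto it. A small sketch of that behavior (master, app name, and config key chosen only for illustration):

    import org.apache.spark.sql.SparkSession

    val s1 = SparkSession.builder().master("local[2]").appName("demo").getOrCreate()

    // This does NOT create a second session: the global default is reused, the
    // builder's options are applied to it, and a warning is logged.
    val s2 = SparkSession.builder()
      .config("spark.sql.shuffle.partitions", "8")
      .getOrCreate()

    assert(s1 eq s2)
    assert(s2.conf.get("spark.sql.shuffle.partitions") == "8")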