Spark Learning 44: Spark's Metrics System (MetricsSystem)


1. Introduction to MetricsSystem

Monitoring is one of the most important parts of a large system once it is built. MetricsSystem is fairly easy to understand: it is a system for measuring the various metrics of a system, essentially a key-value style facility. A simple example: how do I expose the current JVM's information? There are many ways to do it, but MetricsSystem lets you do it in a more standardized fashion. Concretely:

Source: where the metrics data comes from, e.g. org.apache.spark.metrics.source.JvmSource.
Sink: where the data is sent to. Sinks are either active or passive: an active sink usually emits its output on a timer, e.g. CSVSink, while a passive one, such as MetricsServlet, must be invoked by the user.
Bridging Source and Sink is the MetricRegistry.

Spark does not implement the low-level metrics machinery itself; it uses a third-party library: http://metrics.codahale.com . It is worth a look if you want a more complete picture.
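To make the key-value nature concrete, here is a minimal standalone sketch (my own illustration, not Spark code; RegistryDemo and the metric key are hypothetical) of the Codahale/Dropwizard library that Spark builds on:

import com.codahale.metrics.{Gauge, MetricRegistry}

object RegistryDemo {
  def main(args: Array[String]): Unit = {
    val registry = new MetricRegistry()
    // Register a gauge under the key "jvm.freeMemory"; its value is computed on demand.
    val freeMem = registry.register("jvm.freeMemory", new Gauge[Long] {
      override def getValue: Long = Runtime.getRuntime.freeMemory()
    })
    println(s"jvm.freeMemory = ${freeMem.getValue}")
  }
}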

2. How to configure MetricsSystem

MetricsSystem can be configured in two ways: the first is a metrics.properties configuration file; the second goes through SparkConf, with parameters prefixed by spark.metrics.conf.
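For the first approach, here is a sketch of conf/metrics.properties; the console-sink keys follow Spark's metrics.properties.template, but treat the exact values as illustrative:

# Attach a ConsoleSink to every instance, polling every 10 seconds.
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
# Attach JvmSource to the driver instance only.
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource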

Below is a brief look at the second approach.

For example, to view JVM information, including GC and memory usage, I set something like:

conf.set("spark.metrics.conf.driver.source.jvm.class","org.apache.spark.metrics.source.JvmSource")

By default, MetricsSystem configures one global sink, MetricsServlet, so any Source you add can be read through the path /metrics/json.
If your program made the setting above, replace the path of your Spark UI with /metrics/json and you will see the JVM source's information.
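Putting the pieces together, a runnable sketch (local mode assumed; port 4040 is the default driver UI port, and JvmMetricsDemo is my own name):

import org.apache.spark.{SparkConf, SparkContext}

object JvmMetricsDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("jvm-metrics-demo")
      // Attach JvmSource to the driver's metrics system.
      .set("spark.metrics.conf.driver.source.jvm.class",
        "org.apache.spark.metrics.source.JvmSource")
    val sc = new SparkContext(conf)
    // While the app is running, browse http://localhost:4040/metrics/json
    // to see the jvm.* metrics served by the default MetricsServlet sink.
    Thread.sleep(60000)
    sc.stop()
  }
}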

In general, to implement a custom Source, follow the steps below (using JvmSource as the example).

– Create a Source

import com.codahale.metrics.MetricRegistry
import com.codahale.metrics.jvm.{GarbageCollectorMetricSet, MemoryUsageGaugeSet}

private[spark] class JvmSource extends Source {
  override val sourceName = "jvm"
  override val metricRegistry = new MetricRegistry()

  metricRegistry.registerAll(new GarbageCollectorMetricSet)
  metricRegistry.registerAll(new MemoryUsageGaugeSet)
}

Here, sourceName exists for the sake of configuration. In the setting above,

spark.metrics.conf.driver.source.jvm.class

the "jvm" segment is exactly the sourceName defined in JvmSource.

Each Source generally builds its own MetricRegistry. In the example above, the actual data collection is done by GarbageCollectorMetricSet and MemoryUsageGaugeSet.

Concretely, you write a class that implements com.codahale.metrics.MetricSet, whose getMetrics method returns a Map<String, Metric> of the metrics to collect.
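As a sketch of such a class (UptimeMetricSet and its single gauge are my own hypothetical illustration, not part of Spark or the metrics library):

import java.util.{Collections, Map => JMap}

import com.codahale.metrics.{Gauge, Metric, MetricSet}

// Hypothetical MetricSet exposing one gauge: milliseconds since the set was created.
class UptimeMetricSet extends MetricSet {
  private val startMs = System.currentTimeMillis()

  override def getMetrics: JMap[String, Metric] = {
    val uptime: Gauge[Long] = new Gauge[Long] {
      override def getValue: Long = System.currentTimeMillis() - startMs
    }
    Collections.singletonMap("uptimeMillis", uptime)
  }
}

A custom Source would then call metricRegistry.registerAll(new UptimeMetricSet) exactly as JvmSource does above.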

conf.set("spark.metrics.conf.driver.source.jvm.class","org.apache.spark.metrics.source.JvmSource")

– View the result

Replace the path of the Spark UI address with /metrics/json and you will see the output. This works because the system ships a default Sink implementation, org.apache.spark.metrics.sink.MetricsServlet; you can also implement your own.
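A minimal sketch of such a custom sink (MyConsoleSink is a hypothetical name; the constructor signature matches the one that registerSinks, shown further below, looks up via reflection, and since the Sink trait is private[spark] the class has to live under the org.apache.spark package):

package org.apache.spark.metrics.sink

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

import org.apache.spark.SecurityManager

// Hypothetical sink: print all registered metrics to stdout every 10 seconds.
private[spark] class MyConsoleSink(
    val property: Properties,
    val registry: MetricRegistry,
    securityMgr: SecurityManager) extends Sink {

  private val reporter = ConsoleReporter.forRegistry(registry)
    .convertRatesTo(TimeUnit.SECONDS)
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .build()

  override def start(): Unit = reporter.start(10, TimeUnit.SECONDS)
  override def stop(): Unit = reporter.stop()
  override def report(): Unit = reporter.report()
}

It would then be enabled with a property such as spark.metrics.conf.*.sink.myconsole.class=org.apache.spark.metrics.sink.MyConsoleSink.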

3. MetricsSystem is initialized when SparkEnv is initialized

// ======================= Create the metrics system (MetricsSystem) =======================
/**
 * createMetricsSystem just calls new MetricsSystem(instance, conf, securityMgr).
 * "instance" specifies who is using the metrics system. Here,
 * val isDriver = executorId == SparkContext.DRIVER_IDENTIFIER, and
 * SparkContext.DRIVER_IDENTIFIER is "driver"; so when executorId is "driver",
 * isDriver is true and the driver's metrics system is created, otherwise an
 * executor's metrics system is created.
 */
val metricsSystem = if (isDriver) {
  // Don't start metrics system right now for Driver.
  // We need to wait for the task scheduler to give us an app ID.
  // Then we can start the metrics system.
  MetricsSystem.createMetricsSystem("driver", conf, securityManager)
} else {
  // We need to set the executor ID before the MetricsSystem is created because sources and
  // sinks specified in the metrics configuration file will want to incorporate this executor's
  // ID into the metrics they report.
  conf.set("spark.executor.id", executorId)
  val ms = MetricsSystem.createMetricsSystem("executor", conf, securityManager)
  ms.start()
  ms
}

This in turn calls:

def createMetricsSystem(
    instance: String, conf: SparkConf, securityMgr: SecurityManager): MetricsSystem = {
  new MetricsSystem(instance, conf, securityMgr)
}

Now let's look at the MetricsSystem code itself.

package org.apache.spark.metrics

import java.util.Properties
import java.util.concurrent.TimeUnit

import scala.collection.mutable

import com.codahale.metrics.{Metric, MetricFilter, MetricRegistry}
import org.eclipse.jetty.servlet.ServletContextHandler

import org.apache.spark.{SecurityManager, SparkConf}
import org.apache.spark.internal.config._
import org.apache.spark.internal.Logging
import org.apache.spark.metrics.sink.{MetricsServlet, Sink}
import org.apache.spark.metrics.source.{Source, StaticSources}
import org.apache.spark.util.Utils

/**
 * Spark Metrics System, created by a specific "instance", combined by source,
 * sink, periodically polls source metrics data to sink destinations.
 *
 * "instance" specifies "who" (the role) uses the metrics system. In Spark, there are several roles
 * like master, worker, executor, client driver. These roles will create metrics system
 * for monitoring. So, "instance" represents these roles. Currently in Spark, several instances
 * have already implemented: master, worker, executor, driver, applications.
 *
 * "source" specifies "where" (source) to collect metrics data from. In metrics system, there exists
 * two kinds of source:
 *   1. Spark internal source, like MasterSource, WorkerSource, etc, which will collect
 *   Spark component's internal state, these sources are related to instance and will be
 *   added after a specific metrics system is created.
 *   2. Common source, like JvmSource, which will collect low level state, is configured by
 *   configuration and loaded through reflection.
 *
 * "sink" specifies "where" (destination) to output metrics data to. Several sinks can
 * coexist and metrics can be flushed to all these sinks.
 *
 * Metrics configuration format is like below:
 * [instance].[sink|source].[name].[options] = xxxx
 *
 * [instance] can be "master", "worker", "executor", "driver", "applications" which means only
 * the specified instance has this property.
 * wild card "*" can be used to replace instance name, which means all the instances will have
 * this property.
 *
 * [sink|source] means this property belongs to source or sink. This field can only be
 * source or sink.
 *
 * [name] specify the name of sink or source, if it is custom defined.
 *
 * [options] represent the specific property of this source or sink.
 */
/**
 * (Blog note; see http://blog.csdn.net/allwefantasy/article/details/50449464)
 * MetricsSystem builds on codahale's third-party Metrics library and revolves around
 * three concepts:
 *   Instance: who is using the metrics system;
 *   Source:   where the metrics data is collected from;
 *   Sink:     where the metrics data is output to.
 * By instance, Spark distinguishes Master, Worker, Application, Driver and Executor.
 * Sinks currently provided by Spark include ConsoleSink, CsvSink, JmxSink, MetricsServlet,
 * GraphiteSink, etc.; MetricsServlet is the default sink.
 */
private[spark] class MetricsSystem private (
    val instance: String,
    conf: SparkConf,
    securityMgr: SecurityManager)
  extends Logging {

  // Loads the metrics system's configuration properties.
  private[this] val metricsConfig = new MetricsConfig(conf)

  private val sinks = new mutable.ArrayBuffer[Sink]
  private val sources = new mutable.ArrayBuffer[Source]
  private val registry = new MetricRegistry()

  private var running: Boolean = false

  // Treat MetricsServlet as a special sink as it should be exposed to add handlers to web ui
  private var metricsServlet: Option[MetricsServlet] = None

  /**
   * Get any UI handlers used by this metrics system; can only be called after start().
   *
   * To make the metrics reachable from the Spark UI, Jetty ServletContextHandlers have to
   * be registered for the sinks; getServletHandlers is what supplies them.
   */
  def getServletHandlers: Array[ServletContextHandler] = {
    require(running, "Can only call getServletHandlers on a running MetricsSystem")
    metricsServlet.map(_.getHandlers(conf)).getOrElse(Array())
  }

  // Load the metrics system's properties and run the initialization.
  metricsConfig.initialize()

  // A MetricsSystem may be started only once; SparkContext invokes this
  // as _env.metricsSystem.start().
  def start() {
    // require() validates its argument and throws IllegalArgumentException on failure;
    // the first call passes.
    require(!running, "Attempting to start a MetricsSystem that is already running")
    // Mark this instance's metrics system as running right away; a second start()
    // fails with "Attempting to start a MetricsSystem that is already running".
    running = true
    // Walk all static sources and register each one via registerSource.
    StaticSources.allSources.foreach(registerSource)
    // Register the sources configured for this instance (they are collected in a
    // mutable.ArrayBuffer[Source]).
    registerSources()
    // Register the sinks.
    registerSinks()
    // Start every sink; which start() runs depends on the concrete Sink implementation.
    sinks.foreach(_.start)
  }

  // Stop the metrics system of the currently running instance.
  def stop() {
    if (running) {
      sinks.foreach(_.stop)
    } else {
      logWarning("Stopping a MetricsSystem that is not running")
    }
    running = false
  }

  def report() {
    sinks.foreach(_.report())
  }

  /**
   * Build a name that uniquely identifies each metric source.
   * The name is structured as follows: <app ID>.<executor ID (or "driver")>.<source name>.
   * If either ID is not available, this defaults to just using <source name>.
   *
   * @param source Metric source to be named by this method.
   * @return An unique metric name for each combination of
   *         application, executor/driver and metric source.
   */
  private[spark] def buildRegistryName(source: Source): String = {
    val metricsNamespace = conf.get(METRICS_NAMESPACE).orElse(conf.getOption("spark.app.id"))
    val executorId = conf.getOption("spark.executor.id")
    val defaultName = MetricRegistry.name(source.sourceName)

    if (instance == "driver" || instance == "executor") {
      if (metricsNamespace.isDefined && executorId.isDefined) {
        MetricRegistry.name(metricsNamespace.get, executorId.get, source.sourceName)
      } else {
        // Only Driver and Executor set spark.app.id and spark.executor.id.
        // Other instance types, e.g. Master and Worker, are not related to a specific application.
        if (metricsNamespace.isEmpty) {
          logWarning(s"Using default name $defaultName for source because neither " +
            s"${METRICS_NAMESPACE.key} nor spark.app.id is set.")
        }
        if (executorId.isEmpty) {
          logWarning(s"Using default name $defaultName for source because spark.executor.id is " +
            s"not set.")
        }
        defaultName
      }
    } else { defaultName }
  }

  def getSourcesByName(sourceName: String): Seq[Source] =
    sources.filter(_.sourceName == sourceName)

  /**
   * Register a source. For example, after an ExecutorSource has been created,
   * MetricsSystem.registerSource is called to add it to the MetricsSystem;
   * internally it uses MetricRegistry.register to register the source's
   * metricRegistry with this system's MetricRegistry.
   */
  def registerSource(source: Source) {
    sources += source
    try {
      val regName = buildRegistryName(source)
      registry.register(regName, source.metricRegistry)
    } catch {
      case e: IllegalArgumentException => logInfo("Metrics already registered", e)
    }
  }

  def removeSource(source: Source) {
    sources -= source
    val regName = buildRegistryName(source)
    registry.removeMatching(new MetricFilter {
      def matches(name: String, metric: Metric): Boolean = name.startsWith(regName)
    })
  }

  /**
   * registerSources registers the Sources, i.e. tells the metrics system where to
   * collect metrics data from:
   * 1. Get this instance's Properties from metricsConfig; for the driver the default,
   *    resolved while the MetricsSystem is created, is
   *    {sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet,
   *     sink.servlet.path=/metrics/json}.
   * 2. Use a regex to match the properties starting with "source.", instantiate each
   *    configured Source class by reflection, and add it to ArrayBuffer[Source].
   * 3. Each source's metricRegistry (itself a MetricSet) is registered into the
   *    ConcurrentMap[String, Metric] of metrics.
   */
  private def registerSources() {
    // The configuration of this instance, e.g. the driver's (or an executor's).
    val instConfig = metricsConfig.getInstance(instance)
    // The source configuration of this instance.
    val sourceConfigs = metricsConfig.subProperties(instConfig, MetricsSystem.SOURCE_REGEX)

    // Register all the sources related to instance
    sourceConfigs.foreach { kv =>
      val classPath = kv._2.getProperty("class")
      try {
        val source = Utils.classForName(classPath).newInstance()
        registerSource(source.asInstanceOf[Source])
      } catch {
        case e: Exception => logError("Source class " + classPath + " cannot be instantiated", e)
      }
    }
  }

  /**
   * registerSinks registers the Sinks, i.e. tells the MetricsSystem where to output
   * metrics data:
   * 1. Use a regex to match the properties starting with "sink." in the instance's
   *    Properties; e.g. {sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet,
   *    sink.servlet.path=/metrics/json} becomes
   *    Map(servlet -> {class=org.apache.spark.metrics.sink.MetricsServlet, path=/metrics/json}).
   * 2. Instantiate the class named by the "class" sub-property via reflection; if the
   *    property key is "servlet", store the instance as metricsServlet, otherwise add
   *    it to ArrayBuffer[Sink].
   */
  private def registerSinks() {
    val instConfig = metricsConfig.getInstance(instance)
    val sinkConfigs = metricsConfig.subProperties(instConfig, MetricsSystem.SINK_REGEX)

    sinkConfigs.foreach { kv =>
      val classPath = kv._2.getProperty("class")
      if (null != classPath) {
        try {
          val sink = Utils.classForName(classPath)
            .getConstructor(classOf[Properties], classOf[MetricRegistry], classOf[SecurityManager])
            .newInstance(kv._2, registry, securityMgr)
          if (kv._1 == "servlet") {
            metricsServlet = Some(sink.asInstanceOf[MetricsServlet])
          } else {
            sinks += sink.asInstanceOf[Sink]
          }
        } catch {
          case e: Exception =>
            logError("Sink class " + classPath + " cannot be instantiated")
            throw e
        }
      }
    }
  }
}

private[spark] object MetricsSystem {
  val SINK_REGEX = "^sink\\.(.+)\\.(.+)".r
  val SOURCE_REGEX = "^source\\.(.+)\\.(.+)".r

  private[this] val MINIMAL_POLL_UNIT = TimeUnit.SECONDS
  private[this] val MINIMAL_POLL_PERIOD = 1

  def checkMinimalPollingPeriod(pollUnit: TimeUnit, pollPeriod: Int) {
    val period = MINIMAL_POLL_UNIT.convert(pollPeriod, pollUnit)
    if (period < MINIMAL_POLL_PERIOD) {
      throw new IllegalArgumentException("Polling period " + pollPeriod + " " + pollUnit +
        " below than minimal polling period ")
    }
  }

  def createMetricsSystem(
      instance: String, conf: SparkConf, securityMgr: SecurityManager): MetricsSystem = {
    new MetricsSystem(instance, conf, securityMgr)
  }
}
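As a final note on buildRegistryName, here is a tiny illustration (the IDs are hypothetical values of my own) of the names it produces, using the same MetricRegistry.name helper:

import com.codahale.metrics.MetricRegistry

object NameDemo {
  def main(args: Array[String]): Unit = {
    // With spark.app.id = "app-20240529" and spark.executor.id = "1",
    // a source named "jvm" is registered as:
    println(MetricRegistry.name("app-20240529", "1", "jvm")) // app-20240529.1.jvm
    // When either ID is unavailable it falls back to the bare source name:
    println(MetricRegistry.name("jvm")) // jvm
  }
}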