how-to-configure-and-use-spark-history-server

来源:互联网 发布:淘宝网舞蹈鞋 编辑:程序博客网 时间:2024/05/18 01:19

how-to-configure-and-use-spark-history-server


参考

spark configuration

spark monitoring Viewing After the Fact


基础知识

  • how to configure spark ?
    locations to configure spark

    • spark properties using val sparkConf=new SparkConf().set… in spark application code

      • Dynamically Loading Spark Properties using spark-submit options
      ./bin/spark-submit --name "My app" --master local[4] --conf spark.shuffle.spill=false --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar
      • Viewing Spark Properties using spark app WebUI http://<driver>:4040 in the “Environment” tab
    • environment variable using conf/spark-env.sh
    • loging using conf/log4j.properites
  • Available Properties about history server

Property Name Default Meaning spark.eventLog.compress false Whether to compress logged events, if spark.eventLog.enabled is true. spark.eventLog.dir file:///tmp/spark-events Base directory in which Spark events are logged, if spark.eventLog.enabled is true. Within this base directory, Spark creates a sub-directory for each application, and logs the events specific to the application in this directory. Users may want to set this to a unified location like an HDFS directory so history files can be read by the history server. spark.eventLog.enabled false Whether to log Spark events, useful for reconstructing the Web UI after the application has finished.

* Environment Variables


enable spark eventlog

  • method 1:

    bin/spark-submit --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=file:/data01/data_tmp/spark-events ...
  • method 2:
    vi conf/spark-env.sh

    SPARK_DAEMON_JAVA_OPTS=SPARK_MASTER_OPTS=SPARK_WORKER_OPTS=SPARK_JAVA_OPTS="-Dspark.eventLog.enabled=true -Dspark.eventLog.dir=file:/data01/data_tmp/spark-events"

    NOTE:

    • a spark.eventLog.dir=file:/data01/data_tmp/spark-events

      • eventlog 是由 driver 写日志,对于 local/standalone/yarn-client 是没有大问题,但对于 spark-cluster 需要特别注意,最好设置为 hdfs 路径
      • spark.eventLog.dir 指定目录不会自动生成,需要手工创建,有相应权限
    • b these Environment variable will be used in bin/spark-class)

test spark app after enable eventlog

test case:

object WordCount extends App {  val sparkConf = new SparkConf().setAppName("WordCount")  val sc = new SparkContext(sparkConf)  val lines = sc.textFile("file:/data01/data/datadir_github/spark/README.md")  val words = lines.flatMap(_.split("\\s+"))  val wordsCount = words.map(word=>(word, 1)).reduceByKey(_ + _)  wordsCount.foreach(println)  sc.stop()}
  • 测试问题1:IDEA中 running spark in local 模式没有生成 eventlog ,继续测试自带 的 examples.SparkPi 一样

    • 处理方法1: 在 CLI 测试

      bin/run-example SparkPi

      结果报错:

      Exception in thread “main” java.lang.IllegalArgumentException: Log directory file:/data01/data_tmp/spark-events does not exist.

    • 处理方法2:

      mkdir -p /data01/data_tmp/spark-eventsbin/run-example SparkPi

      结果:测试确认 /data01/data_tmp/spark-events 下生成了eventlog

      继续在 IDEA中测试 running spark in local 发现依然没有生成 eventlog
      原因分析:在 IDEA 中测试,虽然在依赖添加了 SPARK_CONF_DIR 路径,但 IDEA中执行并不像 在 CLI 使用 bin/spark-submit 提交app 读取解析 conf/spark-env.sh 中的配置文件

    • 处理方法3:
      在 IDEA 的 run configuration 设置 vm options -Dspark.master="local[2]" -Dspark.eventLog.enabled=true -Dspark.eventLog.dir=file:/data01/data_tmp/spark-events
      结果: local 模式正常

  • 问题2:IDEA 测试 spark-on-yarn报错(暂没有解决)
    在 IDEA 的 run configuration 设置 vm options -Dspark.master="yarn-client" -Dspark.eventLog.enabled=true -Dspark.eventLog.dir=file:/data01/data_tmp/spark-events 报错


configure, start and use spark history server

  • configure
    vi conf/spark-env.sh

    SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:/data01/data_tmp/spark-events" #set when you use spark-history-server

    NOTE

    • spark.history.fs.logDirectory , spark.eventLog.dir 可以不同,意味着能够移动 eventlog 文件,便于协助诊断
    • spark.history.ui.port
  • start

    bin/start-history-server.sh

access spark history WebUI at http://<server-url>:18080

spark history server WebUI applications
spark history server WebUI applications

spark history server WebUI specific app
spark history server WebUI specific app

0 0
原创粉丝点击