Spark


Spark Configuration official documentation

Spark Configuration Chinese documentation


Spark provides three ways to configure the system:


  1. Spark properties: control most application parameters and can be set through a SparkConf object or through Java system properties.
  2. Environment variables: set per-machine settings, such as the IP address and ports, through the conf/spark-env.sh script on each node.
  3. Logging: configured through log4j.properties (see the sketch after this list).
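As a minimal sketch of the third mechanism: Spark ships a conf/log4j.properties.template that can be copied to conf/log4j.properties and edited. The lines below lower the console log level so driver output is less noisy; the WARN level is an illustrative choice, not the template default (which is INFO).

   cat log4j.properties
   # Lower console verbosity (WARN instead of the template's INFO)
   log4j.rootCategory=WARN, console
   log4j.appender.console=org.apache.log4j.ConsoleAppender
   log4j.appender.console.target=System.err
   log4j.appender.console.layout=org.apache.log4j.PatternLayout
   log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n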

1. Spark Properties

These properties can be set directly on a SparkConf passed to your SparkContext. SparkConf allows you to configure some of the common properties (e.g. master URL and application name), as well as arbitrary key-value pairs through the set() method. For example, we could initialize an application with two threads as follows:
Note that we run with local[2], meaning two threads, which represents “minimal” parallelism and can help detect bugs that only exist when we run in a distributed context.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("CountingSheep")
val sc = new SparkContext(conf)
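SparkConf also takes arbitrary key-value pairs through set(), as mentioned above; for example (the property values here are illustrative, not recommendations):

conf.set("spark.executor.memory", "4g")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")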

bin/spark-submit will also read configuration options from conf/spark-defaults.conf, in which each line consists of a key and a value separated by whitespace. For example:

spark.master            spark://5.6.7.8:7077
spark.executor.memory   4g
spark.eventLog.enabled  true
spark.serializer        org.apache.spark.serializer.KryoSerializer

Precedence (highest first), as the example below illustrates:

SparkConf > CLI > spark-defaults.conf

That is, properties set directly on a SparkConf in the application take the highest precedence, then flags passed to spark-submit on the command line, then values read from spark-defaults.conf.
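For instance, assuming the spark-defaults.conf shown above, a --conf flag on the spark-submit command line overrides the value from the file, while anything hard-coded on the SparkConf overrides both (the class and jar names here are hypothetical):

bin/spark-submit \
  --class com.example.CountingSheep \
  --conf spark.executor.memory=2g \
  counting-sheep.jar

Here the executors get 2g, overriding the 4g from spark-defaults.conf, unless the application itself calls conf.set("spark.executor.memory", ...).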

Example configuration files from a standalone cluster (Spark 1.6.3 with Hadoop 2.6):

   cat spark-env.sh
   JAVA_HOME=/data/jdk1.8.0_111
   SCALA_HOME=/data/scala-2.11.8
   SPARK_MASTER_IP=192.168.1.10
   HADOOP_CONF_DIR=/data/hadoop-2.6.5/etc/hadoop
   SPARK_LOCAL_DIRS=/data/spark-1.6.3-bin-hadoop2.6/spark_data
   SPARK_WORKER_DIR=/data/spark-1.6.3-bin-hadoop2.6/spark_data/spark_works

   cat slaves
   master
   slave1
   slave2
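In a standalone deployment, sbin/start-all.sh launches a master on the local machine and one worker on each host listed in conf/slaves (assuming passwordless SSH to those hosts):

   sbin/start-all.sh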

   cat spark-defaults.conf
   spark.master     spark://master:7077
   spark.serializer org.apache.spark.serializer.KryoSerializer
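Since this spark-defaults.conf switches the serializer to Kryo, an application can also register the classes it serializes, which lets Kryo avoid writing full class names. A minimal sketch; MyRecord is a hypothetical placeholder for an application class:

import org.apache.spark.SparkConf

// Hypothetical class whose instances get shuffled or cached.
case class MyRecord(id: Long, name: String)

val conf = new SparkConf()
  .setAppName("KryoExample") // illustrative name
  .registerKryoClasses(Array(classOf[MyRecord]))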