Spark Configuration
Source: the official Spark Configuration guide
Spark Properties
Spark properties can be set in three places, listed here from highest to lowest precedence:
- directly on a SparkConf object in application code
- flags passed to bin/spark-submit or spark-shell
- entries in the conf/spark-defaults.conf file
The final configuration is a merge of all three, as the sketch below illustrates.
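A minimal sketch of the precedence rule (the property names are real Spark settings; the values and launch flags are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// Suppose the app is launched with: bin/spark-submit --conf spark.executor.memory=4g ...
// and spark-defaults.conf also sets spark.executor.memory.
// The value set directly on SparkConf takes precedence over both.
val conf = new SparkConf()
  .setMaster("local[2]")   // illustrative; normally supplied via --master
  .setAppName("PrecedenceDemo")
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)
println(sc.getConf.get("spark.executor.memory"))  // prints "2g"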
Spark properties configure settings per application. For example, to run in local mode with 2 threads:
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("CountingSheep")
val sc = new SparkContext(conf)
Time and byte parameters must include a unit, for example (a usage sketch follows the two lists):
- 25ms (milliseconds)
- 5s (seconds)
- 10m or 10min (minutes)
- 3h (hours)
- 5d (days)
- 1y (years)
- 1b (bytes)
- 1k or 1kb (kibibytes = 1024 bytes)
- 1m or 1mb (mebibytes = 1024 kibibytes)
- 1g or 1gb (gibibytes = 1024 mebibytes)
- 1t or 1tb (tebibytes = 1024 gibibytes)
- 1p or 1pb (pebibytes = 1024 tebibytes)
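A minimal sketch of setting such properties on a SparkConf (the property names are real; the values are illustrative only):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "4g")    // byte size, unit required
  .set("spark.network.timeout", "120s")  // duration, unit required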
Dynamically Loading Spark Properties
Alternatively, you can create an empty conf:
val sc = new SparkContext(new SparkConf())
and specify the parameters at runtime:
./bin/spark-submit \
  --name "My app" \
  --master local[4] \
  --conf spark.eventLog.enabled=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  myApp.jar
bin/spark-submit also reads configuration directly from the conf/spark-defaults.conf file, in which each line is a key and a value separated by whitespace:
spark.master            spark://5.6.7.8:7077
spark.executor.memory   4g
spark.eventLog.enabled  true
spark.serializer        org.apache.spark.serializer.KryoSerializer
Viewing Spark Properties
Check the “Environment” tab of the web UI at http://<driver>:4040 to verify that the properties actually in effect match what you intended to submit.
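The effective configuration can also be inspected programmatically; a minimal sketch, assuming an existing SparkContext named sc:

// Print every explicitly-set property visible to the driver.
sc.getConf.getAll.sortBy(_._1).foreach { case (key, value) =>
  println(s"$key = $value")
}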
Available Properties
Most of the properties that control internal settings have reasonable default values. Some of the most common options to set are:
Application Properties
- spark.driver.maxResultSize: limits the total serialized size of results that a Spark action (e.g. collect) may return to the driver; jobs exceeding the limit are aborted (see the sketch after this list).
- spark.memory.fraction: 0.6. Fraction of (heap space - 300MB) used for execution and storage. The lower this is, the more frequently spills and cached data eviction occur. The purpose of this config is to set aside memory for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records. Leaving this at the default value is recommended. For more detail, including important information about correctly tuning JVM garbage collection when increasing this value, see the memory management overview in Spark's tuning guide.
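A minimal sketch of setting both application properties on a SparkConf (the values are illustrative, not recommendations):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("AppPropsDemo")
  // Abort any job whose collected results would exceed 2g on the driver.
  .set("spark.driver.maxResultSize", "2g")
  // 0.6 is the recommended default; change only after consulting the tuning guide.
  .set("spark.memory.fraction", "0.6")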
Inheriting Hadoop Cluster Configuration
If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that should be included on Spark’s classpath:
- hdfs-site.xml, which provides default behaviors for the HDFS client.
- core-site.xml, which sets the default filesystem name.
The location of these configuration files varies across CDH and HDP versions, but a common location is inside of /etc/hadoop/conf. Some tools, such as Cloudera Manager, create configurations on-the-fly, but offer a mechanism to download copies of them.
To make these files visible to Spark, set HADOOP_CONF_DIR in $SPARK_HOME/spark-env.sh to a location containing the configuration files.
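For example, a typical spark-env.sh entry (the path is the common default mentioned above; adjust it for your distribution):

export HADOOP_CONF_DIR=/etc/hadoop/conf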