spark-submit notes


On the executor-memory / num-executors / executor-cores settings

My submit scripts
  • Submitting a Spark task on YARN

[xxx@xxx xxx]$ cat tmpsh.sh
/xxx/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
  --class xxx.ChildrenLockUserSize \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 19g \
  --executor-memory 17g \
  --num-executors 55 \
  --executor-cores 4 \
  --conf spark.driver.maxResultSize=10g \
  /home/xxx-1.0-SNAPSHOT.jar
  • Submitting in local mode
[xxx@xxx]$ cat tmplocalsh.sh
/xxx/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
  --class xxx.perGeneBank.TvTagClassify \
  --conf spark.driver.maxResultSize=25g \
  /xxx/xx-1.0-SNAPSHOT.jar
  • From the official documentation
    http://spark.apache.org/docs/latest/submitting-applications.html

    Both sbt and Maven have assembly plugins.
    Some of the commonly used options are:

--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
--master: The master URL for the cluster
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap "key=value" in quotes (as shown).
application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: Arguments passed to the main method of your main class, if any
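
For reference, the same docs page presents a general invocation template that these options plug into (paraphrased here):

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]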

Running spark-submit with no --master and spark-submit --master local has the same effect: the application runs locally.
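
As a quick check, both invocations below run the driver locally (a sketch; the SparkPi class ships with the distribution, but the examples jar path depends on the Spark build):

# no --master given: defaults to running locally
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.0.2.jar 100

# explicit local master: same local behavior
./bin/spark-submit --master local --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.0.2.jar 100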

Submit script from the official documentation:

Run on a YARN cluster:

export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \  # can be client for client mode
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

Total physical cores = number of physical CPUs × cores per physical CPU
Total logical CPUs = number of physical CPUs × cores per physical CPU × hyper-threads per core

Count physical CPUs:
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l

Count cores per physical CPU:
cat /proc/cpuinfo | grep "cpu cores" | uniq

Count logical CPUs:
cat /proc/cpuinfo | grep "processor" | wc -l
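
The three commands can be strung together to check the formula above (a small sketch; it assumes a /proc/cpuinfo layout like the one on these nodes, and will misbehave on VMs that hide physical-id information):

#!/usr/bin/env bash
# Count physical packages, cores per package, and logical CPUs,
# then derive hyper-threads per core from: logical = physical * cores * threads
physical=$(grep "physical id" /proc/cpuinfo | sort -u | wc -l)
cores=$(grep "cpu cores" /proc/cpuinfo | sort -u | awk '{print $NF}')
logical=$(grep -c "processor" /proc/cpuinfo)
threads=$((logical / (physical * cores)))
echo "physical CPUs:        $physical"
echo "cores per CPU:        $cores"
echo "logical CPUs:         $logical"
echo "hyper-threads / core: $threads"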

  • Verification

    VCores: the number of logical CPUs on a single node

cat /proc/cpuinfo | grep "processor" | wc -l
32

    Memory on a single node

free -g
              total        used        free      shared  buff/cache   available
Mem:            188         106          10           0          72          80
Swap:             3           1           2

VCores Total: 196 = 28 × 7 (presumably 28 vcores allocated to YARN on each of the 7 nodes, out of the 32 logical CPUs per node)

--executor-cores 4
A common guideline is to keep executor-cores at no more than 5; with it set to 4, each executor is allocated four virtual cores (logical CPUs).

Number of executor containers per node: 28 / 4 = 7

Memory per executor: after subtracting the memory used by the OS and other node services, roughly 175 GB remains per node, so
175 / 7 = 25 GB per executor container

YARN also reserves an off-heap overhead for each executor:

spark.yarn.executor.memoryOverhead = max(executorMemory * 0.1, 384m)

Since this overhead sits on top of --executor-memory, the heap request should be about 25 / 1.1 ≈ 22 GB.

So with 7 executors on each of the 7 nodes, num-executors = 7 * 7 = 49.
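
The whole sizing argument can be reproduced with a few lines of shell arithmetic (a sketch; the per-node figures of 28 vcores and ~175 GB usable memory, and the 10% overhead factor, are the assumptions used above):

#!/usr/bin/env bash
# Back-of-the-envelope executor sizing for this 7-node cluster (sketch).
nodes=7
vcores_per_node=28          # vcores YARN exposes per node
usable_mem_gb=175           # per-node memory left after OS/services
executor_cores=4            # --executor-cores

executors_per_node=$((vcores_per_node / executor_cores))    # 7
container_mem_gb=$((usable_mem_gb / executors_per_node))    # 25
# spark.yarn.executor.memoryOverhead = max(0.1 * executorMemory, 384m),
# so request roughly container size / 1.1 as --executor-memory
executor_mem_gb=$(awk -v m="$container_mem_gb" 'BEGIN {printf "%d", m / 1.1}')  # 22
num_executors=$((nodes * executors_per_node))               # 49

echo "--executor-cores  $executor_cores"
echo "--executor-memory ${executor_mem_gb}g"
echo "--num-executors   $num_executors"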

VCores Used: 49 (as shown in the task view). This does not match the expected value; still to be verified.
