Spark 2.1 on YARN -- container shell analysis


I set the following in spark-defaults.conf:

spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.master                     yarn
spark.executor.instances         2
spark.executor.cores             1
spark.executor.memory            512m
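As a cross-check, the same settings can be passed with the standard spark-shell/spark-submit flags; values given on the command line take precedence over spark-defaults.conf:

spark-shell \
  --master yarn \
  --num-executors 2 \
  --executor-cores 1 \
  --executor-memory 512m \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer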

When spark-shell is executed, it creates two executors.

$ jps
32412 CoarseGrainedExecutorBackend
32444 CoarseGrainedExecutorBackend
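To map these JVMs back to YARN containers, the yarn CLI can list the application's containers; a sketch, assuming a single application attempt whose id follows the usual _000001 suffix (the first command reports the actual attempt id):

yarn applicationattempt -list application_1495532285542_0005
yarn container -list appattempt_1495532285542_0005_000001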

Look at the command line of one of the executors:

$ ps aux | grep 32412
houzhiz+   374  0.0  0.0 112668   976 pts/1    R+   14:08   0:00 grep --color=auto 32412
houzhiz+ 32412 15.1  4.3 2371448 342156 ?      Sl   14:03   0:46 /usr/local/java/bin/java -server -Xmx512m -Djava.io.tmpdir=/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002/tmp -Dspark.driver.port=35736 -Dspark.yarn.app.container.log.dir=/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.122.1:35736 --executor-id 1 --hostname localhost --cores 1 --app-id application_1495532285542_0005 --user-class-path file:/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002/__app__.jar
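The flattened command line is hard to read. On Linux, /proc gives it one argument per line (PID 32412 taken from the jps output above):

tr '\0' '\n' < /proc/32412/cmdline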

Look at the container's working directory:

$ cd /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002
[houzhizhen@localhost container_1495532285542_0005_01_000002]$ ll
total 20
-rw-rw-r--. 1 houzhizhen houzhizhen   86 May 24 14:03 container_tokens
-rwx------. 1 houzhizhen houzhizhen  703 May 24 14:03 default_container_executor_session.sh
-rwx------. 1 houzhizhen houzhizhen  757 May 24 14:03 default_container_executor.sh
-rwx------. 1 houzhizhen houzhizhen 3590 May 24 14:03 launch_container.sh
lrwxrwxrwx. 1 houzhizhen houzhizhen   89 May 24 14:03 __spark_conf__ -> /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/17/__spark_conf__.zip
lrwxrwxrwx. 1 houzhizhen houzhizhen  108 May 24 14:03 __spark_libs__ -> /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/16/__spark_libs__7172508084572895679.zip
drwx--x---. 2 houzhizhen houzhizhen    6 May 24 14:03 tmp
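Note that both symlinks point into usercache/houzhizhen/filecache, not into the container's own directory. These are YARN-localized resources with PRIVATE visibility (see the properties below), so they are materialized once per user per node and linked into each container. A quick check of the link targets:

ls -ld /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/16 \
       /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/17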

Open the Spark configuration (below) and you can see spark.executor.id=driver. Since __spark_conf__ points to /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/17/__spark_conf__.zip, which lives outside the container directory, we can safely conclude that the same configuration file is shared across executors of a Spark application.
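One way to verify the sharing is to resolve the __spark_conf__ link in each executor container and confirm both resolve to the same filecache entry; a sketch, assuming the second executor received the next container id in sequence:

for c in container_1495532285542_0005_01_000002 \
         container_1495532285542_0005_01_000003; do
  readlink /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/$c/__spark_conf__
done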

$ cat __spark_conf__/__spark_conf__.properties
#Spark configuration.
#Wed May 24 14:03:27 CST 2017
spark.yarn.cache.visibilities=PRIVATE
spark.yarn.cache.timestamps=1495605805866
spark.executor.memory=512m
spark.executor.id=driver
spark.driver.host=192.168.122.1
spark.yarn.cache.confArchive=hdfs\://localhost\:8020/user/houzhizhen/.sparkStaging/application_1495532285542_0005/__spark_conf__.zip
spark.files.ignoreCorruptFiles=true
spark.yarn.cache.sizes=200756074
spark.jars=
spark.sql.catalogImplementation=hive
spark.home=/usr/local/spark
spark.submit.deployMode=client
spark.executor.heartbeatInterval=2
spark.master=yarn
spark.yarn.cache.filenames=hdfs\://localhost\:8020/user/houzhizhen/.sparkStaging/application_1495532285542_0005/__spark_libs__7172508084572895679.zip\#__spark_libs__
spark.executor.cores=1
spark.yarn.cache.types=ARCHIVE
spark.driver.appUIAddress=http\://192.168.122.1\:4040
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.repl.class.outputDir=/tmp/spark-caaf86f0-267d-4b39-9bfe-833d97db838e/repl-e03f92dd-176d-42b5-9ebd-a1e3d66c7e1c
spark.executor.instances=2
spark.app.name=Spark shell
spark.repl.class.uri=spark\://192.168.122.1\:35736/classes
spark.driver.port=35736
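This properties file travels inside the conf archive that the driver uploaded to HDFS (spark.yarn.cache.confArchive above) and that YARN localized into the filecache. To see what else the archive carries (assuming unzip is installed on the node):

unzip -l /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/17/__spark_conf__.zip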

Open launch_container.sh and you can see that $PWD/__spark_conf__:$PWD/__spark_libs__/* is included in the CLASSPATH. From the last command in the script, you can see that the executor id is overridden with --executor-id 1, which replaces the spark.executor.id=driver value baked into the shared configuration.
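To confirm that each backend received its own id, pull the --executor-id argument out of both processes (PIDs from the jps output above):

ps -o args= -p 32412,32444 | grep -o -e '--executor-id [0-9]*'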

launch_container.sh

$ cat launch_container.sh
#!/bin/bash

export SPARK_YARN_STAGING_DIR="hdfs://localhost:8020/user/houzhizhen/.sparkStaging/application_1495532285542_0005"
export HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop"
export JAVA_HOME="/usr/local/java"
export SPARK_LOG_URL_STDOUT="http://localhost:8042/node/containerlogs/container_1495532285542_0005_01_000002/houzhizhen/stdout?start=-4096"
export NM_HOST="localhost"
export SPARK_HOME="/usr/local/spark"
export HADOOP_HDFS_HOME="/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2"
export LOGNAME="houzhizhen"
export JVM_PID="$$"
export PWD="/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002"
export HADOOP_COMMON_HOME="/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2"
export LOCAL_DIRS="/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005"
export NM_HTTP_PORT="8042"
export LOG_DIRS="/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
export NM_PORT="33996"
export USER="houzhizhen"
export HADOOP_YARN_HOME="/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2"
export CLASSPATH="$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*"
export SPARK_YARN_MODE="true"
export HADOOP_TOKEN_FILE_LOCATION="/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002/container_tokens"
export SPARK_USER="houzhizhen"
export SPARK_LOG_URL_STDERR="http://localhost:8042/node/containerlogs/container_1495532285542_0005_01_000002/houzhizhen/stderr?start=-4096"
export HOME="/home/"
export CONTAINER_ID="container_1495532285542_0005_01_000002"
export MALLOC_ARENA_MAX="4"
ln -sf "/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/17/__spark_conf__.zip" "__spark_conf__"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
ln -sf "/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/16/__spark_libs__7172508084572895679.zip" "__spark_libs__"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
exec /bin/bash -c "$JAVA_HOME/bin/java -server -Xmx512m -Djava.io.tmpdir=$PWD/tmp '-Dspark.driver.port=35736' -Dspark.yarn.app.container.log.dir=/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.122.1:35736 --executor-id 1 --hostname localhost --cores 1 --app-id application_1495532285542_0005 --user-class-path file:$PWD/__app__.jar 1>/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002/stdout 2>/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002/stderr"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
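Note the check-and-exit idiom repeated after every step of the script. It boils down to the following pattern (run_step is a hypothetical helper for illustration; the generated script inlines the check each time):

run_step() {
  # run the given command, then exit the script with its code on failure
  "$@"
  hadoop_shell_errorcode=$?
  if [ $hadoop_shell_errorcode -ne 0 ]
  then
    exit $hadoop_shell_errorcode
  fi
}
run_step ln -sf "/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/17/__spark_conf__.zip" "__spark_conf__"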