A Brief Look at Spark's Job Submission Scripts (Using Spark 1.5.0 as an Example)





Main contents of this post:

  1. spark-shell
  2. spark-sql
  3. spark-submit
  4. spark-class
  5. Summary

1. spark-shell

The spark-shell script reads as follows:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#
# Shell script for starting the Spark Shell REPL

cygwin=false
case "`uname`" in
  CYGWIN*) cygwin=true;;
esac

# Enter posix mode for bash
set -o posix

export FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]"

# SPARK-4161: scala does not assume use of the java classpath,
# so we need to add the "-Dscala.usejavacp=true" flag manually. We
# do this specifically for the Spark shell because the scala REPL
# has its own class loader, and any additional classpath specified
# through spark.driver.extraClassPath is not automatically propagated.
SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"

function main() {
  if $cygwin; then
    # Workaround for issue involving JLine and Cygwin
    # (see http://sourceforge.net/p/jline/bugs/40/).
    # If you're using the Mintty terminal emulator in Cygwin, may need to set the
    # "Backspace sends ^H" setting in "Keys" section of the Mintty options
    # (see https://github.com/sbt/sbt/issues/562).
    stty -icanon min 1 -echo > /dev/null 2>&1
    export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Djline.terminal=unix"
    # Call the spark-submit script, passing in the class org.apache.spark.repl.Main,
    # the application name "Spark shell", and all arguments passed to this script
    "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
    stty icanon echo > /dev/null 2>&1
  else
    export SPARK_SUBMIT_OPTS
    "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
  fi
}

# Copy restore-TTY-on-exit functions from Scala script so spark-shell exits properly even in
# binary distribution of Spark where Scala is not installed
exit_status=127
saved_stty=""

# restore stty settings (echo in particular)
function restoreSttySettings() {
  stty $saved_stty
  saved_stty=""
}

function onExit() {
  if [[ "$saved_stty" != "" ]]; then
    restoreSttySettings
  fi
  exit $exit_status
}

# to reenable echo if we are interrupted before completing.
trap onExit INT

# save terminal settings
saved_stty=$(stty -g 2>/dev/null)
# clear on error so we don't later try to restore them
if [[ ! $? ]]; then
  saved_stty=""
fi

main "$@"

# record the exit status lest it be overwritten:
# then reenable echo and propagate the code.
exit_status=$?
onExit
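From the script above, spark-shell simply prepends the REPL class and the application name before delegating to spark-submit. A minimal sketch of the rewrite (the --master and --driver-memory values here are only illustrative):

# Hypothetical invocation; every option is forwarded unchanged via "$@"
./bin/spark-shell --master local[2] --driver-memory 1g

# which, on a non-Cygwin system, the main() function turns into:
./bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" \
  --master local[2] --driver-memory 1g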

2. spark-sql

The spark-sql script reads as follows:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

export FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
export _SPARK_CMD_USAGE="Usage: ./bin/spark-sql [options] [cli option]"

# Likewise, the job is submitted through the spark-submit script;
# the only difference is that the class passed in is SparkSQLCLIDriver
exec "$FWDIR"/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver "$@"
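As with spark-shell, the options are forwarded untouched; only the driver class differs. A sketch, assuming an illustrative local master and a simple -e query:

# Hypothetical invocation; options and CLI arguments are forwarded via "$@"
./bin/spark-sql --master local[2] -e "SHOW TABLES"

# which the exec line above rewrites into:
./bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver \
  --master local[2] -e "SHOW TABLES"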

3. spark-submit

The spark-submit script reads as follows:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"

# disable randomized hash for string in Python 3.3+
export PYTHONHASHSEED=0

# spark-submit ultimately calls the spark-class script,
# passing in the class org.apache.spark.deploy.SparkSubmit
# together with all the other arguments
exec "$SPARK_HOME"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
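So every spark-submit call is rewritten into a spark-class call with org.apache.spark.deploy.SparkSubmit prepended. A sketch with a hypothetical application jar and main class (com.example.MyApp and my-app.jar are not part of Spark):

# Hypothetical user application, shown only to illustrate the chain
./bin/spark-submit --class com.example.MyApp --master local[2] my-app.jar input.txt

# which the exec line above turns into:
./bin/spark-class org.apache.spark.deploy.SparkSubmit \
  --class com.example.MyApp --master local[2] my-app.jar input.txt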

4. spark-class

The spark-class script reads as follows:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Figure out where Spark is installed
# Locate the SPARK_HOME directory
export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"

# Load load-spark-env.sh, which sets up runtime environment information,
# e.g. spark-env.sh under the conf directory
. "$SPARK_HOME"/bin/load-spark-env.sh

# Find the java binary
# Locate the java binary via JAVA_HOME
if [ -n "${JAVA_HOME}" ]; then
  RUNNER="${JAVA_HOME}/bin/java"
else
  if [ `command -v java` ]; then
    RUNNER="java"
  else
    echo "JAVA_HOME is not set" >&2
    exit 1
  fi
fi

# Find assembly jar
# Locate the spark-assembly-1.5.0-hadoop2.4.0.jar file (using Spark 1.5.0 as an example);
# this means the assembly jar does not have to be bundled when submitting a job
SPARK_ASSEMBLY_JAR=
if [ -f "$SPARK_HOME/RELEASE" ]; then
  ASSEMBLY_DIR="$SPARK_HOME/lib"
else
  ASSEMBLY_DIR="$SPARK_HOME/assembly/target/scala-$SPARK_SCALA_VERSION"
fi

num_jars="$(ls -1 "$ASSEMBLY_DIR" | grep "^spark-assembly.*hadoop.*\.jar$" | wc -l)"
if [ "$num_jars" -eq "0" -a -z "$SPARK_ASSEMBLY_JAR" ]; then
  echo "Failed to find Spark assembly in $ASSEMBLY_DIR." 1>&2
  echo "You need to build Spark before running this program." 1>&2
  exit 1
fi
ASSEMBLY_JARS="$(ls -1 "$ASSEMBLY_DIR" | grep "^spark-assembly.*hadoop.*\.jar$" || true)"
if [ "$num_jars" -gt "1" ]; then
  echo "Found multiple Spark assembly jars in $ASSEMBLY_DIR:" 1>&2
  echo "$ASSEMBLY_JARS" 1>&2
  echo "Please remove all but one jar." 1>&2
  exit 1
fi

SPARK_ASSEMBLY_JAR="${ASSEMBLY_DIR}/${ASSEMBLY_JARS}"

LAUNCH_CLASSPATH="$SPARK_ASSEMBLY_JAR"

# Add the launcher build dir to the classpath if requested.
if [ -n "$SPARK_PREPEND_CLASSES" ]; then
  LAUNCH_CLASSPATH="$SPARK_HOME/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
fi

export _SPARK_ASSEMBLY="$SPARK_ASSEMBLY_JAR"

# The launcher library will print arguments separated by a NULL character, to allow arguments with
# characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
# an array that will be used to exec the final command.
# Run org.apache.spark.launcher.Main, the entry point for launching the Spark application
CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <("$RUNNER" -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@")

exec "${CMD[@]}"
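To see the final command that spark-class execs, the launcher honors the SPARK_PRINT_LAUNCH_COMMAND environment variable and prints the command before running it. The output below is only a rough sketch; the java path, classpath, and memory settings vary by installation:

export SPARK_PRINT_LAUNCH_COMMAND=1
./bin/spark-shell
# Roughly (illustrative paths):
# Spark Command: /path/to/java -cp /path/to/spark/conf:/path/to/spark/lib/spark-assembly-1.5.0-hadoop2.4.0.jar \
#   -Dscala.usejavacp=true -Xms1g -Xmx1g org.apache.spark.deploy.SparkSubmit \
#   --class org.apache.spark.repl.Main --name "Spark shell"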

5. Summary

From the source of the scripts above we can see that spark-shell and spark-sql are both implemented by calling the spark-submit script, spark-submit is in turn implemented through the spark-class script, and spark-class finally runs org.apache.spark.launcher.Main, the entry point of the launch process, which builds the java command that actually starts the Spark application.
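A conceptual sketch of the whole chain (the final java command is abbreviated; the assembly jar name corresponds to the Spark 1.5.0 build used here):

# spark-shell / spark-sql     (each adds its own --class and forwards "$@")
#        |
#        v
# spark-submit                (prepends org.apache.spark.deploy.SparkSubmit)
#        |
#        v
# spark-class                 (runs org.apache.spark.launcher.Main to build the java command)
#        |
#        v
# exec java -cp .../spark-assembly-1.5.0-hadoop2.4.0.jar ... org.apache.spark.deploy.SparkSubmit <forwarded arguments>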

