Running Spark programs on Linux

Source: Internet; Editor: 程序博客网; Time: 2024/06/06 06:53

Reference: the official Spark documentation

spark-submit

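The bin directory of the Spark installation contains a spark-submit script, which is used to submit and run Spark programs.

If Spark's bin directory is on your PATH, you can call spark-submit directly from any directory.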
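As a minimal sketch of the PATH setup, assuming Spark is unpacked under /opt/spark (a hypothetical location, substitute your own install directory), the following makes spark-submit callable from any directory and reports whether the shell can find it:

```shell
# Assumption: Spark is installed in /opt/spark; adjust to your own install path.
export SPARK_HOME="${SPARK_HOME:-/opt/spark}"
export PATH="$SPARK_HOME/bin:$PATH"

# Report whether the shell can now resolve spark-submit.
if command -v spark-submit >/dev/null 2>&1; then
  echo "spark-submit found at: $(command -v spark-submit)"
else
  echo "spark-submit not found; check that $SPARK_HOME/bin exists"
fi
```

To keep the setting across sessions, the two export lines can go into a shell startup file such as ~/.bashrc.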

Compiling and building a Spark program

Use sbt or Maven to build the program and produce a jar.
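The build step might look like the sketch below. It assumes a standard project layout with either a build.sbt or a pom.xml at the project root; build_command is a hypothetical helper that only prints the command to run, it does not invoke the build tool itself.

```shell
# Pick a build command based on which build file is present in the project directory.
# Assumption: a standard sbt or Maven project layout.
build_command() {
  project_dir="$1"
  if [ -f "$project_dir/build.sbt" ]; then
    echo "sbt package"   # sbt writes the jar under target/scala-<version>/
  elif [ -f "$project_dir/pom.xml" ]; then
    echo "mvn package"   # Maven writes the jar under target/
  else
    echo "no build.sbt or pom.xml in $project_dir" >&2
    return 1
  fi
}
```

Calling build_command /path/to/project prints the command; run that command from the project root to produce the jar that is later passed to spark-submit.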

Using spark-submit

spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

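--class: the class in the jar to run (the entry point of the application), for example test.spark.examples

--master: the master URL, for example spark://23.195.26.187:7077

--deploy-mode: where the driver runs, either on a worker node (cluster) or locally as an external client (client, the default)

--conf: runtime configuration properties in key=value form; wrap "key=value" in quotes if the value contains spaces

application-jar: path to the jar to run; it may be a URL starting with hdfs:// or file://. For example: /root/program/spark/test.jar

application-arguments: arguments passed to the main method of the main class; omit them if there are none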
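To see how these options fit together, here is a small dry-run sketch. print_submit_command is a hypothetical helper, not part of Spark: it only prints the assembled spark-submit command so it can be inspected before being run. The values in the usage line are the placeholder examples from the option list above.

```shell
# Print (not run) a spark-submit command assembled from its parts.
# Arguments: main class, master URL, deploy mode, application jar, then any application arguments.
print_submit_command() {
  main_class="$1"; master_url="$2"; deploy_mode="$3"; app_jar="$4"
  shift 4
  # Remaining arguments, if any, are forwarded to the application's main method.
  echo spark-submit \
    --class "$main_class" \
    --master "$master_url" \
    --deploy-mode "$deploy_mode" \
    "$app_jar" "$@"
}

# Usage with the placeholder values from the option list:
print_submit_command test.spark.examples spark://23.195.26.187:7077 client /root/program/spark/test.jar 100
```

Note the ordering: every option comes before the application jar, and everything after the jar is treated as an argument to the application rather than to spark-submit.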

Examples

# Run locally on 8 cores, passing the argument 100
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster (--deploy-mode can be client for client mode)
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000

A worked example:

Program:

Path:

/root/worspace/test-1.0.jar

Command:

spark-submit --class SparkSQLExample --master local /root/worspace/test-1.0.jar

Result:

Partial output:

17/10/09 17:58:20 INFO DAGScheduler: ResultStage 9 (show at SparkSQLExample.scala:104) finished in 0.027 s
17/10/09 17:58:20 INFO DAGScheduler: Job 7 finished: show at SparkSQLExample.scala:104, took 0.044894 s
+--------------------+----+-------+
|     _corrupt_record| age|   name|
+--------------------+----+-------+
|                null|null|Michael|
|                null|  30|   Andy|
|                null|  19| Justin|
|spark-submit --cl...|null|   null|
|                 100|null|   null|
+--------------------+----+-------+
