Submitting Jobs to a Cluster with spark-submit: An Example
Source: Internet · Editor: 程序博客网 · Date: 2024/05/14 12:19
1. Choosing Parameters
Once the code is written and packaged into a jar, it can be submitted to the cluster with bin/spark-submit, using a command of the following form:
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
In most cases, the following parameters are all you need:
--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
--master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes.
application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside your cluster, for instance an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: Arguments passed to the main method of your main class, if any.

Here are a few simple spark-submit examples for different cluster managers:
# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise
# make sure that the driver is automatically restarted if it fails with non-zero exit code
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \  # can also be `yarn-client` for client mode
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000
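The --conf quoting rule mentioned above can be sketched as follows. This is a minimal illustration, not a command from this article: the JVM GC flags passed via spark.executor.extraJavaOptions are an assumed example value whose only purpose is to contain a space.

```shell
# A --conf value containing spaces must be quoted as a whole "key=value" pair
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[4] \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  /path/to/examples.jar \
  100
```

Without the quotes, the shell would split the value at the space and spark-submit would see the second GC flag as a separate argument.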
2. Submission Steps
The code implements a simple line count:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleSample {
    public static void main(String[] args) {
        String logFile = "/home/bigdata/spark-1.5.1/README.md";
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        // Count lines containing "a"
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();

        // Count lines containing "b"
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
        sc.stop();
    }
}
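The Spark job above needs a cluster (or local master) to run, but the filter-and-count logic itself can be checked in plain Java. The sketch below mirrors the two predicates with the Stream API; the sample lines are made-up stand-ins for README.md, used for illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class SimpleSampleLocal {
    public static void main(String[] args) {
        // Stand-in for the lines of README.md; contents are illustrative only
        List<String> logData = Arrays.asList(
                "Apache Spark",
                "a fast and general engine",
                "for big data processing");

        // Same predicates as the Spark job: count lines containing "a" / "b"
        long numAs = logData.stream().filter(s -> s.contains("a")).count();
        long numBs = logData.stream().filter(s -> s.contains("b")).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
        // prints "Lines with a: 3, lines with b: 1"
    }
}
```

The RDD filter/count pair maps one-to-one onto the stream operations; the difference is only that Spark evaluates them distributed and lazily across partitions.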
Package the code into a jar, then submit it with the following command:
./bin/spark-submit \
  --class cs.spark.SimpleSample \
  --master spark://spark1:7077 \
  /home/jar/spark-test-0.0.1-SNAPSHOT.jar