Submitting jobs to a cluster with spark-submit
Source: Internet · Editor: 程序博客网 · Date: 2024/05/16 05:47
1. Choosing parameters
Once the code is written and packaged into a jar, it can be submitted to the cluster with bin/spark-submit, as follows:
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
In most cases the parameters above are all you need:
- --class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
- --master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)
- --deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
- --conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes.
- application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside your cluster, for instance an hdfs:// path or a file:// path that is present on all nodes.
- application-arguments: Arguments passed to the main method of your main class, if any
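As a quick illustration of the last item: everything after the application jar lands unchanged in the main method's args array — SparkPi, for instance, reads its slice count from args[0]. A minimal plain-Java sketch (hypothetical class name, no Spark needed):

```java
public class ArgsDemo {
    // spark-submit passes application-arguments straight through to main,
    // so `spark-submit ... app.jar 1000` arrives here as args = {"1000"}.
    static int slices(String[] args) {
        return args.length > 0 ? Integer.parseInt(args[0]) : 2;
    }

    public static void main(String[] args) {
        System.out.println("slices=" + slices(args));
    }
}
```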
Here are a few simple examples of spark-submit invocations for different cluster managers:
# Run application locally on 8 cores
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[8] \
/path/to/examples.jar \
100
# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
# Run on a Spark standalone cluster in cluster deploy mode with supervise
# make sure that the driver is automatically restarted if it fails with non-zero exit code
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--deploy-mode cluster \
--supervise \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \ # can also be `yarn-client` for client mode
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000
# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
--master spark://207.184.161.138:7077 \
examples/src/main/python/pi.py \
1000
2. Submission steps in detail
The code implements a simple count:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleSample {
    public static void main(String[] args) {
        String logFile = "/home/bigdata/spark-1.5.1/README.md";
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Cache the file, since it is scanned twice below.
        JavaRDD<String> logData = sc.textFile(logFile).cache();
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("a");
            }
        }).count();
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("b");
            }
        }).count();
        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
        sc.stop();
    }
}
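Before going through the cluster, the filter logic itself can be sanity-checked with plain Java streams on an in-memory list — a local sketch with made-up sample lines, not the Spark API:

```java
import java.util.List;

public class LocalCountCheck {
    // Same contains-based predicates as the Spark job, applied to a List
    // instead of an RDD, so it runs without any cluster.
    static long countContaining(List<String> lines, String needle) {
        return lines.stream().filter(s -> s.contains(needle)).count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("apache spark", "big data", "cluster");
        System.out.println("Lines with a: " + countContaining(lines, "a")
                + ", lines with b: " + countContaining(lines, "b"));
        // prints "Lines with a: 2, lines with b: 1"
    }
}
```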
Package the code into a jar, then submit it:
./bin/spark-submit --class cs.spark.SimpleSample --master spark://spark1:7077 /home/jar/spark-test-0.0.1-SNAPSHOT.jar