Submit a Spark job to YARN from code
http://blog.sequenceiq.com/blog/2014/08/22/spark-submit-in-java/
In our previous Apache Spark related post we showed you how to write a simple machine learning job. In this post we'd like to show you how to submit a Spark job from code. At SequenceIQ we submit jobs to different clusters – based on load, customer profile, associated SLAs, etc. Doing this the usual way was cumbersome, so we needed a way to submit Spark jobs (and in general all of our jobs running in a YARN cluster) from code. Also, due to the different clusters and changing job configurations, we can't use hardcoded parameters – in a previous blog post we highlighted how we do all this.
Business as usual
Basically, as you know from the Spark documentation, you have to use the spark-submit script to submit a job. In a nutshell, SparkSubmit is called by the spark-class script with lots of decorated arguments. In our example we examine only the YARN part of the submission. As you can see in SparkSubmit.scala, the YARN Client is loaded and its main method is invoked (based on the arguments of the script).
It's a pretty straightforward way to submit a Spark job to a YARN cluster, though you will need to change the parameters which are passed as arguments manually.
Submitting the job from Java code
If you would like to submit a job to YARN from Java code, you can simply use this Client class directly in your application (but you have to make sure that every environment variable you will need is set properly).
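As a minimal sketch of what that looks like, the snippet below assembles the argument array the YARN Client's main method parses. The flag names (`--jar`, `--class`, `--arg`) follow the Spark 1.x ClientArguments parser, and the jar path and class name are placeholders – adjust both to your setup.

```java
import java.util.ArrayList;
import java.util.List;

public class ClientArgsBuilder {
    // Assemble the argument list the YARN Client's main method expects
    // (flag names follow the Spark 1.x ClientArguments parser).
    public static String[] buildArgs(String appJar, String mainClass, List<String> appArgs) {
        List<String> out = new ArrayList<>();
        out.add("--jar");   out.add(appJar);       // placeholder path to your app jar
        out.add("--class"); out.add(mainClass);    // placeholder main class
        for (String a : appArgs) {                 // each app argument gets its own --arg
            out.add("--arg");
            out.add(a);
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] clientArgs = buildArgs("/path/to/app.jar", "com.example.Job",
                java.util.Arrays.asList("hdfs:///input"));
        System.out.println(String.join(" ", clientArgs));
        // → --jar /path/to/app.jar --class com.example.Job --arg hdfs:///input
        // With Spark on the classpath this array could then be handed to the
        // YARN Client, e.g. org.apache.spark.deploy.yarn.Client.main(clientArgs);
    }
}
```

With the environment variables in place, handing this array to the Client is all a programmatic submission needs.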
Passing Configuration object
In the main method the org.apache.hadoop.conf.Configuration object is not passed to the Client class. A new Configuration is created explicitly in the constructor, which is actually okay (client configurations are then loaded from $HADOOP_CONF_DIR/core-site.xml and $HADOOP_CONF_DIR/yarn-site.xml). But what if you want to use (for example) an Ambari Configuration Service to retrieve your configuration, instead of using hardcoded files? Fortunately, a Configuration can be passed to the Client, but you have to write your own main method.
Code example
In our example we also use the two client XMLs as configuration (for demonstration purposes only); the main difference here is that we read the properties from the XMLs and fill them into the Configuration. Then we pass the Configuration object to the Client (which is invoked directly here).
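To illustrate the reading step, here is a self-contained sketch (plain JDK, no Hadoop dependency) that pulls the name/value pairs out of a Hadoop-style *-site.xml; the sample property names in main are just examples:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HadoopXmlReader {
    // Parse <property><name>..</name><value>..</value></property> entries
    // from a Hadoop-style *-site.xml into a plain map.
    public static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            props.put(p.getElementsByTagName("name").item(0).getTextContent().trim(),
                      p.getElementsByTagName("value").item(0).getTextContent().trim());
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<configuration>"
                + "<property><name>fs.defaultFS</name><value>hdfs://namenode:8020</value></property>"
                + "</configuration>";
        System.out.println(parse(sample));
    }
}
```

In the real application each parsed entry would be set on the org.apache.hadoop.conf.Configuration via conf.set(name, value) before the Configuration is handed to the Client.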
To build the project use this command from the spark-submit directory:
After building, you will find the required jars in spark-submit-runner/build/libs (with all required dependencies) and spark-submit-app/build/libs. Put them in the same directory (and do the same with the config folder). After that, run this command:
During the submission, note that not just the app jar but also the spark-submit-runner jar is uploaded to HDFS (this is the jar referenced by the SPARK_JAR environment variable). To avoid this, you have to upload it to HDFS manually and set the SPARK_JAR environment variable to its HDFS location.
If you get a "Permission denied" exception on submit, you should set the HADOOP_USER_NAME environment variable to root (or a user with proper rights).
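If you cannot change the process environment, one workaround is setting the value from inside the JVM; this sketch assumes a Hadoop 2.x client, where the login code also consults the HADOOP_USER_NAME JVM system property when the environment variable is unset:

```java
public class SubmitUser {
    public static void main(String[] args) {
        // Assumption: Hadoop 2.x falls back to this system property when the
        // HADOOP_USER_NAME environment variable is not set. It must be set
        // before any Hadoop class resolves the login user.
        System.setProperty("HADOOP_USER_NAME", "root");
        System.out.println(System.getProperty("HADOOP_USER_NAME")); // → root
    }
}
```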
As usual for us we ship the code – you can get it from our GitHub samples repository; the sample input is available here.
If you would like to play with Spark, you can use our Spark Docker container, available as a trusted build on the Docker.io repository.
For updates follow us on LinkedIn, Twitter or Facebook.