Submit a Spark job to YARN from Java Code

来源:互联网 发布:空中杀手 知乎 编辑:程序博客网 时间:2024/06/06 02:13



Submit a Spark job to YARN from Java Code

In this post I will show you how to submit a Spark job from Java code. Typically, we submit Spark jobs to "Spark Cluster" and Hadoop/YARN by using $SPARK_HOME/bin/spark-submit shell script. Submitting Spark job from a shell script limits programmers when they want to submit Spark jobs from Java code (such as Java servlets or other Java code such as REST servers).

Use YARN's Client Class

Below is a complete Java code, which submits a Spark job to YARN from Java code (no shell scripting is required). The main class used for submitting a Spark job to YARN is the org.apache.spark.deploy.yarn.Client` class.

Complete Java Client Program

$ cat SubmitSparkJobToYARNFromJavaCode.java// import required classes and interfacesimport org.apache.spark.deploy.yarn.Client;import org.apache.spark.deploy.yarn.ClientArguments;import org.apache.hadoop.conf.Configuration;import org.apache.spark.SparkConf;public class SubmitSparkJobToYARNFromJavaCode {   public static void main(String[] arguments) throws Exception {       // prepare arguments to be passed to        // org.apache.spark.deploy.yarn.Client object       String[] args = new String[] {           // the name of your application           "--name",           "myname",           // memory for driver (optional)           "--driver-memory",           "1000M",           // path to your application's JAR file            // required in yarn-cluster mode                 "--jar",           "/Users/mparsian/zmp/github/data-algorithms-book/dist/data_algorithms_book.jar",           // name of your application's main class (required)           "--class",           "org.dataalgorithms.bonus.friendrecommendation.spark.SparkFriendRecommendation",           // comma separated list of local jars that want            // SparkContext.addJar to work with                 "--addJars",           "/Users/mparsian/zmp/github/data-algorithms-book/lib/spark-assembly-1.5.2-hadoop2.6.0.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/log4j-1.2.17.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/junit-4.12-beta-2.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/jsch-0.1.42.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/JeraAntTasks.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/jedis-2.5.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/jblas-1.2.3.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/hamcrest-all-1.3.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/guava-18.0.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-math3-3.0.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-math-2.2.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-logging-1.1.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-lang3-3.4.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-lang-2.6.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-io-2.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-httpclient-3.0.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-daemon-1.0.5.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-configuration-1.6.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-collections-3.2.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-cli-1.2.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/cloud9-1.3.2.jar",           // argument 1 to your Spark program (SparkFriendRecommendation)           "--arg",           "3",           // argument 2 to your Spark program (SparkFriendRecommendation)           "--arg",           "/friends/input",           // argument 3 to your Spark program (SparkFriendRecommendation)           "--arg",           "/friends/output",           // argument 4 to your Spark program (SparkFriendRecommendation)           // this is a helper argument to create a proper JavaSparkContext object           // make sure that you create the following in SparkFriendRecommendation program           // ctx = new JavaSparkContext("yarn-cluster", "SparkFriendRecommendation");           "--arg",           "yarn-cluster"       };       // create a Hadoop Configuration object       Configuration config = new Configuration();       // identify that you will be using Spark as YARN mode       System.setProperty("SPARK_YARN_MODE", "true");       // create an instance of SparkConf object       SparkConf sparkConf = new SparkConf();       // create ClientArguments, which will be passed to Client       ClientArguments cArgs = new ClientArguments(args, sparkConf);        // create an instance of yarn Client client       Client client = new Client(cArgs, config, sparkConf);        // submit Spark job to YARN       client.run();    }}

Running Complete Java Client Program

Now, we can submit Spark job from Java code:

$ javac  SubmitSparkJobToYARNFromJavaCode.java$ java  SubmitSparkJobToYARNFromJavaCode

Note: before running your Java code, make sure that the output directory does not exist:

hadoop fs -rmr /friends/output

0 0
原创粉丝点击