Spark learning notes: building a standalone application and submitting it to run


Preface

This applies to Spark 1.6.3, matching the environment set up with Ambari.

The tutorial project files are available for download; this article uses the sample project under mini-complete-example/mini-complete-example.

First, start the Spark service:
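On an Ambari-managed cluster, Spark is normally started from the Ambari web UI (select the Spark service, then Service Actions, then Start). As a sketch for a plain standalone installation instead, assuming SPARK_HOME points at your Spark 1.6.3 directory:

$SPARK_HOME/sbin/start-all.sh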

Building and running a Java application with Maven

The Java code:

package com.oreilly.learningsparkexamples.mini.java;

import java.util.Arrays;

import scala.Tuple2;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

public class WordCount {
  public static void main(String[] args) throws Exception {
    // Read the input and output locations from the command line.
    String inputFile = args[0];
    String outputFile = args[1];

    // Create a Java Spark Context.
    SparkConf conf = new SparkConf().setAppName("wordCount");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Load our input data.
    JavaRDD<String> input = sc.textFile(inputFile);

    // Split each line up into words.
    JavaRDD<String> words = input.flatMap(
      new FlatMapFunction<String, String>() {
        public Iterable<String> call(String x) {
          return Arrays.asList(x.split(" "));
        }
      });

    // Map each word to a (word, 1) pair, then sum the counts per word.
    JavaPairRDD<String, Integer> counts = words.mapToPair(
      new PairFunction<String, String, Integer>() {
        public Tuple2<String, Integer> call(String x) {
          return new Tuple2<String, Integer>(x, 1);
        }
      }).reduceByKey(new Function2<Integer, Integer, Integer>() {
        public Integer call(Integer x, Integer y) { return x + y; }
      });

    // Save the word counts back out to a text file, triggering evaluation.
    counts.saveAsTextFile(outputFile);
  }
}
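One version-specific detail worth noting: in the Spark 1.x Java API, FlatMapFunction.call returns an Iterable<String>, as above; Spark 2.x changed the return type to an Iterator, so this code compiles against the 1.6.3 dependency declared below but not against 2.x.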

The pom.xml file (artifact coordinates can be looked up on Maven Central; pay particular attention to the JDK and Spark versions in use):

<project>
  <groupId>com.oreilly.learningsparkexamples.mini</groupId>
  <artifactId>learning-spark-mini-example</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>example</name>
  <packaging>jar</packaging>
  <version>0.0.1</version>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.3</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
  <properties>
    <java.version>1.8</java.version>
  </properties>
  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.1</version>
          <configuration>
            <source>${java.version}</source>
            <target>${java.version}</target>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
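Note the provided scope on spark-core: spark-submit puts the cluster's own Spark jars on the classpath at runtime, so they are not bundled into the application jar and the build stays small.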

Run the build commands in the directory containing pom.xml:

mvn clean && mvn compile && mvn package
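A successful build writes the application jar, named from the artifactId and version in pom.xml, into the target directory; this is the jar passed to spark-submit below:

ls ./target/learning-spark-mini-example-0.0.1.jar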

Once the build succeeds, two things need attention when typing the run command: Spark's default filesystem is HDFS, so paths on the local filesystem must be prefixed with "file://"; and the job's output path (/home/daya/learning-spark-master/mini-complete-example/wordcounts) must not already exist, so delete it first if it does:

spark-submit --class com.oreilly.learningsparkexamples.mini.java.WordCount ./target/learning-spark-mini-example-0.0.1.jar file:///home/daya/learning-spark-master/mini-complete-example/README.md file:///home/daya/learning-spark-master/mini-complete-example/wordcounts
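If the wordcounts directory is left over from an earlier run, remove it before resubmitting, for example:

rm -rf /home/daya/learning-spark-master/mini-complete-example/wordcounts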

Output file contents:
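saveAsTextFile writes one part-NNNNN file per partition into the wordcounts directory, plus an empty _SUCCESS marker. Each line is the string form of a (word, count) tuple; an illustrative sample (actual words and counts depend on README.md):

cat /home/daya/learning-spark-master/mini-complete-example/wordcounts/part-00000
(package,1)
(Spark,3)
(example,2)
...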

Building and running a Scala application with sbt

To be added.
