010 - Spark standalone mode: WordCount code in Java
Source: Internet  Editor: 程序博客网  Time: 2024/05/20 01:12
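This article follows the official Spark documentation: http://spark.apache.org/docs/latest/quick-start.html
To develop a Spark program in Java, add the following dependency to the project's pom.xml: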
<!-- Spark dependency -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.4.0</version>
</dependency>
1. The word-count code
package com.jieli;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

/**
 * Spark word count, Java version.
 *
 * Reference: the official Spark site, http://spark.apache.org/docs/latest/quick-start.html
 *
 * Command for running on the cluster:
 * spark-submit --class com.jieli.JavaWordCountOnYarn /home/hadoop/JavaWordCountOnYarn-0.0.1-SNAPSHOT.jar wc.txt wt46
 *
 * @author shenfl
 */
public class JavaWordCountOnYarn {

    public static void main(String[] args) {
        // No master is set here; it is supplied when the job is submitted with spark-submit.
        SparkConf conf = new SparkConf().setAppName("myApp");
        // Create a context
        JavaSparkContext sc = new JavaSparkContext(conf);

        // args[0]: input path, one record per line
        JavaRDD<String> textFile = sc.textFile(args[0]);

        /*
         * Return a new RDD by first applying a function to all elements of
         * this RDD, and then flattening the results. Each line is split on
         * tab characters, so the input must be tab-separated.
         */
        JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String s) throws Exception {
                return Arrays.asList(s.split("\t"));
            }
        });

        /*
         * Return a new pair RDD by mapping every word to a (word, 1) tuple.
         */
        JavaPairRDD<String, Integer> mapRDD = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) throws Exception {
                return new Tuple2<String, Integer>(s, 1);
            }
        });

        /*
         * Merge the values for each key using an associative reduce function.
         * This will also perform the merging locally on each mapper before
         * sending results to a reducer, similarly to a "combiner" in MapReduce.
         * The second argument (1) is the number of output partitions.
         */
        JavaPairRDD<String, Integer> shuffleRDD = mapRDD.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        }, 1);

        /*
         * Persist this RDD with the default storage level (`MEMORY_ONLY`).
         * Caching only pays off when the RDD is reused by more than one action.
         */
        shuffleRDD.cache();

        /*
         * Save this RDD as a text file, using string representations of
         * elements. args[1]: output directory, which must not exist yet.
         */
        shuffleRDD.saveAsTextFile(args[1]);

        // Release the resources held by the context.
        sc.stop();
    }
}
2. Build the jar with mvn package
3. Run
[hadoop@mycluster bin]$ spark-submit --class com.jieli.JavaWordCountOnYarn /home/hadoop/JavaWordCountOnYarn-0.0.1-SNAPSHOT.jar wc.txt wt46
4. Verify the result
[hadoop@mycluster ~]$ hdfs dfs -cat wt46/par*
(you,2)
(hello,4)
(china,1)
(me,1)
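To sanity-check counts like the ones above without a cluster, the same split, pair, and sum-per-key logic can be sketched in plain Java. This is only an illustration, not part of the original job: the class name LocalWordCount and the sample lines are hypothetical (the article does not show the contents of wc.txt), and the sample was chosen so that its counts agree with the output listed above. It needs Java 8 or later for Map.merge.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a plain-Java stand-in for flatMap -> mapToPair -> reduceByKey.
class LocalWordCount {

    // Counts tab-separated words across all lines.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // Same delimiter as the Spark job: a tab character.
            for (String word : line.split("\t")) {
                // Equivalent to emitting (word, 1) and summing the values per key.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample input; the real wc.txt is not shown in the article.
        List<String> lines = Arrays.asList(
                "hello\tyou",
                "hello\tme",
                "hello\tchina",
                "hello\tyou");
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            // Print in the same (word,count) shape that saveAsTextFile produces for tuples.
            System.out.println("(" + e.getKey() + "," + e.getValue() + ")");
        }
    }
}
```

The order of the printed pairs follows first appearance in the input and need not match the order in the Spark output, since Spark makes no ordering guarantee for reduceByKey results.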