Spark Operators (Part 6)
Point 1: FlatMapOperator
package com.spark.operator;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.VoidFunction;

public class FlatMapOperator {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("LineCount").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<String> lineList = Arrays.asList("hello xuruyun", "hello xuruyun", "hello wangfei");
        JavaRDD<String> lines = sc.parallelize(lineList);

        // flatMap = flat + map
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Iterable<String> call(String line) throws Exception {
                return Arrays.asList(line.split(" "));
            }
        });

        words.foreach(new VoidFunction<String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public void call(String result) throws Exception {
                System.out.println(result);
            }
        });

        sc.close();
    }
}
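Note that the FlatMapFunction contract differs between Spark versions: in Spark 1.x (as in the code above) call returns an Iterable, while in Spark 2.x it returns an Iterator. As a minimal sketch, assuming Spark 2.x and Java 8, the same word-splitting example can be written with lambdas; the class and app names below are illustrative, not from the original post.

package com.spark.operator;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical class name; same word-splitting logic as the example above.
public class FlatMapLambdaSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("FlatMapLambdaSketch").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.parallelize(
                Arrays.asList("hello xuruyun", "hello xuruyun", "hello wangfei"));

        // On Spark 2.x the FlatMapFunction returns an Iterator, hence the trailing .iterator()
        JavaRDD<String> words = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());

        words.foreach(word -> System.out.println(word));

        sc.close();
    }
}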
Point 2: FilterOperator
package com.spark.operator;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;

public class FilterOperator {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("LineCount").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // The filter operator keeps an element when the function returns true
        // and drops it when the function returns false
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> numberRDD = sc.parallelize(numbers);
        JavaRDD<Integer> results = numberRDD.filter(new Function<Integer, Boolean>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Boolean call(Integer number) throws Exception {
                return number % 2 == 0;
            }
        });

        results.foreach(new VoidFunction<Integer>() {
            private static final long serialVersionUID = 1L;

            @Override
            public void call(Integer result) throws Exception {
                System.out.println(result);
            }
        });

        sc.close();
    }
}
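For comparison, here is a minimal sketch of the same even-number filter written with Java 8 lambdas; Function and VoidFunction are single-method interfaces, so the anonymous classes can be replaced directly. The class and app names below are illustrative, not from the original post.

package com.spark.operator;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical class name; same even-number filter as above, written with lambdas.
public class FilterLambdaSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("FilterLambdaSketch").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<Integer> numberRDD = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        // Keep only the elements for which the predicate returns true
        JavaRDD<Integer> evens = numberRDD.filter(number -> number % 2 == 0);

        // Prints 2 and 4
        evens.foreach(number -> System.out.println(number));

        sc.close();
    }
}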
Point 3: DistinctOperator
package com.spark.operator;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

public class DistinctOperator {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SampleOperator").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<String> names = Arrays.asList("xuruyun", "liangyongqi", "wangfei", "xuruyun");
        JavaRDD<String> nameRDD = sc.parallelize(names, 2);

        // distinct removes duplicate elements from the RDD
        nameRDD.distinct().foreach(new VoidFunction<String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public void call(String name) throws Exception {
                System.out.println(name);
            }
        });

        sc.close();
    }
}
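Since distinct de-duplicates by shuffling the data, it also accepts a partition count. Below is a minimal sketch (the class and app names are illustrative) that passes the number of output partitions explicitly and collects the result to the driver; note that the order of a distinct result is not guaranteed.

package com.spark.operator;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical class name; de-duplicates the same name list as above.
public class DistinctCollectSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("DistinctCollectSketch").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> nameRDD = sc.parallelize(
                Arrays.asList("xuruyun", "liangyongqi", "wangfei", "xuruyun"), 2);

        // distinct(numPartitions) controls the number of partitions after the shuffle
        List<String> uniqueNames = nameRDD.distinct(2).collect();

        // Three names remain; order is not guaranteed
        uniqueNames.forEach(System.out::println);

        sc.close();
    }
}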