About RDDs

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 06:42
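Spark's core abstraction for working with data is the RDD (Resilient Distributed Dataset). An RDD is simply a distributed collection of elements. In Spark, all work on data comes down to three things: creating RDDs, transforming existing RDDs, and calling actions on RDDs to compute a result. Spark automatically distributes the data in an RDD across the cluster and parallelizes the operations you run on it.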
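To make the last sentence concrete, here is a minimal single-machine analogy, assuming plain Java with a thread pool standing in for the cluster. The data is split into partitions, the same function runs on each partition in parallel, and the results are gathered back in order. The class LocalPartitionSketch and its method parallelMap are hypothetical names for illustration; this is not how Spark is implemented internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class LocalPartitionSketch {

    // Hypothetical helper (not a Spark API): split `data` into `numPartitions`
    // contiguous chunks, apply `fn` to each chunk on its own thread, then
    // concatenate the per-chunk results in their original order.
    public static <T, R> List<R> parallelMap(List<T> data, int numPartitions, Function<T, R> fn) {
        ExecutorService pool = Executors.newFixedThreadPool(numPartitions);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            int size = data.size();
            for (int p = 0; p < numPartitions; p++) {
                // Partition p covers the index range [start, end)
                int start = p * size / numPartitions;
                int end = (p + 1) * size / numPartitions;
                List<T> partition = data.subList(start, end);
                futures.add(pool.submit(() -> {
                    List<R> out = new ArrayList<>();
                    for (T x : partition) {
                        out.add(fn.apply(x));
                    }
                    return out;
                }));
            }
            // Gather step: wait for every partition and concatenate its results
            List<R> result = new ArrayList<>();
            for (Future<List<R>> f : futures) {
                result.addAll(f.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        // Three "partitions" processed on three threads
        System.out.println(parallelMap(data, 3, x -> x * x));
        // prints [1, 4, 9, 16, 25, 36, 49, 64]
    }
}
```

In Spark itself, the work is spread over executors rather than local threads, and parallelize() accepts an optional second argument (numSlices) to choose the number of partitions; otherwise a default from the cluster configuration is used.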
1. Use flatMap() to split lines of text into words
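import java.util.Arrays;
import java.util.Iterator;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;

public class BasicMap {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("BasicMap");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // Create an RDD from an in-memory list of lines
        JavaRDD<String> lines = jsc.parallelize(Arrays.asList("hello world", "TaskSchedulerImpl Adding task set 0.0 with 1 tasks "));

        // flatMap: each line yields zero or more words, flattened into one RDD
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String line) throws Exception {
                // Spark 2.x+ expects an Iterator; on Spark 1.x call() returned an
                // Iterable, i.e. Arrays.asList(line.split(" ")) without .iterator()
                return Arrays.asList(line.split(" ")).iterator();
            }
        });

        // collect() is an action: it runs the job and returns the elements to the driver
        System.out.println("words: " + words.collect());
        jsc.stop();
    }
}

Output: words: [hello, world, TaskSchedulerImpl, Adding, task, set, 0.0, with, 1, tasks]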
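The difference between flatMap() and map() can be tried without a cluster. The sketch below is a local analogy in plain Java using java.util.stream, not the Spark API; the class FlatMapVsMap and its methods are hypothetical names for illustration. Stream.map and Stream.flatMap follow the same one-to-one versus one-to-many contract as the RDD methods: mapping each line to its split array gives one element per line, while flatMapping flattens all the words into a single sequence.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapVsMap {

    // Same input as the Spark example, including the trailing space
    public static final List<String> LINES =
            Arrays.asList("hello world", "TaskSchedulerImpl Adding task set 0.0 with 1 tasks ");

    // map(): exactly one output element (a String[]) per input line
    public static List<String[]> mapToArrays(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(" "))
                .collect(Collectors.toList());
    }

    // flatMap(): each line expands into its words, flattened into one list.
    // String.split drops trailing empty strings, so the trailing space adds no empty word.
    public static List<String> flatMapToWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(mapToArrays(LINES).size()); // prints 2
        System.out.println(flatMapToWords(LINES));
        // prints [hello, world, TaskSchedulerImpl, Adding, task, set, 0.0, with, 1, tasks]
    }
}
```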
2. Use map() to compute the square of each value in an RDD
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class BasicMap {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("BasicMap");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        JavaRDD<Integer> rdd = jsc.parallelize(Arrays.asList(1, 2, 3, 4));

        // map: exactly one output element (the square) per input element
        JavaRDD<Integer> result = rdd.map(new Function<Integer, Integer>() {
            @Override
            public Integer call(Integer x) throws Exception {
                return x * x;
            }
        });

        // collect() is the action that triggers the computation
        System.out.println("result=" + result.collect());
        jsc.stop();
    }
}

Output: result=[1, 4, 9, 16]