Spark Functions Explained: collect

Source: Internet | Posted by: 熟练使用办公软件 | Edited by: 程序博客网 | Date: 2024/06/02 02:29

Converts the RDD into a Scala array and returns it to the driver.

Function prototype

def collect(): Array[T]
def collect[U: ClassTag](f: PartialFunction[T, U]): RDD[U]

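collect is defined in two forms, and the first is the one used most often. The second takes a standard partial function, applies it to every element on which the function is defined, and returns the results as a new RDD (a MappedRDD in older Spark releases) rather than as an array.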

Example

scala> val one: PartialFunction[Int, String] = { case 1 => "one"; case _ => "other" }
one: PartialFunction[Int,String] = <function1>

scala> val data = sc.parallelize(List(2,3,1))
data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[11] at parallelize at <console>:12

scala> data.collect(one).collect
res4: Array[String] = Array(other, other, one)
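The partial function `one` above has a catch-all case, so it is defined for every element and nothing is dropped. The sketch below shows what happens when the function is defined for only some inputs. It is a minimal illustration, not taken from the original post: the names `CollectSketch`, `evenLabel`, and `run` are hypothetical, and a local master is assumed so that the code runs without a cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectSketch {
  // Hypothetical partial function: defined only for even numbers.
  val evenLabel: PartialFunction[Int, String] = {
    case n if n % 2 == 0 => s"even:$n"
  }

  // collect(f) keeps the elements on which f is defined and maps them;
  // the trailing collect() then brings the result back as an Array.
  def run(sc: SparkContext): Array[String] = {
    val data = sc.parallelize(List(2, 3, 1, 4))
    data.collect(evenLabel).collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with a single thread, for illustration only.
    val conf = new SparkConf().setAppName("collect-sketch").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try println(run(sc).mkString(", "))
    finally sc.stop()
  }
}
```

Here the odd elements 3 and 1 are skipped because `evenLabel` is not defined for them, so the second form of collect behaves like a filter followed by a map.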

Note

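When the dataset is large, avoid collect where possible: it pulls every element of the RDD into the driver, which can cause the driver to run out of memory.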
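When only part of the data is needed on the driver, two standard RDD methods can stand in for collect: `take(n)` returns just the first n elements, and `toLocalIterator` streams the data to the driver one partition at a time. The sketch below is one possible way to use them, not a recommendation from the original post. The names `SafeFetchSketch` and `preview`, the dataset, and the partition count are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SafeFetchSketch {
  // Returns a small preview plus a count computed by streaming through the data,
  // without ever holding the full dataset in driver memory at once.
  def preview(sc: SparkContext): (Array[Int], Int) = {
    // Assumption: one million integers spread over 8 partitions.
    val data = sc.parallelize(1 to 1000000, numSlices = 8)

    // take(5) fetches only the first five elements.
    val firstFive = data.take(5)

    // toLocalIterator brings one partition at a time to the driver.
    val evenCount = data.toLocalIterator.count(_ % 2 == 0)

    (firstFive, evenCount)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val conf = new SparkConf().setAppName("safe-fetch-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (firstFive, evenCount) = preview(sc)
      println(firstFive.mkString(", "))
      println(evenCount)
    } finally sc.stop()
  }
}
```

In practice a count like this would be computed on the cluster with `filter(...).count()`; the iterator is used here only to show that driver memory stays bounded by the size of a single partition.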

0 0

Spark函数讲解：collect
Spark函数讲解：aggregate
Spark函数讲解：cogroup
Spark函数讲解：coalesce
Spark函数讲解：checkpoint
Spark函数讲解：cartesian
Spark函数讲解：cache
Spark函数讲解：aggregateByKey
Spark函数讲解：collectAsMap
Spark函数讲解：combineByKey
Spark函数讲解：coalesce
Spark函数讲解：aggregateByKey
Spark函数讲解：collectAsMap
spark函数讲解：cogroup
spark函数讲解：aggregate
spark collect遍历
spark--actions算子--collect
Spark函数讲解序列文章
一些其它
C语言中的atan和atan2
Java---基于TCP协议的相互即时通讯小程序
FreeRTOS内核详解----LIST
初学socket网络编程
Spark函数讲解：collect
广告点击率预测 [离线部分]
实例方法和类方法的区别
【2016杭电女生赛1003】【暴力】Luck Competition 选数平均数乘2除3 小且最接近的数
spark 测试题
.Net 面试题整理(一)
hdoj-5695-Gym Class
Java---网络编程-C/S-B/S基础知识
Elasticsearch--索引操作