Spark Source Code Reading (1): RDD


  def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)

  def cache(): this.type = persist()

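As the source shows, the no-argument persist() uses the storage level MEMORY_ONLY.
cache() simply calls persist(), so the two are exactly equivalent.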
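
To see the equivalence from the caller's side, here is a minimal sketch that compares the storage level reported by getStorageLevel after each call. The object name PersistVsCache, the local[2] master, and the sample data are illustrative assumptions, not part of the Spark source quoted above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistVsCache {
  // Returns the storage levels assigned by cache(), persist() and an explicit persist(level).
  def levels(sc: SparkContext): Seq[StorageLevel] = {
    val cached    = sc.parallelize(1 to 100).cache()
    val persisted = sc.parallelize(1 to 100).persist()
    // An explicit level is the only way to get something other than MEMORY_ONLY.
    val explicit  = sc.parallelize(1 to 100).persist(StorageLevel.MEMORY_AND_DISK)
    Seq(cached.getStorageLevel, persisted.getStorageLevel, explicit.getStorageLevel)
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions made for this sketch.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("persist-vs-cache"))
    try {
      val Seq(fromCache, fromPersist, fromExplicit) = levels(sc)
      println(fromCache == StorageLevel.MEMORY_ONLY)
      println(fromPersist == StorageLevel.MEMORY_ONLY)
      println(fromExplicit == StorageLevel.MEMORY_AND_DISK)
    } finally sc.stop()
  }
}
```

If MEMORY_ONLY is not what you want, pass the level to persist explicitly; cache() offers no way to choose one.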

  def countByKey(): Map[K, Long] = self.withScope {
    self.mapValues(_ => 1L).reduceByKey(_ + _).collect().toMap
  }

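countByKey calls reduceByKey, then collects the result and converts it to a Map. So if the number of distinct keys is large, call this function with caution, because collecting every key can cause an OOM on the driver. In that case you can use reduceByKey directly and skip the collect.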
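
A minimal sketch of that alternative, assuming a pair RDD with String keys: the counts stay in an RDD, and only a bounded number of entries is brought back to the driver. The names CountPerKey and countsAsRdd and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CountPerKey {
  // Same computation as countByKey, minus collect().toMap: the result stays distributed.
  def countsAsRdd(pairs: RDD[(String, Int)]): RDD[(String, Long)] =
    pairs.mapValues(_ => 1L).reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions made for this sketch.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("count-per-key"))
    try {
      val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5), ("c", 6)))
      val counts = countsAsRdd(pairs)
      // Fetch only the 2 most frequent keys instead of the whole key set.
      val top2 = counts.top(2)(Ordering.by[(String, Long), Long](_._2))
      top2.foreach(println)
      // With a small key set, countByKey() is fine and yields the same numbers.
      println(pairs.countByKey())
    } finally sc.stop()
  }
}
```

From here the counts RDD can also be written out with saveAsTextFile or joined with other data, none of which requires the full key set to fit in driver memory.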

  def countByValue()(implicit ord: Ordering[T] = null): Map[T, Long] = withScope {
    map(value => (value, null)).countByKey()
  }

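countByValue maps each element to a (value, null) pair and then calls countByKey directly, so the same caution about a large number of distinct values applies.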
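
As a small illustration, the sketch below compares countByValue with a distributed count built from map and reduceByKey. The names CountByValueSketch and distributedCounts and the sample words are assumptions for this example only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CountByValueSketch {
  // Distributed counterpart of countByValue: the counts stay in an RDD.
  def distributedCounts(words: RDD[String]): RDD[(String, Long)] =
    words.map(w => (w, 1L)).reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions made for this sketch.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("count-by-value"))
    try {
      val words = sc.parallelize(Seq("spark", "rdd", "spark", "scala", "spark"))
      val onDriver = words.countByValue()                     // Map built on the driver
      val fromRdd  = distributedCounts(words).collect().toMap // collected here only to compare
      println(onDriver == fromRdd)
    } finally sc.stop()
  }
}
```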

  def isEmpty(): Boolean = withScope {
    partitions.length == 0 || take(1).length == 0
  }

isEmpty first checks the number of partitions: if it is 0, the result is true right away. Otherwise it takes one element and checks whether anything came back.
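
The two branches can be exercised with a short sketch. It assumes sc.emptyRDD yields an RDD with zero partitions, while sc.parallelize over an empty Seq with 4 slices yields 4 empty partitions, so only the second case reaches take(1). The name IsEmptySketch and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IsEmptySketch {
  // For each sample RDD, report (number of partitions, isEmpty).
  def check(sc: SparkContext): Seq[(Int, Boolean)] = {
    val noPartitions    = sc.emptyRDD[Int]                        // short-circuits on partitions.length == 0
    val emptyPartitions = sc.parallelize(Seq.empty[Int], 4)       // has partitions, take(1) finds nothing
    val filteredOut     = sc.parallelize(1 to 10, 4).filter(_ > 100)
    val nonEmpty        = sc.parallelize(1 to 10, 4)
    Seq(noPartitions, emptyPartitions, filteredOut, nonEmpty)
      .map(rdd => (rdd.partitions.length, rdd.isEmpty()))
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions made for this sketch.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("is-empty"))
    try check(sc).foreach(println)
    finally sc.stop()
  }
}
```

Note that in every case except the first, isEmpty has to run a job via take(1), so on an RDD with an expensive lineage it is not free.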

0 0