Difference between cache and persist in RDD

Source: Internet  Editor: 程序博客网  Time: 2024/05/18 16:55

Reposted from: http://www.ithao123.cn/content-6053935.html


Reading the RDD.scala source code makes the difference between cache and persist clear:

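def persist(newLevel: StorageLevel): this.type = {
  // Once a level has been assigned, it cannot be changed to a different one
  if (storageLevel != StorageLevel.NONE && newLevel != storageLevel) {
    throw new UnsupportedOperationException(
      "Cannot change storage level of an RDD after it was already assigned a level")
  }
  // Register this RDD with the SparkContext as a persistent RDD
  sc.persistRDD(this)
  // Register with the ContextCleaner (if enabled) so the RDD is cleaned up after it is garbage-collected
  sc.cleaner.foreach(_.registerRDDForCleanup(this))
  // Only the level is recorded here; no data is computed or cached yet
  storageLevel = newLevel
  this
}

/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)

/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def cache(): this.type = persist()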

From this we can see:

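1) An RDD's cache() method simply calls the no-argument persist(), so both use the MEMORY_ONLY storage level;

2) persist(newLevel) lets you set the StorageLevel explicitly (for example MEMORY_AND_DISK or DISK_ONLY) to whatever level your application needs;

3) Neither cache nor persist is an action: they only record the storage level, and the data is actually cached the first time an action computes the RDD.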
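The three points can be checked against a small pure-Scala model that runs without Spark. This is a hypothetical sketch, not Spark's API: `ToyRdd`, `Level`, `count()` and the `computeCalls` counter are invented for illustration and only mimic the storage-level bookkeeping in the RDD.scala excerpt above. Real Spark stores cached partitions through its block manager, not in a field.

```scala
// Hypothetical toy model of RDD storage-level bookkeeping; NOT Spark's API.
sealed trait Level
object Level {
  case object NONE extends Level
  case object MEMORY_ONLY extends Level
  case object MEMORY_AND_DISK extends Level
}

// `compute` stands in for the RDD's lineage; `computeCalls` counts how often it runs.
class ToyRdd[T](compute: () => Seq[T]) {
  private var level: Level = Level.NONE
  private var cached: Option[Seq[T]] = None
  private var calls = 0

  def storageLevel: Level = level
  def computeCalls: Int = calls

  // Mirrors the guard in RDD.persist(newLevel): an assigned level cannot change.
  def persist(newLevel: Level): this.type = {
    if (level != Level.NONE && newLevel != level)
      throw new UnsupportedOperationException(
        "Cannot change storage level of an RDD after it was already assigned a level")
    level = newLevel // only records the level; nothing is computed here
    this
  }

  // Same delegation chain as the excerpt: cache() -> persist() -> persist(MEMORY_ONLY)
  def persist(): this.type = persist(Level.MEMORY_ONLY)
  def cache(): this.type = persist()

  // Toy "action": computes the data and, if a level is set, keeps it for later actions.
  def count(): Int = {
    val data = cached.getOrElse {
      calls += 1
      val d = compute()
      if (level != Level.NONE) cached = Some(d)
      d
    }
    data.size
  }
}

object Demo extends App {
  val a = new ToyRdd[Int](() => 1 to 5).cache()
  println(a.storageLevel) // MEMORY_ONLY
  println(a.computeCalls) // 0: cache() is not an action
  a.count(); a.count()
  println(a.computeCalls) // 1: the second count() is served from the cache

  val b = new ToyRdd[Int](() => 1 to 5).persist(Level.MEMORY_AND_DISK)
  println(b.storageLevel) // MEMORY_AND_DISK
}
```

In this model, persist only flips a field; the cost of caching is paid inside the first action, which is why calling cache() on its own never runs any computation.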

Note: both cache and persist can be undone with unpersist().
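In Spark's RDD.scala, unpersist() removes the RDD's cached blocks and resets its storage level to StorageLevel.NONE, which is also what makes it possible to choose a different level afterwards. The sketch below models only that level bookkeeping; `LevelOnlyRdd` and `Level` are hypothetical names, not Spark classes.

```scala
// Hypothetical toy model of an RDD's storage-level field only; NOT Spark's API.
sealed trait Level
object Level {
  case object NONE extends Level
  case object MEMORY_ONLY extends Level
  case object DISK_ONLY extends Level
}

class LevelOnlyRdd {
  private var level: Level = Level.NONE
  def storageLevel: Level = level

  // Same guard as RDD.persist(newLevel): an assigned level cannot be changed.
  def persist(newLevel: Level): this.type = {
    if (level != Level.NONE && newLevel != level)
      throw new UnsupportedOperationException(
        "Cannot change storage level of an RDD after it was already assigned a level")
    level = newLevel
    this
  }

  // Spark routes this through persist(); collapsed into one call here.
  def cache(): this.type = persist(Level.MEMORY_ONLY)

  // Modeled on RDD.unpersist(): Spark also drops the cached blocks,
  // while this toy only resets the level to NONE.
  def unpersist(): this.type = {
    level = Level.NONE
    this
  }
}

object UnpersistDemo extends App {
  val rdd = new LevelOnlyRdd().cache()
  // Calling rdd.persist(Level.DISK_ONLY) here would throw: a level is already assigned.
  rdd.unpersist()
  rdd.persist(Level.DISK_ONLY) // allowed again after unpersist()
  println(rdd.storageLevel)    // DISK_ONLY
}
```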
