Lesson 41: Checkpoint Demystified: How Checkpoint Works and Its Source-Code Implementation
Part 1: What exactly is Checkpoint?
1. In production, Spark often faces jobs with a very large number of transformation RDDs (for example, a single job containing 10,000 RDDs), or individual transformations whose RDDs are extremely expensive to compute (for example, taking more than an hour). In such cases we must consider persisting the computed data.
2. Spark excels at multi-step iterative computation and at reusing results across jobs. If the data produced by earlier computation can be reused, efficiency improves dramatically.
3. Using persist to keep data in memory is the fastest option but also the least reliable; keeping it on disk is not fully reliable either: disks can fail, and administrators may wipe them.
4. Checkpoint exists to persist data in a comparatively more reliable way. A checkpoint can be written locally with multiple replicas, but in a normal production environment it is written to HDFS, which naturally leverages HDFS's high fault tolerance and high reliability to persist data as safely as possible.
5. Checkpoint is Spark's advanced facility for reusing RDD computation results with maximum reliability: by checkpointing, we persist the data to HDFS to guarantee the greatest possible degree of safety.
6. Checkpointing targets the links in an RDD computation chain where persistence is especially valuable (the RDD at that link will be reused repeatedly later). It adopts an HDFS-based (or similar) persistence-and-reuse strategy, and enables fault tolerance and high availability by switching on the checkpoint mechanism for that RDD.
Before computing an RDD, Spark first checks whether a checkpoint exists for it; if one does, the RDD does not need to be recomputed.

The iterator method in RDD.scala:
final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
  if (storageLevel != StorageLevel.NONE) {
    getOrCompute(split, context)
  } else {
    computeOrReadCheckpoint(split, context)
  }
}
Step into the getOrCompute method in RDD.scala:
private[spark] def getOrCompute(partition: Partition, context: TaskContext): Iterator[T] = {
  val blockId = RDDBlockId(id, partition.index)
  var readCachedBlock = true
  // This method is called on executors, so we need call SparkEnv.get instead of sc.env.
  SparkEnv.get.blockManager.getOrElseUpdate(blockId, storageLevel, elementClassTag, () => {
    readCachedBlock = false
    computeOrReadCheckpoint(partition, context)
  }) match {
    ......
The fourth argument passed to getOrElseUpdate inside getOrCompute is an anonymous function; it calls computeOrReadCheckpoint(partition, context) to check whether checkpointed data is available.

The computeOrReadCheckpoint source in RDD.scala:
private[spark] def computeOrReadCheckpoint(split: Partition, context: TaskContext): Iterator[T] =
{
  if (isCheckpointedAndMaterialized) {
    firstParent[T].iterator(split, context)
  } else {
    compute(split, context)
  }
}
In computeOrReadCheckpoint, isCheckpointedAndMaterialized is a Boolean indicating whether this RDD has been checkpointed and materialized, either reliably or locally (the two checkpoint modes in Spark 2.0). It is introduced as an alias for `isCheckpointed` to clarify the semantics of the return value.

The isCheckpointedAndMaterialized source:
private[spark] def isCheckpointedAndMaterialized: Boolean =
  checkpointData.exists(_.isCheckpointed)
Back in RDD.scala's computeOrReadCheckpoint: if the RDD has already been checkpointed and materialized (isCheckpointedAndMaterialized), the iterator of firstParent[T] is called; otherwise the RDD's own compute runs.
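A minimal sketch of this dispatch in action, assuming local mode and a made-up checkpoint path (not from the original post):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[*]"))
sc.setCheckpointDir("/tmp/ckpt-demo") // hypothetical path

val rdd = sc.parallelize(1 to 10).map(_ + 1)
rdd.checkpoint()
println(rdd.isCheckpointed) // false: checkpoint() only marks the RDD
rdd.count()                 // first action; a second job then writes the checkpoint files
println(rdd.isCheckpointed) // true: iterator() now reads through the checkpoint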
Part 2: How the Checkpoint mechanism works
1. Call SparkContext.setCheckpointDir to tell Spark where checkpointed RDD data should be stored; on a production cluster this is a directory on HDFS. For efficiency, different checkpoint calls can be pointed at different directories.

Let's look at SparkContext. SparkContext sets the directory under which the RDDs to be computed are checkpointed. When running on a cluster, this must be an HDFS path.

The setCheckpointDir source in SparkContext.scala:
def setCheckpointDir(directory: String) {

  // If we are running on a cluster, log a warning if the directory is local.
  // Otherwise, the driver may attempt to reconstruct the checkpointed RDD from
  // its own local file system, which is incorrect because the checkpoint files
  // are actually on the executor machines.
  if (!isLocal && Utils.nonLocalPaths(directory).isEmpty) {
    logWarning("Spark is not running in local mode, therefore the checkpoint directory " +
      s"must not be on the local filesystem. Directory '$directory' " +
      "appears to be on the local filesystem.")
  }

  checkpointDir = Option(directory).map { dir =>
    val path = new Path(dir, UUID.randomUUID().toString)
    val fs = path.getFileSystem(hadoopConfiguration)
    fs.mkdirs(path)
    fs.getFileStatus(path).getPath.toString
  }
}
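A short usage sketch (the HDFS path is hypothetical). Note in the source above that setCheckpointDir creates a UUID-named subdirectory under the directory you pass in, so several applications can safely share one checkpoint root:

sc.setCheckpointDir("hdfs://namenode:8020/user/spark/checkpoint")
// Checkpoint data then lands under a per-application path such as
//   hdfs://namenode:8020/user/spark/checkpoint/<random-uuid>/rdd-<id>/part-00000
// (the part-NNNNN naming appears in ReliableCheckpointRDD.getPartitions later in this
//  post; the rdd-<id> subdirectory is an assumption based on ReliableRDDCheckpointData's cpDir)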
The checkpoint method in RDD.scala marks the RDD for checkpointing. It will be saved to a file inside the checkpoint directory set with `SparkContext#setCheckpointDir`, and all references to its parent RDDs will be removed. This function must be called before any job has been executed on this RDD. It is strongly recommended that this RDD be persisted in memory, otherwise saving it to a file will require recomputation.

The checkpoint source in RDD.scala:
def checkpoint(): Unit = RDDCheckpointData.synchronized {
  // NOTE: we use a global lock here due to complexities downstream with ensuring
  // children RDD partitions point to the correct parent partitions. In the future
  // we should revisit this consideration.
  if (context.checkpointDir.isEmpty) {
    throw new SparkException("Checkpoint directory has not been set in the SparkContext")
  } else if (checkpointData.isEmpty) {
    checkpointData = Some(new ReliableRDDCheckpointData(this))
  }
}
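Both guards in checkpoint() are easy to trip; a hedged sketch (assuming an existing SparkContext sc and a made-up path):

val rdd = sc.parallelize(1 to 10)
// rdd.checkpoint()          // without a directory: SparkException("Checkpoint directory has not been set ...")
sc.setCheckpointDir("/tmp/ckpt")
rdd.checkpoint()             // registers a ReliableRDDCheckpointData; nothing runs yet
rdd.checkpoint()             // second call is a no-op: checkpointData is already defined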
Here checkpointData is an RDDCheckpointData:

private[spark] var checkpointData: Option[RDDCheckpointData[T]] = None

RDDCheckpointData marks an RDD for checkpointing: whenever an RDD is to be checkpointed, the Spark framework internally creates an RDDCheckpointData for it.
private[spark] abstract class RDDCheckpointData[T: ClassTag](@transient private val rdd: RDD[T])
  extends Serializable {

  import CheckpointState._

  // The checkpoint state of the associated RDD.
  protected var cpState = Initialized

  // The RDD that contains our checkpointed data
  private var cpRDD: Option[CheckpointRDD[T]] = None

  // TODO: are we sure we need to use a global lock in the following methods?

  /**
   * Return whether the checkpoint data for this RDD is already persisted.
   */
  def isCheckpointed: Boolean = RDDCheckpointData.synchronized {
    cpState == Checkpointed
  }

  /**
   * Materialize this RDD and persist its content.
   * This is called immediately after the first action invoked on this RDD has completed.
   */
  final def checkpoint(): Unit = {
    // Guard against multiple threads checkpointing the same RDD by
    // atomically flipping the state of this RDDCheckpointData
    RDDCheckpointData.synchronized {
      if (cpState == Initialized) {
        cpState = CheckpointingInProgress
      } else {
        return
      }
    }

    val newRDD = doCheckpoint()

    // Update our state and truncate the RDD lineage
    RDDCheckpointData.synchronized {
      cpRDD = Some(newRDD)
      cpState = Checkpointed
      rdd.markCheckpointed()
    }
  }

  /**
   * Materialize this RDD and persist its content.
   *
   * Subclasses should override this method to define custom checkpointing behavior.
   * @return the checkpoint RDD created in the process.
   */
  protected def doCheckpoint(): CheckpointRDD[T]

  /**
   * Return the RDD that contains our checkpointed data.
   * This is only defined if the checkpoint state is `Checkpointed`.
   */
  def checkpointRDD: Option[CheckpointRDD[T]] = RDDCheckpointData.synchronized { cpRDD }

  /**
   * Return the partitions of the resulting checkpoint RDD.
   * For tests only.
   */
  def getPartitions: Array[Partition] = RDDCheckpointData.synchronized {
    cpRDD.map(_.partitions).getOrElse { Array.empty }
  }

}

/**
 * Global lock for synchronizing checkpoint operations.
 */
private[spark] object RDDCheckpointData
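Note the small state machine driven through cpState: Initialized -> CheckpointingInProgress -> Checkpointed. The transition is made inside the global RDDCheckpointData lock, so if two threads trigger checkpointing of the same RDD concurrently, only the first one runs doCheckpoint(); the other sees a non-Initialized state and returns immediately.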
2. When an RDD is checkpointed, all the RDDs it depends on are removed from the computation chain.
3. As a best practice, persist is usually called before checkpoint to keep the current RDD's data in memory or on disk (see the sketch after this list). The reason is that checkpoint is lazy: a job must run to completion first; only afterwards does Spark trace backward through the lineage to find which RDDs were marked for checkpointing, and it then launches a new job for each marked RDD to perform the actual checkpoint.
4. Checkpoint changes the RDD's lineage.
5. When we call checkpoint on an RDD, the framework automatically creates an RDDCheckpointData. As soon as a job has run on the RDD, the checkpoint method of RDDCheckpointData is triggered; internally it calls doCheckpoint. In production this resolves to ReliableRDDCheckpointData's doCheckpoint, which leads to a call of ReliableCheckpointRDD's writeRDDToCheckpointDirectory; inside writeRDDToCheckpointDirectory, runJob is triggered to write the current RDD's data into the checkpoint directory, and a ReliableCheckpointRDD instance is produced.
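Tying points 3 to 5 together, here is a minimal runnable sketch (application name, path, and data are made up; the exact toDebugString output varies by Spark version):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CheckpointDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("CheckpointDemo").setMaster("local[*]"))
    sc.setCheckpointDir("/tmp/checkpoint-demo") // an HDFS path in production

    val rdd = sc.parallelize(1 to 1000000).map(_ * 2)
    rdd.persist(StorageLevel.MEMORY_AND_DISK) // best practice: cache first, so the extra
    rdd.checkpoint()                          // checkpoint job reads the cache instead of
                                              // recomputing the whole lineage

    rdd.count()                // job 1 computes the RDD; job 2 then writes the checkpoint
    println(rdd.toDebugString) // lineage is truncated: the parent is now a ReliableCheckpointRDD
    sc.stop()
  }
}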
The checkpoint method in RDDCheckpointData.scala performs the actual checkpoint: inside an RDDCheckpointData.synchronized block it first checks the cpState, then calls doCheckpoint().

The checkpoint source in RDDCheckpointData.scala:
final def checkpoint(): Unit = {
  // Guard against multiple threads checkpointing the same RDD by
  // atomically flipping the state of this RDDCheckpointData
  RDDCheckpointData.synchronized {
    if (cpState == Initialized) {
      cpState = CheckpointingInProgress
    } else {
      return
    }
  }

  val newRDD = doCheckpoint()

  // Update our state and truncate the RDD lineage
  RDDCheckpointData.synchronized {
    cpRDD = Some(newRDD)
    cpState = Checkpointed
    rdd.markCheckpointed()
  }
}
The doCheckpoint method is declared in RDDCheckpointData.scala without a concrete implementation:

protected def doCheckpoint(): CheckpointRDD[T]

RDDCheckpointData has two subclasses: LocalRDDCheckpointData and ReliableRDDCheckpointData. The concrete doCheckpoint implementation lives in ReliableRDDCheckpointData, where writeRDDToCheckpointDirectory is called.
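As a side note, the two subclasses correspond to the two user-facing calls; a hedged sketch of the mapping (rdd is any RDD, and localCheckpoint is the standard RDD API):

rdd.checkpoint()      // -> ReliableRDDCheckpointData: data written to the checkpoint directory (HDFS in production)
rdd.localCheckpoint() // -> LocalRDDCheckpointData: data kept in executor block storage; faster, but lost with the executor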
The doCheckpoint source in ReliableRDDCheckpointData.scala:
protected override def doCheckpoint(): CheckpointRDD[T] = {
  val newRDD = ReliableCheckpointRDD.writeRDDToCheckpointDirectory(rdd, cpDir)

  // Optionally clean our checkpoint files if the reference is out of scope
  if (rdd.conf.getBoolean("spark.cleaner.referenceTracking.cleanCheckpoints", false)) {
    rdd.context.cleaner.foreach { cleaner =>
      cleaner.registerRDDCheckpointDataForCleanup(newRDD, rdd.id)
    }
  }

  logInfo(s"Done checkpointing RDD ${rdd.id} to $cpDir, new parent is RDD ${newRDD.id}")
  newRDD
}
writeRDDToCheckpointDirectory writes the RDD's data to checkpoint files and returns a ReliableCheckpointRDD:
- First it obtains the SparkContext and assigns it to sc.
- It builds checkpointDirPath from checkpointDir.
- fs is the file system resolved from that path.
- sc.broadcast then broadcasts the serializable Hadoop configuration to all executors, so each task can open the checkpoint path.
- Next, sc.runJob triggers a job that writes each partition of the current RDD to the checkpoint directory.
- Finally it returns a ReliableCheckpointRDD. Whichever RDD is checkpointed, the end product is a ReliableCheckpointRDD whose data source is the contents of checkpointDirPath.toString and whose partitioner is originalRDD.partitioner, where originalRDD is the RDD being checkpointed.

The writeRDDToCheckpointDirectory source:
def writeRDDToCheckpointDirectory[T: ClassTag](
    originalRDD: RDD[T],
    checkpointDir: String,
    blockSize: Int = -1): ReliableCheckpointRDD[T] = {

  val sc = originalRDD.sparkContext

  // Create the output path for the checkpoint
  val checkpointDirPath = new Path(checkpointDir)
  val fs = checkpointDirPath.getFileSystem(sc.hadoopConfiguration)
  if (!fs.mkdirs(checkpointDirPath)) {
    throw new SparkException(s"Failed to create checkpoint path $checkpointDirPath")
  }

  // Save to file, and reload it as an RDD
  val broadcastedConf = sc.broadcast(
    new SerializableConfiguration(sc.hadoopConfiguration))
  // TODO: This is expensive because it computes the RDD again unnecessarily (SPARK-8582)
  sc.runJob(originalRDD,
    writePartitionToCheckpointFile[T](checkpointDirPath.toString, broadcastedConf) _)

  if (originalRDD.partitioner.nonEmpty) {
    writePartitionerToCheckpointDir(sc, originalRDD.partitioner.get, checkpointDirPath)
  }

  val newRDD = new ReliableCheckpointRDD[T](
    sc, checkpointDirPath.toString, originalRDD.partitioner)
  if (newRDD.partitions.length != originalRDD.partitions.length) {
    throw new SparkException(
      s"Checkpoint RDD $newRDD(${newRDD.partitions.length}) has different " +
        s"number of partitions from original RDD $originalRDD(${originalRDD.partitions.length})")
  }
  newRDD
}
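Note the TODO above (SPARK-8582): sc.runJob recomputes originalRDD from scratch when writing the checkpoint files. This is exactly why the persist-before-checkpoint best practice from earlier in Part 2 matters: with the RDD cached, this second job reads the cached blocks instead of replaying the whole lineage.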
ReliableCheckpointRDD is an RDD that reads data back from checkpoint files previously written to a reliable storage system. Its partitioner is passed in when the ReliableCheckpointRDD is constructed. getPartitions builds one partition per checkpoint file; getPreferredLocations provides data locality, using fs.getFileBlockLocations to find where each file's blocks reside; and compute reads the data through ReliableCheckpointRDD.readCheckpointFile.

ReliableCheckpointRDD.scala:
private[spark] class ReliableCheckpointRDD[T: ClassTag](
    sc: SparkContext,
    val checkpointPath: String,
    _partitioner: Option[Partitioner] = None
  ) extends CheckpointRDD[T](sc) {

  @transient private val hadoopConf = sc.hadoopConfiguration
  @transient private val cpath = new Path(checkpointPath)
  @transient private val fs = cpath.getFileSystem(hadoopConf)
  private val broadcastedConf = sc.broadcast(new SerializableConfiguration(hadoopConf))

  // Fail fast if checkpoint directory does not exist
  require(fs.exists(cpath), s"Checkpoint directory does not exist: $checkpointPath")

  /**
   * Return the path of the checkpoint directory this RDD reads data from.
   */
  override val getCheckpointFile: Option[String] = Some(checkpointPath)

  override val partitioner: Option[Partitioner] = {
    _partitioner.orElse {
      ReliableCheckpointRDD.readCheckpointedPartitionerFile(context, checkpointPath)
    }
  }

  /**
   * Return partitions described by the files in the checkpoint directory.
   *
   * Since the original RDD may belong to a prior application, there is no way to know a
   * priori the number of partitions to expect. This method assumes that the original set of
   * checkpoint files are fully preserved in a reliable storage across application lifespans.
   */
  protected override def getPartitions: Array[Partition] = {
    // listStatus can throw exception if path does not exist.
    val inputFiles = fs.listStatus(cpath)
      .map(_.getPath)
      .filter(_.getName.startsWith("part-"))
      .sortBy(_.getName.stripPrefix("part-").toInt)
    // Fail fast if input files are invalid
    inputFiles.zipWithIndex.foreach { case (path, i) =>
      if (path.getName != ReliableCheckpointRDD.checkpointFileName(i)) {
        throw new SparkException(s"Invalid checkpoint file: $path")
      }
    }
    Array.tabulate(inputFiles.length)(i => new CheckpointRDDPartition(i))
  }

  /**
   * Return the locations of the checkpoint file associated with the given partition.
   */
  protected override def getPreferredLocations(split: Partition): Seq[String] = {
    val status = fs.getFileStatus(
      new Path(checkpointPath, ReliableCheckpointRDD.checkpointFileName(split.index)))
    val locations = fs.getFileBlockLocations(status, 0, status.getLen)
    locations.headOption.toList.flatMap(_.getHosts).filter(_ != "localhost")
  }

  /**
   * Read the content of the checkpoint file associated with the given partition.
   */
  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    val file = new Path(checkpointPath, ReliableCheckpointRDD.checkpointFileName(split.index))
    ReliableCheckpointRDD.readCheckpointFile(file, broadcastedConf, context)
  }

}
......
Now look at ReliableCheckpointRDD.readCheckpointFile, which is used in the compute method above. readCheckpointFile reads the contents of the given checkpoint file: it opens a fileInputStream, wraps it in a deserializeStream via the serializer, and then exposes the deserializeStream as an Iterator.

The readCheckpointFile source in ReliableCheckpointRDD.scala:
def readCheckpointFile[T](
    path: Path,
    broadcastedConf: Broadcast[SerializableConfiguration],
    context: TaskContext): Iterator[T] = {
  val env = SparkEnv.get
  val fs = path.getFileSystem(broadcastedConf.value.value)
  val bufferSize = env.conf.getInt("spark.buffer.size", 65536)
  val fileInputStream = fs.open(path, bufferSize)
  val serializer = env.serializer.newInstance()
  val deserializeStream = serializer.deserializeStream(fileInputStream)

  // Register an on-task-completion callback to close the input stream.
  context.addTaskCompletionListener(context => deserializeStream.close())

  deserializeStream.asIterator.asInstanceOf[Iterator[T]]
}
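Note the addTaskCompletionListener call: the stream is closed when the task completes, so the underlying HDFS input stream is released even if the consumer does not exhaust the iterator.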
The cleanCheckpoint method in ReliableRDDCheckpointData.scala deletes the checkpoint files associated with an RDD's data:
def cleanCheckpoint(sc: SparkContext, rddId: Int): Unit = {
  checkpointPath(sc, rddId).foreach { path =>
    path.getFileSystem(sc.hadoopConfiguration).delete(path, true)
  }
}
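cleanCheckpoint works together with the cleaner registration we saw in doCheckpoint, and it only takes effect when the flag is switched on (it defaults to false, per the getBoolean call above). A minimal sketch:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("demo")
  .set("spark.cleaner.referenceTracking.cleanCheckpoints", "true") // opt in to automatic cleanup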
In production we do not use LocalCheckpointRDD. Its getPartitions simply maps the range (0 until numPartitions) to new CheckpointRDDPartition instances, and its compute method always throws an exception.

The LocalCheckpointRDD source:
private[spark] class LocalCheckpointRDD[T: ClassTag](
    sc: SparkContext,
    rddId: Int,
    numPartitions: Int)
  extends CheckpointRDD[T](sc) {
  ......
  protected override def getPartitions: Array[Partition] = {
    (0 until numPartitions).toArray.map { i => new CheckpointRDDPartition(i) }
  }
  ......
  override def compute(partition: Partition, context: TaskContext): Iterator[T] = {
    throw new SparkException(
      s"Checkpoint block ${RDDBlockId(rddId, partition.index)} not found! Either the executor " +
      s"that originally checkpointed this partition is no longer alive, or the original RDD is " +
      s"unpersisted. If this problem persists, you may consider using `rdd.checkpoint()` " +
      s"instead, which is slower than local checkpointing but more fault-tolerant.")
  }

}
When the best student hears of the Tao, he practices it diligently; when the average student hears of it, he half keeps it and half loses it; when the worst student hears of it, he laughs out loud. If it were not laughed at, it would not be the Tao.