Temporary directory problems when running Spark in Standalone mode

While a Spark job runs it writes a large amount of temporary data to its local directories. The default location is /tmp/spark-*, so on a busy cluster these directories can fill up the / partition.

Versions used in this project:
Hadoop: 2.7.1
Spark: 1.6.0
JDK: 1.8.0

1. Operations requirement
On the production cluster the /tmp/spark-* directories keep filling up the / partition. The ops team asked us to either optimize the application or move these directories to a subdirectory under /home/.

2. Solutions
2.1 Option 1 (not recommended)
Run rm -rf /tmp/spark-* periodically from crontab. The drawback: if a Spark job happens to be running when the cron job fires, its freshly created /tmp/spark-* files get deleted out from under it, the files can no longer be found, and the job fails. A sketch of such a cron entry follows.
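For reference, such an entry would look roughly like the following (the schedule is made up purely for illustration); the race described above is exactly why this approach is not recommended:

# crontab -e; run every day at 02:00 (illustrative schedule only)
0 2 * * * rm -rf /tmp/spark-*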

2.2 Option 2 (recommended: configure it in spark-env.sh, not in spark-defaults.conf)
The relevant setting is spark.local.dir, with the matching environment variable SPARK_LOCAL_DIRS: "storage directories to use on this node for shuffle and RDD data".

The setting can go into conf/spark-defaults.conf or conf/spark-env.sh; below we test each in turn to see which one actually works.
(1) In spark-defaults.conf, point the temporary directories somewhere else by adding one line:
spark.local.dir /diskb/sparktmp,/diskc/sparktmp,/diskd/sparktmp,/diske/sparktmp,/diskf/sparktmp,/diskg/sparktmp
Multiple directories can be listed, separated by commas.
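A small preparatory sketch (assuming the directory layout above and that the standalone workers run as the hadoop user; both are assumptions): create each directory up front and make it writable by the Spark user, otherwise an executor cannot create its temporary subdirectories there and the directory gets ignored.

for d in /diskb /diskc /diskd /diske /diskf /diskg; do
  # create the per-disk directory and hand it to the Spark user (user/group are assumptions)
  mkdir -p "$d/sparktmp" && chown hadoop:hadoop "$d/sparktmp"
done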

(2) Alternatively, add the following to spark-env.sh:
export SPARK_LOCAL_DIRS=/diskb/sparktmp,/diskc/sparktmp,/diskd/sparktmp,/diske/sparktmp,/diskf/sparktmp,/diskg/sparktmp
If both spark-env.sh and spark-defaults.conf are configured, SPARK_LOCAL_DIRS overrides spark.local.dir.
In production we proceeded along these lines.
First attempt: add one line to spark-defaults.conf:
spark.local.dir /home/hadoop/data/sparktmp

Then submit a test job to verify:

bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://10.4.1.1:7077 \
  --total-executor-cores 4 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 1 \
  lib/spark-examples*.jar 10

After the job finishes, the executor logs under some workers' work directories contain errors like the following:
16/09/08 15:55:53 INFO util.Utils: Successfully started service 'sparkExecutorActorSystem' on port 50212.
16/09/08 15:55:53 ERROR storage.DiskBlockManager: Failed to create local dir in . Ignoring this directory.
java.io.IOException: Failed to create a temp directory (under ) after 10 attempts!
    at org.apache.spark.util.Utils$.createDirectory(Utils.scala:217)
    at org.apache.spark.storage.DiskBlockManager$$anonfun$createLocalDirs$1.apply(DiskBlockManager.scala:135)
    at org.apache.spark.storage.DiskBlockManager$$anonfun$createLocalDirs$1.apply(DiskBlockManager.scala:133)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:108)
    at org.apache.spark.storage.DiskBlockManager.createLocalDirs(DiskBlockManager.scala:133)
    at org.apache.spark.storage.DiskBlockManager.<init>(DiskBlockManager.scala:45)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:76)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:365)
    at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
16/09/08 15:55:53 ERROR storage.DiskBlockManager: Failed to create any local dir.
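To locate which executors hit this error, a quick sketch (assuming the default standalone layout, where each executor's stderr lives under $SPARK_HOME/work/<app-id>/<executor-id>/):

grep -rl "Failed to create local dir" $SPARK_HOME/work/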

Let's trace the cause through the Spark source.
(1) DiskBlockManager.createLocalDirs
The log messages point straight at this method: a root directory that cannot be created is logged and skipped (None), and only after every configured directory has failed does the executor report "Failed to create any local dir." and give up.
/**
   * Create local directories for storing block data. These directories are
   * located inside configured local directories and won't
   * be deleted on JVM exit when using the external shuffle service.
   */
  private def createLocalDirs(conf: SparkConf): Array[File] = {
    Utils.getConfiguredLocalDirs(conf).flatMap { rootDir =>
      try {
        val localDir = Utils.createDirectory(rootDir, "blockmgr")
        logInfo(s"Created local directory at $localDir")
        Some(localDir)
      } catch {
        case e: IOException =>
          logError(s"Failed to create local dir in $rootDir. Ignoring this directory.", e)
          None
      }
    }
  }

(2) SparkConf.validateSettings

This method tells us that, from Spark 1.0 on, a spark.local.dir set in the configuration is overridden by the cluster manager (via SPARK_LOCAL_DIRS in standalone/Mesos mode and LOCAL_DIRS on YARN), so setting it in spark-defaults.conf is effectively deprecated.
/** Checks for illegal or deprecated config settings. Throws an exception for the former. Not
    * idempotent - may mutate this conf object to convert deprecated settings to supported ones. */
  private[spark] def validateSettings() {
    if (contains("spark.local.dir")) {
      val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
        "the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
      logWarning(msg)
    }

    val executorOptsKey = "spark.executor.extraJavaOptions"
    val executorClasspathKey = "spark.executor.extraClassPath"

    // ... (remaining checks omitted)
  }
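Incidentally, with spark.local.dir set in spark-defaults.conf, the driver output should contain a warning built from the msg string above; something along these lines (timestamp omitted, logger prefix approximate):

WARN spark.SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).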

(3) Utils.getConfiguredLocalDirs
The code below shows that when SPARK_LOCAL_DIRS is not set in spark-env.sh, the standalone path ultimately falls through to conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")).split(","), and creating directories from that value is what produced the error above. Setting SPARK_LOCAL_DIRS directly in spark-env.sh therefore resolves it, so we configured:
export SPARK_LOCAL_DIRS=/home/hadoop/data/sparktmp
/**
   * Return the configured local directories where Spark can write files. This
   * method does not create any directories on its own, it only encapsulates the
   * logic of locating the local directories according to deployment mode.
   */
  def getConfiguredLocalDirs(conf: SparkConf): Array[String] = {
    val shuffleServiceEnabled = conf.getBoolean("spark.shuffle.service.enabled", false)
    if (isRunningInYarnContainer(conf)) {
      // If we are in yarn mode, systems can have different disk layouts so we must set it
      // to what Yarn on this system said was available. Note this assumes that Yarn has
      // created the directories already, and that they are secured so that only the
      // user has access to them.
      getYarnLocalDirs(conf).split(",")
    } else if (conf.getenv("SPARK_EXECUTOR_DIRS") != null) {
      conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator)
    } else if (conf.getenv("SPARK_LOCAL_DIRS") != null) {
      conf.getenv("SPARK_LOCAL_DIRS").split(",")
    } else if (conf.getenv("MESOS_DIRECTORY") != null && !shuffleServiceEnabled) {
      // Mesos already creates a directory per Mesos task. Spark should use that directory
      // instead so all temporary files are automatically cleaned up when the Mesos task ends.
      // Note that we don't want this if the shuffle service is enabled because we want to
      // continue to serve shuffle files after the executors that wrote them have already exited.
      Array(conf.getenv("MESOS_DIRECTORY"))
    } else {
      if (conf.getenv("MESOS_DIRECTORY") != null && shuffleServiceEnabled) {
        logInfo("MESOS_DIRECTORY available but not using provided Mesos sandbox because " +
          "spark.shuffle.service.enabled is enabled.")
      }
      // In non-Yarn mode (or for the driver in yarn-client mode), we cannot trust the user
      // configuration to point to a secure directory. So create a subdirectory with restricted
      // permissions under each listed directory.
      conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")).split(",")
    }
  }

Resubmitting the job and watching the driver output, the "Deleting directory" lines now point at the new location, so the change has taken effect:
16/09/08 14:56:19 INFO ui.SparkUI: Stopped Spark web UI at http://10.4.1.1:4040
16/09/08 14:56:19 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
16/09/08 14:56:19 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
16/09/08 14:56:19 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/09/08 14:56:19 INFO storage.MemoryStore: MemoryStore cleared
16/09/08 14:56:19 INFO storage.BlockManager: BlockManager stopped
16/09/08 14:56:19 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/09/08 14:56:19 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/09/08 14:56:19 INFO spark.SparkContext: Successfully stopped SparkContext
16/09/08 14:56:19 INFO util.ShutdownHookManager: Shutdown hook called
16/09/08 14:56:19 INFO util.ShutdownHookManager: Deleting directory /home/hadoop/data/sparktmp/spark-a72435b2-71e7-4c07-9d60-b0dd41b71ecc
16/09/08 14:56:19 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/09/08 14:56:19 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/09/08 14:56:19 INFO util.ShutdownHookManager: Deleting directory /home/hadoop/data/sparktmp/spark-a72435b2-71e7-4c07-9d60-b0dd41b71ecc/httpd-7cd8762c-85b6-4e62-8e91-be668830b0a7
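As a worker-side cross-check (a sketch assuming the same standalone layout as above), the "Created local directory at" message logged by createLocalDirs should now mention the new path, and the directory itself shows up while a job is running:

grep -r "Created local directory at" $SPARK_HOME/work/ | tail
ls /home/hadoop/data/sparktmp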

