Spark Exception Handling

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 02:29


Java-style handling

Scala supports an exception-handling pattern very similar to Java's try/catch. For example, when reading a local file:

import java.io.FileReader
import java.io.FileNotFoundException
import java.io.IOException

object Demo {
  def main(args: Array[String]): Unit = {
    try {
      val f = new FileReader("input.txt")
      f.close() // release the file handle once it is no longer needed
    } catch {
      // FileNotFoundException is a subclass of IOException, so it is matched first
      case ex: FileNotFoundException =>
        println("Missing file exception")
      case ex: IOException =>
        println("IO Exception")
    }
  }
}
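Because try/catch is an expression in Scala, the same pattern can also return a value and release the reader in a finally clause. The following is a minimal sketch, not part of the original post; the object name ReadFirstLine and the helper firstLine are hypothetical names chosen for illustration.

```scala
import java.io.{BufferedReader, FileNotFoundException, FileReader, IOException}

object ReadFirstLine {
  // Hypothetical helper: returns the first line of the file,
  // or None if the file is missing, unreadable, or empty.
  def firstLine(path: String): Option[String] = {
    var reader: BufferedReader = null
    try {
      reader = new BufferedReader(new FileReader(path))
      Option(reader.readLine()) // readLine returns null for an empty file
    } catch {
      case _: FileNotFoundException =>
        println(s"Missing file: $path")
        None
      case e: IOException =>
        println(s"IO error while reading $path: ${e.getMessage}")
        None
    } finally {
      // Runs whether or not an exception was thrown
      if (reader != null) reader.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(firstLine("input.txt"))
  }
}
```

Returning an Option keeps the error handling in one place, so callers do not need their own try/catch.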

A simple check for whether a file exists

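import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// eachRdd, oiddRdd, FtpMap and printLog come from the surrounding application code
val sc: SparkContext = eachRdd.sparkContext
val hadoopConf: Configuration = sc.hadoopConfiguration
val fs: FileSystem = org.apache.hadoop.fs.FileSystem.get(hadoopConf)
// Is the collect really needed here? It is if sc.textFile is called per file,
// because a SparkContext can only be used on the driver.
val lines: Array[(String, FtpMap)] = oiddRdd.collect()
// Turn the stream of file names into a stream of file contents
lines.foreach { eachFileJson: (String, FtpMap) =>
  val topic: String = eachFileJson._1
  printLog.info("topic: " + topic)
  val fileJson = eachFileJson._2
  val filePath = fileJson.file_path
  // try/catch is an expression: any failure of the check counts as "file does not exist"
  val fileExists: Boolean = try {
    fs.exists(new org.apache.hadoop.fs.Path(filePath))
  } catch {
    case e: Exception =>
      printLog.error("Exception: filePath:" + filePath + " e:" + e)
      false
  }
  if (fileExists) {
    val lines: RDD[String] = sc.textFile(filePath)
  }
}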
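The "treat any failure as not existing" logic above can be pulled out into a small reusable helper. This is only a sketch using the Scala standard library: the names SafeExists and existingPaths are hypothetical, and the existence check is passed in as a function so the helper does not depend on Hadoop. With the fs value from the snippet above, the check would be p => fs.exists(new org.apache.hadoop.fs.Path(p)).

```scala
import java.nio.file.{Files, Paths}
import scala.util.{Failure, Success, Try}

object SafeExists {
  // Hypothetical helper: keeps only the paths for which `exists` returns true.
  // A check that throws a non-fatal exception is logged and treated as "does not exist".
  def existingPaths(paths: Seq[String])(exists: String => Boolean): Seq[String] =
    paths.filter { p =>
      Try(exists(p)) match {
        case Success(found) => found
        case Failure(e) =>
          println(s"Exception: filePath: $p e: $e")
          false
      }
    }

  def main(args: Array[String]): Unit = {
    // Local-filesystem check used here as a stand-in for the HDFS check
    val localCheck: String => Boolean = p => Files.exists(Paths.get(p))
    println(existingPaths(Seq("input.txt", "/tmp"))(localCheck))
  }
}
```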

Try and its two subclasses, Success[T] and Failure[T]

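In Spark code, however, HDFS reads are lazy: the data is only read when an action runs. A try-catch around the code that defines the read therefore does not help, and a single malformed line still fails the job at run time.

This shows up in particular when reading HDFS files, with an error such as: Caused by: java.lang.ArrayIndexOutOfBoundsException

Reference: http://blog.csdn.net/zrc199021/article/details/52711593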
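The effect of lazy evaluation on try-catch can be reproduced without a cluster. The sketch below uses a plain Scala Iterator as a stand-in for an RDD (an assumption made for illustration; it is not Spark code): Iterator.map is also lazy, so the try block around the transformation catches nothing and the exception only appears when the elements are consumed. The names LazyFailureDemo and whereItFails are hypothetical.

```scala
object LazyFailureDemo {
  // Reports where the exception was observed:
  // "definition", "consumption", or "nowhere" if every line is well formed.
  def whereItFails(input: Seq[String]): String = {
    val lines = input.iterator
    var caughtAt = "nowhere"

    // Iterator.map is lazy, so no line is split inside this try block
    val thirdFields: Iterator[String] =
      try {
        lines.map(line => line.split(" ")(2))
      } catch {
        case _: ArrayIndexOutOfBoundsException =>
          caughtAt = "definition"
          Iterator.empty
      }

    // The split actually runs here, so this is where a short line fails
    try {
      thirdFields.foreach(_ => ())
    } catch {
      case _: ArrayIndexOutOfBoundsException =>
        caughtAt = "consumption"
    }
    caughtAt
  }

  def main(args: Array[String]): Unit = {
    println(whereItFails(Seq("a b c", "d", "e f g")))
  }
}
```

In Spark the situation is analogous: the failing function runs inside tasks once an action is called, which is why the try-catch has to sit around the action, or the failure has to be handled per record.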

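The key technique is to wrap the indexed access in Try when splitting the string, so that each element becomes either a Success or a Failure instead of throwing.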

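import scala.util.{Try, Success, Failure}

// s(10) is the 11th character of each token; Try captures the exception for shorter tokens
val tokens = lines.flatMap(_ split " ")
  .map(s => Try(s(10)))

The result is one of the two subclasses of Try. If the computation succeeds, a Success instance is returned; if an exception is thrown, a Failure carrying the exception is returned.

def divideBy(x: Int, y: Int): Try[Int] = {
  // This step yields a Success[Int] or a Failure[Int]
  Try(x / y)
}

// getOrElse extracts the T from a Try[T], falling back to the default on failure
println(divideBy(1, 1).getOrElse(0)) // 1
println(divideBy(1, 0).getOrElse(0)) // 0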
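Once every record is wrapped in Try, the good and bad records can be separated by pattern matching on Success and Failure. The sketch below uses a local Seq so that it runs without Spark; the names TryFieldDemo and field are hypothetical, and the assumption is that the same field function could be applied per record inside a Spark map.

```scala
import scala.util.{Failure, Success, Try}

object TryFieldDemo {
  // Hypothetical helper: extracts the field at `index` from a space-separated line.
  // A line with too few fields yields a Failure instead of throwing.
  def field(line: String, index: Int): Try[String] =
    Try(line.split(" ")(index))

  def main(args: Array[String]): Unit = {
    val lines = Seq("a b c", "d", "e f g")
    val parsed: Seq[Try[String]] = lines.map(line => field(line, 2))

    // Keep the values that were extracted successfully
    val good: Seq[String] = parsed.collect { case Success(v) => v }
    // Keep the exceptions so that malformed lines can be logged or counted
    val bad: Seq[Throwable] = parsed.collect { case Failure(e) => e }

    println(s"good fields: $good")
    println(s"malformed lines: ${bad.size}")
  }
}
```

Collecting the failures separately, rather than replacing them with a default through getOrElse, makes it possible to see how many lines were malformed and why.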

For a detailed treatment of data handling with Option, Either and Try, see: http://blog.csdn.net/jasonding1354/article/details/46822417
