Spark Q&A: NoClassDefFoundError for CSVFormat when reading CSV files with the databricks spark-csv package


Original post: http://blog.csdn.net/edin_blackpoint/article/details/72638015

Q: When reading a CSV file with the databricks spark-csv package, Spark fails with java.lang.NoClassDefFoundError: org/apache/commons/csv/CSVFormat, i.e. the CSVFormat class cannot be found on the classpath.

A: According to kevinskii's answer on GitHub, the problem is that the spark-csv jar does not bundle its CSVFormat dependency. The fix is to download the commons-csv jar and pass it to spark-submit (or the Spark shell) via the --jars option.

It seems that the org/apache/commons/csv/CSVFormat dependency isn't being packaged in the spark-csv jar file. Downloading the binary from https://commons.apache.org/proper/commons-csv/download_csv.cgi, extracting the .jar from it and setting the permissions, and finally including it in the list of comma-separated JAR files following the --jars option when running the Spark shell solved it for me.
Example:
bin/pyspark --jars /path/to/spark-csv.jar,/path/to/commons-csv.jar
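
The same pair of jars works for a batch job submitted with spark-submit. A minimal sketch, where my_app.py is a placeholder for your application and the paths point at the downloaded jars:

bin/spark-submit --jars /path/to/spark-csv.jar,/path/to/commons-csv.jar my_app.py

If the machine has access to Maven Central, another option is to let Spark resolve spark-csv and its transitive dependencies (commons-csv included) with --packages, e.g. --packages com.databricks:spark-csv_2.10:1.5.0; the exact artifact name depends on your Scala and spark-csv versions.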

Separately, m-mashaye on Stack Overflow offered a workaround that sidesteps spark-csv entirely: read the file with textFile and build a DataFrame through a case class. It is aimed at the desperate who have tried everything else and still cannot get past the error.

Instead of using sqlContext.read, I used the following code to turn my .csv file into a DataFrame. Suppose the .csv file has 5 columns as follows:

// Define the case class
case class Flight(arrDelay: Int, depDelay: Int, origin: String, dest: String, distance: Int)

// Then (toDF() needs sqlContext.implicits._, imported automatically in spark-shell)
val flights = sc.textFile("2008.csv")
  .map(_.split(","))
  .map(p => Flight(p(0).trim.toInt, p(1).trim.toInt, p(2), p(3), p(4).trim.toInt))
  .toDF()
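
For comparison, once commons-csv is on the classpath, the sqlContext.read path mentioned above should work through the spark-csv data source. A minimal Spark 1.x sketch, assuming the same 2008.csv file and that the header and inferSchema options suit your data:

// Read the CSV through the databricks spark-csv data source;
// this is the sqlContext.read call that originally failed with NoClassDefFoundError
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // treat the first line as column names
  .option("inferSchema", "true") // infer column types instead of defaulting to strings
  .load("2008.csv")
df.printSchema()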