org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.Tuple2
来源:互联网 发布:中航锂电研究院 知乎 编辑:程序博客网 时间:2024/06/09 20:31
今天在写spark 提数的时候遇到一个异常,如下
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.Tuple2 at $anonfun$1$$anonfun$apply$1.apply(<console>:27) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at $anonfun$1.apply(<console>:27) at $anonfun$1.apply(<console>:27) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917) at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
df的schema如下:
root |-- cityId: long (nullable = true) |-- countryId: long (nullable = true) |-- outline: array (nullable = true) | |-- element: struct (containsNull = true) | | |-- _1: double (nullable = true) | | |-- _2: double (nullable = true) |-- provinceId: long (nullable = true) |-- townId: long (nullable = true)
使用如下方法在提取outline 字段时,报错
val outline = df.select("outline").collect().map(row => row.getAs[Seq[(Double,Double)]]("outline"))
解决方法,我已知两种:
第一种,在生成数据时,定义case class对象,不存储tuple,就我个人而言,不适用,数据已经生成完毕,再次生成数据需要2天的时间
第二种方法:
val lines = json.filter(row => !row.isNullAt(2)).select("outline").rdd.map(r => { val row:Seq[(Double,Double)] = r.getAs[Seq[Row]](0).map(x =>{(x.getDouble(0),x.getDouble(1))}) row }).collect()(0)
至于报错的原因,google了一下,我觉得有一种说法可信:
在row 中的这些列取得时候,要根据类型取,简单的像String,Seq[Double] 这种类型就可以直接取出来,但是像 Seq[(Double,Double)] 这种类型直接取得花就会丢失schema信息,虽然值能取到,但是schema信息丢了,在dataFrame中操作的时候就会抛错
阅读全文
0 0
- org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.Tuple2
- 「报错」Spark: scala.MatchError (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
- cannot be cast to org.apache
- spark sql 中 java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.Gener
- cannot be cast to org.apache.AnnotationProcessor 错误解决方案
- org.apache.commons.beanutils.BasicDynaBean cannot be cast to ...
- SparkR读取CSV格式文件错误java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.u
- org.apache.struts.action.ActionMessage cannot be cast to org.apache.struts.action.ActionError
- org.apache.catalina.util.DefaultAnnotationProcessor cannot be cast to org.apache.AnnotationProcessor
- org.apache.catalina.util.DefaultAnnotationProcessor cannot be cast to org.apache.AnnotationProcessor
- org.apache.catalina.util.DefaultAnnotationProcessor cannot be cast to org.apache.AnnotationProcessor
- org.apache.catalina.util.DefaultAnnotationProcessor cannot be cast to org.apache.AnnotationProcessor
- org.apache.catalina.util.DefaultAnnotationProcessor cannot be cast to org.apache.AnnotationProcessor
- java.lang.ClassCastException:org.apache.catalina.util.DefaultAnnotationProcessor cannot be cast to org.apache.AnnotationProcesso
- org.apache.xerces.parsers.XML11Configuration cannot be cast to org.apache.xerces.xni.parser.XMLParserConfiguration
- org.apache.catalina.util.DefaultAnnotationProcessor cannot be cast to org.apache
- java.lang.ClassCastException:org.apache.catalina.util.DefaultAnnotationProcessor cannot be cast to org.apache.AnnotationProcesso
- org.apache.catalina.util.DefaultAnnotationProcessor cannot be cast to org.apache
- arcgis web for js
- P2255【L1 SOLO 第五场 APIO2009】抢掠计划
- Kotlin基本语法文档记录
- SGU 275 异或线性基
- 163
- org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.Tuple2
- 深入HQL学习以及HQL和SQL的区别
- Linux下Qt程序的打包发布和问题总结
- Save the Students! UVALive
- [笔记分享] [DT] device tree之中断
- DB2-测试数据库安装过程
- 外出测车注意事项
- 利用正则表达式限制EditText小数点前后位数和格式
- wireshark分析IP协议