Spark 2.0 DataFrame map: Analyzing and Resolving "Unable to find encoder for type stored in a Dataset"
Source: Internet  Editor: 程序博客网  Date: 2024/06/05 15:27
When calling map on a DataFrame, we often run into the "Unable to find encoder for type stored in a Dataset" problem. The error reads as follows:
error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
resDf_upd.map(row => {
After looking through the Spark documentation, we found the following description of Dataset:
Dataset is Spark SQL's strongly-typed API for working with structured data, i.e. records with a known schema.
Datasets are lazy and structured query expressions are only triggered when an action is invoked. Internally, a Dataset represents a logical plan that describes the computation query required to produce the data (for a given Spark SQL session).
A Dataset is a result of executing a query expression against data storage like files, Hive tables or JDBC databases. The structured query expression can be described by a SQL query, a Column-based SQL expression or a Scala/Java lambda function. And that is why Dataset operations are available in three variants.
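Because a Dataset is strongly typed, a typed operation such as map needs an Encoder for the type it produces. Here is the example given on the official site: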
// No pre-defined encoders for Dataset[Map[K,V]], define explicitly
implicit val mapEncoder = org.apache.spark.sql.Encoders.kryo[Map[String, Any]]
// Primitive types and case classes can be also defined as
// implicit val stringIntMapEncoder: Encoder[Map[String, Any]] = ExpressionEncoder()
// row.getValuesMap[T] retrieves multiple columns at once into a Map[String, T]
teenagersDF.map(teenager => teenager.getValuesMap[Any](List("name", "age"))).collect()
// Array(Map("name" -> "Justin", "age" -> 19))
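For readers who want to try this end to end, here is a minimal self-contained sketch of the same idea. The `Person` case class, the sample rows, and the `local[*]` session are assumptions made for illustration and do not come from the original post. It shows two ways of putting an Encoder in scope: relying on `spark.implicits._` for primitive and case class results (as the error message suggests), and declaring a kryo encoder explicitly for a type such as `Map[String, Any]`.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// Hypothetical record type, defined at top level so Spark can derive an encoder for it.
case class Person(name: String, age: Long)

object EncoderSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption, not from the post).
    val spark = SparkSession.builder()
      .appName("encoder-sketch")
      .master("local[*]")
      .getOrCreate()
    // Brings encoders for primitive types and case classes into implicit scope.
    import spark.implicits._

    // Explicit encoder for a type we want serialized with kryo.
    implicit val mapEncoder: Encoder[Map[String, Any]] =
      Encoders.kryo[Map[String, Any]]

    // Made-up sample data.
    val df = Seq(("Justin", 19L), ("Andy", 30L)).toDF("name", "age")

    // Result type String: encoder supplied by spark.implicits._
    val names = df.map(row => row.getAs[String]("name"))

    // Result type Person (a case class): encoder supplied by spark.implicits._
    val people = df.map(row => Person(row.getAs[String]("name"), row.getAs[Long]("age")))

    // Result type Map[String, Any]: uses the explicit kryo encoder above.
    val maps = df.map(row => row.getValuesMap[Any](List("name", "age")))

    names.collect().foreach(println)
    people.collect().foreach(println)
    maps.collect().foreach(println)

    spark.stop()
  }
}
```

When the result of the map can be modeled as a primitive or a case class, the import alone is usually enough and no hand-written encoder is needed.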
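So, to run a map operation you first have to define an Encoder, which adds a fair amount of work. Fortunately, a Dataset can be converted to an RDD, which keeps things simpler: just change dataframe.map to dataframe.rdd.map. Note that the result is then an RDD rather than a Dataset. Quite neat, isn't it?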
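The dataframe.rdd.map workaround can be sketched as follows. This is an illustrative example under stated assumptions: the sample data, the `name_upper` column, and the local session are hypothetical and not taken from the original post. Mapping a DataFrame directly to `Row` values is a typical way to hit the encoder error, because no implicit `Encoder[Row]` is in scope. Going through `.rdd` avoids the requirement, and `spark.createDataFrame` with an explicit schema turns the result back into a DataFrame if one is needed.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object RddMapSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption, not from the post).
    val spark = SparkSession.builder()
      .appName("rdd-map-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val df = Seq(("Justin", 19L), ("Andy", 30L)).toDF("name", "age")

    // RDD.map does not require an Encoder, so producing Row values works here.
    val rowRdd = df.rdd.map(row => Row(row.getAs[String]("name").toUpperCase))

    // Optional: rebuild a DataFrame by supplying the schema explicitly.
    val schema = StructType(Seq(StructField("name_upper", StringType, nullable = true)))
    val result = spark.createDataFrame(rowRdd, schema)

    result.show()
    spark.stop()
  }
}
```

The trade-off is that the computation leaves the Dataset API at the `.rdd` call, so the mapped step is an ordinary RDD transformation until the data is converted back.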