Differences between Spark RDD and Dataset

Source: Internet  Editor: 程序博客网  Date: 2024/06/05 10:51

Datasets are similar to RDDs; however, instead of using Java serialization or Kryo, they use a specialized Encoder to serialize objects for processing or for transmission over the network. While both encoders and standard serialization turn an object into bytes, encoders are dynamically generated code and use a format that allows Spark to perform many operations, such as filtering, sorting and hashing, without deserializing the bytes back into an object.
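To make the contrast concrete, here is a minimal Scala sketch, assuming `spark-sql` (Spark 2.x or 3.x) is on the classpath. `Person` and `EncoderComparison` are hypothetical names used only for illustration. It compares Spark's product encoder with its Kryo-based encoder for the same class. The product encoder exposes one typed column per field, which is what lets Spark work with a field such as `age` in its binary format. The Kryo encoder stores the whole object as a single opaque binary column, so Spark cannot look inside it without deserializing.

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// Hypothetical record type used only for illustration.
case class Person(name: String, age: Long)

object EncoderComparison {
  // Product encoder: derived from the case class, one column per field (name, age).
  val productEncoder: Encoder[Person] = Encoders.product[Person]

  // Kryo encoder: the whole object is serialized into a single binary column.
  val kryoEncoder: Encoder[Person] = Encoders.kryo[Person]

  def main(args: Array[String]): Unit = {
    // Print the column layout Spark sees for each encoding.
    productEncoder.schema.printTreeString()
    kryoEncoder.schema.printTreeString()
  }
}
```

No `SparkSession` is needed here, because an encoder's schema can be inspected on its own.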


A Dataset can be constructed from JVM objects.
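The following is a minimal sketch of constructing a Dataset from JVM objects. It assumes Spark 2.x or 3.x with `spark-sql` on the classpath and a local master. `Person`, `DatasetFromObjects` and `adultNames` are hypothetical names for illustration. `spark.implicits._` supplies the encoder for the case class, so `toDS()` turns an ordinary `Seq` of objects into a `Dataset[Person]`.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type; spark.implicits._ derives its encoder.
case class Person(name: String, age: Long)

object DatasetFromObjects {
  // Builds a Dataset[Person] from plain JVM objects and returns the adults' names.
  def adultNames(spark: SparkSession, people: Seq[Person]): Array[String] = {
    import spark.implicits._

    // Equivalent to spark.createDataset(people).
    val ds: Dataset[Person] = people.toDS()

    ds.filter($"age" >= 18) // Column expression the optimizer can analyze
      .map(_.name)          // plain Scala lambda; works on Person objects
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-from-objects")
      .master("local[*]") // run locally using all available cores
      .getOrCreate()
    try {
      val names = adultNames(spark, Seq(Person("Alice", 29), Person("Bob", 17)))
      println(names.mkString(", "))
    } finally {
      spark.stop()
    }
  }
}
```

The filter uses a `Column` expression, which Spark's optimizer can analyze. By contrast, `map(_.name)` is an ordinary Scala function that Spark cannot see inside, so it needs each row as a `Person` object.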
