Differences between Dataset and DataFrame in Spark SQL
Source: Internet | Editor: 程序博客网 | Date: 2024/04/30 15:33
1. A Dataset is strongly typed: each element has a declared type (e.g. a case class), so individual columns become fields you can manipulate with fine-grained, compile-time-checked operations.
2. A DataFrame can be registered as a (temporary) table, after which you can query it directly with SQL.
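A minimal sketch of the two points above, side by side. This assumes a spark-shell session where `spark` is already in scope (as in the snippets below); the sample data is invented here rather than read from a file:

```scala
// Spark-shell sketch: the same query via SQL on a DataFrame and via typed
// operations on a Dataset. Assumes `spark` is the spark-shell session.
import spark.implicits._

case class Person(name: String, age: Long)
val people = Seq(Person("Michael", 29), Person("Andy", 30), Person("Justin", 19))

// DataFrame route: register as a table, then write plain SQL
val df = people.toDF()
df.createOrReplaceTempView("people")
val adultsSql = spark.sql("SELECT name FROM people WHERE age > 21").as[String].collect()
// adultsSql: Array(Michael, Andy)

// Dataset route: elements are Person objects, so `_.age` is checked at compile time
val ds = people.toDS()
val adultsTyped = ds.filter(_.age > 21).map(_.name).collect()
// adultsTyped: Array(Michael, Andy)
```

A typo such as `_.agee` in the Dataset route fails at compile time, while a typo inside the SQL string only fails when the query is analyzed at runtime.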
================DataFrame=============================================
val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+

// This import is needed to use the $-notation
import spark.implicits._

// Print the schema in a tree format
df.printSchema()
// root
// |-- age: long (nullable = true)
// |-- name: string (nullable = true)

// Select only the "name" column
df.select("name").show()
// +-------+
// |   name|
// +-------+
// |Michael|
// |   Andy|
// | Justin|
// +-------+

// Select everybody, but increment the age by 1
df.select($"name", $"age" + 1).show()
// +-------+---------+
// |   name|(age + 1)|
// +-------+---------+
// |Michael|     null|
// |   Andy|       31|
// | Justin|       20|
// +-------+---------+

// Select people older than 21
df.filter($"age" > 21).show()
// +---+----+
// |age|name|
// +---+----+
// | 30|Andy|
// +---+----+

// Count people by age
df.groupBy("age").count().show()
// +----+-----+
// | age|count|
// +----+-----+
// |  19|    1|
// |null|    1|
// |  30|    1|
// +----+-----+
================Dataset==============================================
// Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,
// you can use custom classes that implement the Product interface
case class Person(name: String, age: Long)

// Encoders are created for case classes
val caseClassDS = Seq(Person("Andy", 32)).toDS()
caseClassDS.show()
// +----+---+
// |name|age|
// +----+---+
// |Andy| 32|
// +----+---+

// Encoders for most common types are automatically provided by importing spark.implicits._
val primitiveDS = Seq(1, 2, 3).toDS()
primitiveDS.map(_ + 1).collect() // Returns: Array(2, 3, 4)

// DataFrames can be converted to a Dataset by providing a class. Mapping will be done by name
val path = "examples/src/main/resources/people.json"
val peopleDS = spark.read.json(path).as[Person]
peopleDS.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+
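The conversion also works in the other direction: a Dataset and a DataFrame are two views of the same data, and switching between them is cheap. A small spark-shell sketch (again assuming `spark` is in scope, with invented sample data):

```scala
// Spark-shell sketch: round-tripping between Dataset and DataFrame.
// Assumes `spark` is the spark-shell session.
import spark.implicits._

case class Person(name: String, age: Long)

val ds = Seq(Person("Andy", 32), Person("Justin", 19)).toDS()

// Dataset -> DataFrame: the static element type is dropped, the columns remain
val df = ds.toDF()

// DataFrame -> Dataset: columns are mapped back to case-class fields by name
val ds2 = df.as[Person]
ds2.map(p => p.name + ":" + p.age).collect()
// Array(Andy:32, Justin:19)
```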
======================================================================Programming (RDD to DataFrame, SQL)
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.Encoder

// For implicit conversions from RDDs to DataFrames
import spark.implicits._

// Create an RDD of Person objects from a text file, convert it to a DataFrame
val peopleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()

// Register the DataFrame as a temporary view
peopleDF.createOrReplaceTempView("people")

// SQL statements can be run by using the sql methods provided by Spark
val teenagersDF = spark.sql("SELECT name, age FROM people WHERE age BETWEEN 13 AND 19")

// The columns of a row in the result can be accessed by field index
teenagersDF.map(teenager => "Name: " + teenager(0)).show()
// +------------+
// |       value|
// +------------+
// |Name: Justin|
// +------------+

// or by field name
teenagersDF.map(teenager => "Name: " + teenager.getAs[String]("name")).show()
// +------------+
// |       value|
// +------------+
// |Name: Justin|
// +------------+

// No pre-defined encoders for Dataset[Map[K,V]], define explicitly
implicit val mapEncoder = org.apache.spark.sql.Encoders.kryo[Map[String, Any]]
// Primitive types and case classes can also be defined as
// implicit val stringIntMapEncoder: Encoder[Map[String, Int]] = ExpressionEncoder()

// row.getValuesMap[T] retrieves multiple columns at once into a Map[String, T]
teenagersDF.map(teenager => teenager.getValuesMap[Any](List("name", "age"))).collect()
// Array(Map("name" -> "Justin", "age" -> 19))
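The snippet above infers the schema by reflection on the Person case class. When the schema is not known until runtime, a DataFrame can instead be built from an RDD of Rows plus an explicitly constructed schema. A spark-shell sketch (assuming `spark` is in scope; an in-memory RDD stands in for people.txt):

```scala
// Spark-shell sketch: programmatically specifying the schema instead of
// relying on a case class. Assumes `spark` is the spark-shell session.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

val lines = spark.sparkContext.parallelize(Seq("Michael, 29", "Andy, 30", "Justin, 19"))
val rowRDD = lines.map(_.split(",")).map(a => Row(a(0), a(1).trim.toLong))

// The schema is an ordinary value, so it can be assembled at runtime
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age",  LongType,   nullable = true)))

val peopleDF = spark.createDataFrame(rowRDD, schema)
peopleDF.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19").show()
// +------+
// |  name|
// +------+
// |Justin|
// +------+
```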