Spark DataFrames


Reposted from: http://www.k6k4.com/chapter/show/aafliljce1474164458328

1. Sample Data

Each line of the file holds one JSON object:

  1. { "name": "Andy", "age": 30 }
  2. { "name": "Justin", "age": 19 }
  3. { "name": "tom", "age": 21 }
The file is saved at example/input/data.
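If you want to reproduce the sample file locally, a minimal sketch using plain JDK file I/O is shown below; no Spark is needed for this step, and the object name WriteSampleData is just for illustration.

    import java.nio.charset.StandardCharsets
    import java.nio.file.{Files, Paths}

    object WriteSampleData {
      def main(args: Array[String]): Unit = {
        // One JSON object per line, matching the sample above.
        val lines = Seq(
          """{ "name": "Andy", "age": 30 }""",
          """{ "name": "Justin", "age": 19 }""",
          """{ "name": "tom", "age": 21 }"""
        )
        Files.createDirectories(Paths.get("example/input"))
        Files.write(
          Paths.get("example/input/data"),
          lines.mkString("\n").getBytes(StandardCharsets.UTF_8)
        )
      }
    }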

2. Loading the Data

    scala> val df = spark.read.json("example/input/data")
    ...
    df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
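The same load works outside the shell as well. Here is a minimal standalone sketch; in spark-shell the `spark` session already exists, while a standalone app has to build it, and the app name and `local[*]` master below are assumptions for local testing.

    import org.apache.spark.sql.SparkSession

    object LoadJson {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("json-dataframe-example")   // assumed name, any string works
          .master("local[*]")                  // assumption: run locally
          .getOrCreate()

        // spark.read.json scans the JSON lines to infer the schema,
        // which is why `age` comes back as bigint (Scala Long).
        val df = spark.read.json("example/input/data")
        df.show()

        spark.stop()
      }
    }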

3. Viewing the Data

    scala> df.show
    +---+------+
    |age|  name|
    +---+------+
    | 30|  Andy|
    | 19|Justin|
    | 21|   tom|
    +---+------+

4. Viewing the Schema

    scala> df.printSchema
    root
     |-- age: long (nullable = true)
     |-- name: string (nullable = true)
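Schema inference costs an extra pass over the data. If the schema is known in advance it can be supplied explicitly; the following is a sketch using Spark's StructType API, runnable in spark-shell where `spark` is already defined.

    import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

    // Declare the schema up front instead of letting spark.read.json infer it.
    val schema = StructType(Seq(
      StructField("age", LongType, nullable = true),
      StructField("name", StringType, nullable = true)
    ))

    val dfWithSchema = spark.read.schema(schema).json("example/input/data")
    dfWithSchema.printSchema()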


5. Basic Query Operations

    scala> df.select("name").show
    +------+
    |  name|
    +------+
    |  Andy|
    |Justin|
    |   tom|
    +------+

    scala> df.select($"name", $"age" + 1).show
    +------+---------+
    |  name|(age + 1)|
    +------+---------+
    |  Andy|       31|
    |Justin|       20|
    |   tom|       22|
    +------+---------+

    scala> df.filter($"age" > 21).show
    +---+----+
    |age|name|
    +---+----+
    | 30|Andy|
    +---+----+
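The same queries can also be expressed as SQL after registering the DataFrame as a temporary view. A brief sketch, runnable in spark-shell; the view name "people" is an arbitrary choice, not from the original article.

    // Register the DataFrame as a temporary view so it can be queried with SQL.
    df.createOrReplaceTempView("people")

    // Equivalent to df.select($"name", $"age" + 1).show
    spark.sql("SELECT name, age + 1 AS age_plus_one FROM people").show()

    // Equivalent to df.filter($"age" > 21).show
    spark.sql("SELECT * FROM people WHERE age > 21").show()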