Spark SQL syntax: reading JSON

Source: Internet  Editor: 程序博客网  Time: 2024/05/21 18:49
-- Sample data
[hadoop@node1 resources]$ pwd
/home/hadoop/spark-1.5.2-bin-hadoop2.6/examples/src/main/resources
[hadoop@node1 resources]$ cat people.json
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
[hadoop@node1 resources]$ cat people.txt
Michael, 29
Andy, 30
Justin, 19
[hadoop@node1 resources]$ hadoop fs -put people* /test/input

-- Tip: pressing Tab lists all available commands

-- Test
[hadoop@node1 spark-1.5.2-bin-hadoop2.6]$ spark-shell

-- Read the JSON file
scala> val df=sqlContext.read.json("hdfs://node1:8020/test/input/people.json")
df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

scala> df.show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

scala> df.printSchema()                   -- similar to desc xx in SQL
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

scala> df.select("age").show()
+----+
| age|
+----+
|null|
|  30|
|  19|
+----+

-- The following three calls are equivalent
scala> df.select("name","age").show()
scala> df.select($"name",$"age").show()
scala> df.select(df("name"),df("age")).show()          -- a column can be referenced as df("..."); the name must be in double quotes
+-------+----+
|   name| age|
+-------+----+
|Michael|null|
|   Andy|  30|
| Justin|  19|
+-------+----+

scala> df.selectExpr("name", "age as age_old", "abs(age) as age_abs").show
+-------+-------+-------+
|   name|age_old|age_abs|
+-------+-------+-------+
|Michael|   null|   null|
|   Andy|     30|     30|
| Justin|     19|     19|
+-------+-------+-------+

scala> df.count
res12: Long = 3

scala> df.filter(df("age")>21).show         -- show prints the column names and the rows
+---+----+
|age|name|
+---+----+
| 30|Andy|
+---+----+

scala> df.filter(df("age")>21).collect      -- collect returns the rows as an Array
res14: Array[org.apache.spark.sql.Row] = Array([30,Andy])

scala> df.groupBy("age").count().show()
+----+-----+
| age|count|
+----+-----+
|null|    1|
|  19|    1|
|  30|    1|
+----+-----+

-- The aggregates skip the null age: sum = 30 + 19 = 49, avg = 49 / 2 = 24.5
scala> df.agg(max("age"),sum("age"),min("age"),avg("age")).show
+--------+--------+--------+--------+
|max(age)|sum(age)|min(age)|avg(age)|
+--------+--------+--------+--------+
|      30|      49|      19|    24.5|
+--------+--------+--------+--------+
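
-- SQL equivalents (sketch)
The session above uses only the DataFrame methods. As a minimal sketch, the same filter, grouping and aggregation can probably be written as SQL strings by registering the DataFrame as a temporary table first. This assumes a spark-shell from Spark 1.5.x, where `sqlContext` is already defined, and the same HDFS path as above. The table name `people`, the alias `cnt` and the value names `adults`, `byAge` and `stats` are chosen for this example and do not appear in the original session.

```scala
// Paste into spark-shell (Spark 1.5.x); `sqlContext` is provided by the shell.
val people = sqlContext.read.json("hdfs://node1:8020/test/input/people.json")

// Register the DataFrame under a table name so that SQL text can refer to it.
// The name "people" is an assumption made for this sketch.
people.registerTempTable("people")

// Counterpart of df.filter(df("age") > 21)
val adults = sqlContext.sql("SELECT age, name FROM people WHERE age > 21")
adults.show()

// Counterpart of df.groupBy("age").count()
// The alias is `cnt` rather than `count` to avoid clashing with the COUNT keyword.
val byAge = sqlContext.sql("SELECT age, count(*) AS cnt FROM people GROUP BY age")
byAge.show()

// Counterpart of df.agg(max("age"), sum("age"), min("age"), avg("age"))
val stats = sqlContext.sql("SELECT max(age), sum(age), min(age), avg(age) FROM people")
stats.show()
```

The temporary table exists only for the lifetime of this `sqlContext`, so it has to be registered again in every new spark-shell session.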

