Spark SQL syntax: reading JSON
Source: Internet · Editor: 程序博客网 · Published: 2024/05/21 18:49
Spark SQL syntax for reading JSON, shown as a spark-shell walkthrough.

--Sample data
[hadoop@node1 resources]$ pwd
/home/hadoop/spark-1.5.2-bin-hadoop2.6/examples/src/main/resources
[hadoop@node1 resources]$ cat people.json
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
[hadoop@node1 resources]$ cat people.txt
Michael, 29
Andy, 30
Justin, 19
[hadoop@node1 resources]$ hadoop fs -put people* /test/input
--Tip: pressing Tab lists all available commands

--Test
[hadoop@node1 spark-1.5.2-bin-hadoop2.6]$ spark-shell

--Read the JSON file
scala> val df = sqlContext.read.json("hdfs://node1:8020/test/input/people.json")
df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

scala> df.show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

scala> df.printSchema()   --works like "desc table"
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

scala> df.select("age").show()
+----+
| age|
+----+
|null|
|  30|
|  19|
+----+

--The following three forms are equivalent
scala> df.select("name","age").show()
scala> df.select($"name",$"age").show()
scala> df.select(df("name"),df("age")).show()   --columns can be written as df("col"); the double quotes are required
+-------+----+
|   name| age|
+-------+----+
|Michael|null|
|   Andy|  30|
| Justin|  19|
+-------+----+

scala> df.selectExpr("name", "age as age_old", "abs(age) as age_abs").show
+-------+-------+-------+
|   name|age_old|age_abs|
+-------+-------+-------+
|Michael|   null|   null|
|   Andy|     30|     30|
| Justin|     19|     19|
+-------+-------+-------+

scala> df.count
res12: Long = 3

scala> df.filter(df("age")>21).show   --show prints the column names and rows
+---+----+
|age|name|
+---+----+
| 30|Andy|
+---+----+

scala> df.filter(df("age")>21).collect   --collect returns the rows as an Array[Row]
res14: Array[org.apache.spark.sql.Row] = Array([30,Andy])

scala> df.groupBy("age").count().show()
+----+-----+
| age|count|
+----+-----+
|null|    1|
|  19|    1|
|  30|    1|
+----+-----+

scala> df.agg(max("age"),sum("age"),min("age"),avg("age")).show
+--------+--------+--------+--------+
|max(age)|sum(age)|min(age)|avg(age)|
+--------+--------+--------+--------+
|      30|      49|      19|    24.5|
+--------+--------+--------+--------+
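Note the null handling in the session above: Michael has no "age", so his row contributes null to `age`, aggregates skip it, and avg comes out to 49/2 = 24.5 rather than 49/3. The same semantics can be checked outside Spark with a small Python sketch (an illustration only, not Spark code) that parses the identical JSON Lines sample:

```python
import json

# The same JSON Lines sample used in the spark-shell session.
lines = [
    '{"name":"Michael"}',
    '{"name":"Andy", "age":30}',
    '{"name":"Justin", "age":19}',
]

rows = [json.loads(line) for line in lines]

# Like Spark SQL, treat a missing "age" as null and skip it in aggregates.
ages = [r["age"] for r in rows if "age" in r]

print(len(rows))                                       # like df.count -> 3
print(max(ages), sum(ages), min(ages), sum(ages) / len(ages))
# matches df.agg(max, sum, min, avg) -> 30 49 19 24.5
```

The row count still sees all three rows; only the aggregates over `age` ignore the missing value, which is exactly why `df.count` returns 3 while `avg(age)` is 24.5.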