Spark 2.x Study Notes: 15. SQL in Spark SQL
Source: Internet · Editor: 程序博客网 · Date: 2024/06/11 05:40
15. SQL in Spark SQL
15.1 SQL syntax supported by Spark SQL
select [distinct] [column names]|[wildcard]
from tableName
[join clause tableName on join condition]
[where condition]
[group by column name]
[having conditions]
[order by column names [asc|desc]]
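To make the clause order above concrete, here is a minimal sketch of one query that uses where, group by, having, and order by together. It is an illustration only: the view name emp_sample, the object name, and the local master setting are assumptions, and the six rows are a subset copied from the emp output in section 15.4.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object SelectClausesSketch {
  // Runs a single query that exercises where / group by / having / order by.
  def run(spark: SparkSession): Array[Row] = {
    import spark.implicits._

    // Six rows copied from the emp table in section 15.4 (subset of columns).
    val emp = Seq(
      (7782, "CLARK", "MANAGER", 2450.0, 10),
      (7839, "KING", "PRESIDENT", 5000.0, 10),
      (7369, "SMITH", "CLERK", 800.0, 20),
      (7566, "JONES", "MANAGER", 2975.0, 20),
      (7499, "ALLEN", "SALESMAN", 1600.0, 30),
      (7900, "JAMES", "CLERK", 950.0, 30)
    ).toDF("eid", "ename", "job", "sal", "did")

    // emp_sample is a hypothetical temporary view, not the Hive table emp.
    emp.createOrReplaceTempView("emp_sample")

    spark.sql(
      """select did, count(*) as cnt, avg(sal) as avg_sal
        |from emp_sample
        |where sal > 900
        |group by did
        |having count(*) >= 2
        |order by avg_sal desc""".stripMargin
    ).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("select-clauses-sketch")
      .master("local[*]") // assumption: run locally, outside spark-shell
      .getOrCreate()
    try run(spark).foreach(println)
    finally spark.stop()
  }
}
```

With this sample data, the where clause drops SMITH, the having clause drops department 20 (one remaining row), and the result lists department 10 (2 rows, average 3725.0) before department 30 (2 rows, average 1275.0).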
If the query only uses joins, the supported syntax is:
select statement
from statement
[join | inner join | left join | left semi join | left outer join | right join | right outer join | full join | full outer join]
on join condition
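As a rough illustration of two of the join types listed above, the sketch below runs a left outer join and a left semi join over small temporary views. The view names emp_sample and dept_sample, the dept columns (did, dname), and the department names are assumptions made for the example; the source does not show the schema of dept.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinSketch {
  // Registers two small temporary views to join against each other.
  def register(spark: SparkSession): Unit = {
    import spark.implicits._

    // Three employees from section 15.4; HADRON has no department (null did).
    val emp: Seq[(Int, String, Option[Int])] = Seq(
      (7782, "CLARK", Some(10)),
      (7369, "SMITH", Some(20)),
      (8888, "HADRON", None)
    )
    emp.toDF("eid", "ename", "did").createOrReplaceTempView("emp_sample")

    // Hypothetical department rows; the real dept schema is not shown in the notes.
    Seq((10, "ACCOUNTING"), (20, "RESEARCH"), (40, "OPERATIONS"))
      .toDF("did", "dname")
      .createOrReplaceTempView("dept_sample")
  }

  // Keeps every employee; dname is null when no department matches.
  def leftOuter(spark: SparkSession): DataFrame = spark.sql(
    """select e.ename, d.dname
      |from emp_sample e
      |left outer join dept_sample d
      |on e.did = d.did""".stripMargin)

  // Keeps only departments that have at least one employee;
  // a left semi join can select columns from the left side only.
  def leftSemi(spark: SparkSession): DataFrame = spark.sql(
    """select d.did, d.dname
      |from dept_sample d
      |left semi join emp_sample e
      |on d.did = e.did""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-sketch")
      .master("local[*]") // assumption: run locally, outside spark-shell
      .getOrCreate()
    try {
      register(spark)
      leftOuter(spark).show()
      leftSemi(spark).show()
    } finally spark.stop()
  }
}
```

The left outer join returns all three employees, with a null dname for HADRON, while the left semi join returns only departments 10 and 20.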
15.2 The framework behind SQL in Spark SQL
15.3 Integrating with the Hive Metastore
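(1) Spark must be able to find the HDFS and Hive configuration files
- Method 1: copy core-site.xml, hdfs-site.xml, and hive-site.xml directly into the conf directory under the Spark installation directory. The drawback is that whenever the HDFS or Hive configuration changes, the copies under Spark must be updated by hand.
- Method 2: point Spark at the Hadoop configuration directory from Spark's own configuration file, for example by setting HADOOP_CONF_DIR in conf/spark-env.sh.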
(2) Once Spark SQL is integrated with the Hive Metastore, Hive tables can be queried directly with spark.sql("select … from table where …")
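In spark-shell the spark session is created for you, but a standalone application has to request Hive support itself. The following is a minimal sketch of that, assuming the spark-hive module is on the classpath and that hive-site.xml is visible to Spark as described in (1). The object name, the helper listDatabases, and the filter did = 10 are illustrative choices, not part of the original notes.

```scala
import org.apache.spark.sql.SparkSession

object HiveMetastoreSketch {
  // Returns the database names known to the metastore.
  def listDatabases(spark: SparkSession): Seq[String] =
    spark.sql("show databases").collect().map(_.getString(0)).toSeq

  def main(args: Array[String]): Unit = {
    // enableHiveSupport() makes the session use the Hive Metastore catalog.
    // No master is set here; it is expected to come from spark-submit.
    val spark = SparkSession.builder()
      .appName("hive-metastore-sketch")
      .enableHiveSupport()
      .getOrCreate()

    try {
      listDatabases(spark).foreach(println)

      // Query the emp table only if it exists in the current database,
      // so the sketch also runs against an empty metastore.
      if (spark.catalog.tableExists("emp")) {
        spark.sql("select ename, sal from emp where did = 10").show()
      }
    } finally spark.stop()
  }
}
```

If no hive-site.xml is found, Spark falls back to a local embedded metastore, so the program still runs but will not see the tables of an existing Hive installation.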
15.4 Worked examples
(1)spark-shell
[root@node1 ~]# spark-shell
17/10/24 10:15:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://192.168.80.131:4040
Spark context available as 'sc' (master = local[*], app id = local-1508854525067).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("show databases").show
+------------+
|databaseName|
+------------+
|     default|
|        test|
+------------+

scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|  copyemp|      false|
| default|     demo|      false|
| default|     dept|      false|
| default|     dual|      false|
| default|      emp|      false|
| default|   empbak|      false|
| default|employees|      false|
| default|     mytb|      false|
| default|    users|      false|
+--------+---------+-----------+

scala> spark.sql("select * from emp").show
+----+------+---------+----+----------+------+------+----+
| eid| ename|      job| mgr|  hiredate|   sal|  comm| did|
+----+------+---------+----+----------+------+------+----+
|7782| CLARK|  MANAGER|7839|1981-06-09|2450.0|   0.0|  10|
|7839|  KING|PRESIDENT|   0|1981-11-17|5000.0|   0.0|  10|
|7934|MILLER|    CLERK|7782|1982-01-23|1300.0|   0.0|  10|
|7369| SMITH|    CLERK|7902|1980-12-17| 800.0|   0.0|  20|
|7566| JONES|  MANAGER|7839|1981-04-02|2975.0|   0.0|  20|
|7902|  FORD|  ANALYST|7566|1981-12-03|3000.0|   0.0|  20|
|7499| ALLEN| SALESMAN|7698|1981-02-20|1600.0| 300.0|  30|
|7521|  WARD| SALESMAN|7698|1981-02-22|1250.0| 500.0|  30|
|7654|MARTIN| SALESMAN|7698|1981-09-28|1250.0|1400.0|  30|
|7698| BLAKE|  MANAGER|7839|1981-05-01|2850.0|   0.0|  30|
|7844|TURNER| SALESMAN|7698|1981-09-08|1500.0|   0.0|  30|
|7900| JAMES|    CLERK|7698|1981-12-03| 950.0|   0.0|  30|
|8888|HADRON|     null|null|2016-08-31|6666.0|  null|null|
+----+------+---------+----+----------+------+------+----+

scala>
(2)spark-sql
[root@node1 ~]# spark-sql
17/10/24 10:17:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/10/24 10:17:32 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/10/24 10:17:32 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
spark-sql> show databases;
default
test
Time taken: 3.93 seconds, Fetched 2 row(s)
spark-sql> show tables;
default   copyemp     false
default   demo        false
default   dept        false
default   dual        false
default   emp         false
default   empbak      false
default   employees   false
default   mytb        false
default   users       false
Time taken: 0.145 seconds, Fetched 9 row(s)
spark-sql> select * from emp;
7782   CLARK    MANAGER     7839   1981-06-09   2450.0   0.0      10
7839   KING     PRESIDENT   0      1981-11-17   5000.0   0.0      10
7934   MILLER   CLERK       7782   1982-01-23   1300.0   0.0      10
7369   SMITH    CLERK       7902   1980-12-17   800.0    0.0      20
7566   JONES    MANAGER     7839   1981-04-02   2975.0   0.0      20
7902   FORD     ANALYST     7566   1981-12-03   3000.0   0.0      20
7499   ALLEN    SALESMAN    7698   1981-02-20   1600.0   300.0    30
7521   WARD     SALESMAN    7698   1981-02-22   1250.0   500.0    30
7654   MARTIN   SALESMAN    7698   1981-09-28   1250.0   1400.0   30
7698   BLAKE    MANAGER     7839   1981-05-01   2850.0   0.0      30
7844   TURNER   SALESMAN    7698   1981-09-08   1500.0   0.0      30
7900   JAMES    CLERK       7698   1981-12-03   950.0    0.0      30
8888   HADRON   NULL        NULL   2016-08-31   6666.0   NULL     NULL
Time taken: 3.266 seconds, Fetched 13 row(s)
spark-sql>