Using SQLContext/HiveContext/SparkSession (Part 3)
Using SparkSession:
Official documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html

In Spark 2.x, the entry point to Spark SQL is SparkSession:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._
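With spark.implicits._ in scope, a local collection (or an RDD of case classes) can be converted straight into a DataFrame. A minimal sketch, not part of the official guide snippet above; the Person case class is hypothetical:

case class Person(name: String, age: Long)

// toDF() is brought in by spark.implicits._
val peopleDF = Seq(Person("Andy", 32), Person("Justin", 19)).toDF()
peopleDF.show()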
Sample application:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

/**
 * Created by *** 2017/12/22 15:29
 * Using SparkSession
 */
object SparkSessionApp {

  def main(args: Array[String]): Unit = {
    setLogger()

    if (args.length != 1) {
      println("Usage: SparkSessionApp <path>")
      System.exit(0)
    }
    val path = args(0)

    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SparkSessionApp")
      .getOrCreate()

    // spark.read.format("json").load(path) is the equivalent general form
    val people = spark.read.json(path)
    people.printSchema()
    people.show()

    spark.stop()
  }

  // Turn off log output so only the query results reach the console
  def setLogger(): Unit = {
    Logger.getLogger("org").setLevel(Level.OFF)
    Logger.getLogger("com").setLevel(Level.OFF)
    System.setProperty("spark.ui.showConsoleProgress", "false")
    Logger.getRootLogger.setLevel(Level.OFF)
  }
}
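SparkSession also subsumes HiveContext: enabling Hive support on the builder gives access to the Hive metastore, HiveQL, and Hive UDFs. A minimal sketch, assuming a working Hive setup (hive-site.xml on the classpath):

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("SparkSessionHiveApp")
  .enableHiveSupport() // the role HiveContext played in Spark 1.x
  .getOrCreate()

spark.sql("show tables").show()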
Using it in spark-shell:
scala> spark.sql("show tables").show+--------+---------+-----------+|database|tableName|isTemporary|+--------+---------+-----------+| default| dept| false|| default| emp| false|+--------+---------+-----------+
scala> spark.sql("select * from emp");res1: org.apache.spark.sql.DataFrame = [empno: int, ename: string ... 6 more fields]scala> res1.show+-----+------+---------+----+----------+-------+------+------+|empno| ename| job| mgr| hiredate| sal| comm|deptno|+-----+------+---------+----+----------+-------+------+------+| 7369| SMITH| CLERK|7902|1980-12-17| 800.0| null| 20|| 7499| ALLEN| SALESMAN|7698| 1981-2-20| 1600.0| 300.0| 30|| 7521| WARD| SALESMAN|7698| 1981-2-22| 1250.0| 500.0| 30|| 7566| JONES| MANAGER|7839| 1981-4-2| 2975.0| null| 20|| 7654|MARTIN| SALESMAN|7698| 1981-9-28| 1250.0|1400.0| 30|| 7698| BLAKE| MANAGER|7839| 1981-5-1| 2850.0| null| 30|| 7782| CLARK| MANAGER|7839| 1981-6-9| 2450.0| null| 10|| 7788| SCOTT| ANALYST|7566| 1987-4-19| 3000.0| null| 20|| 7839| KING|PRESIDENT|null|1981-11-17| 5000.0| null| 10|| 7844|TURNER| SALESMAN|7698| 1981-9-8| 1500.0| 0.0| 30|| 7876| ADAMS| CLERK|7788| 1987-5-23| 1100.0| null| 20|| 7900| JAMES| CLERK|7698| 1981-12-3| 950.0| null| 30|| 7902| FORD| ANALYST|7566| 1981-12-3| 3000.0| null| 20|| 7934|MILLER| CLERK|7782| 1982-1-23| 1300.0| null| 10|| 8888| HIVE| PROGRAM|7839| 1988-1-23|10300.0| null| null|+-----+------+---------+----+----------+-------+------+------+
scala> spark.sql("select * from emp e join dept d on e.deptno = d.deptno").show+-----+------+---------+----+----------+------+------+------+------+----------+--------+|empno| ename| job| mgr| hiredate| sal| comm|deptno|deptno| dname|location|+-----+------+---------+----+----------+------+------+------+------+----------+--------+| 7369| SMITH| CLERK|7902|1980-12-17| 800.0| null| 20| 20| RESEARCH| DALLAS|| 7499| ALLEN| SALESMAN|7698| 1981-2-20|1600.0| 300.0| 30| 30| SALES| CHICAGO|| 7521| WARD| SALESMAN|7698| 1981-2-22|1250.0| 500.0| 30| 30| SALES| CHICAGO|| 7566| JONES| MANAGER|7839| 1981-4-2|2975.0| null| 20| 20| RESEARCH| DALLAS|| 7654|MARTIN| SALESMAN|7698| 1981-9-28|1250.0|1400.0| 30| 30| SALES| CHICAGO|| 7698| BLAKE| MANAGER|7839| 1981-5-1|2850.0| null| 30| 30| SALES| CHICAGO|| 7782| CLARK| MANAGER|7839| 1981-6-9|2450.0| null| 10| 10|ACCOUNTING|NEW YORK|| 7788| SCOTT| ANALYST|7566| 1987-4-19|3000.0| null| 20| 20| RESEARCH| DALLAS|| 7839| KING|PRESIDENT|null|1981-11-17|5000.0| null| 10| 10|ACCOUNTING|NEW YORK|| 7844|TURNER| SALESMAN|7698| 1981-9-8|1500.0| 0.0| 30| 30| SALES| CHICAGO|| 7876| ADAMS| CLERK|7788| 1987-5-23|1100.0| null| 20| 20| RESEARCH| DALLAS|| 7900| JAMES| CLERK|7698| 1981-12-3| 950.0| null| 30| 30| SALES| CHICAGO|| 7902| FORD| ANALYST|7566| 1981-12-3|3000.0| null| 20| 20| RESEARCH| DALLAS|| 7934|MILLER| CLERK|7782| 1982-1-23|1300.0| null| 10| 10|ACCOUNTING|NEW YORK|+-----+------+---------+----+----------+------+------+------+------+----------+--------+Spark-sql的使用:
Using spark-sql:

./bin/spark-sql --master local[2]
spark-sql> show tables;
17/12/22 16:27:18 INFO SparkSqlParser: Parsing command: show tables
17/12/22 16:27:19 INFO HiveMetaStore: 0: get_database: default
17/12/22 16:27:19 INFO audit: ugi=hadoop ip=unknown-ip-addr cmd=get_database: default
17/12/22 16:27:19 INFO HiveMetaStore: 0: get_database: default
17/12/22 16:27:19 INFO audit: ugi=hadoop ip=unknown-ip-addr cmd=get_database: default
17/12/22 16:27:19 INFO HiveMetaStore: 0: get_tables: db=default pat=*
17/12/22 16:27:19 INFO audit: ugi=hadoop ip=unknown-ip-addr cmd=get_tables: db=default pat=*
17/12/22 16:27:20 INFO CodeGenerator: Code generated in 137.453424 ms
default dept false
default emp false
Time taken: 1.204 seconds, Fetched 2 row(s)
17/12/22 16:27:20 INFO CliDriver: Time taken: 1.204 seconds, Fetched 2 row(s)
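The INFO lines above come from Spark's default log level. In spark-shell the same noise can be silenced programmatically; a small optional tweak, not part of the original transcript:

// Lower the log level so only warnings and errors are printed
spark.sparkContext.setLogLevel("WARN")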
spark-sql> select * from emp;
17/12/22 16:27:33 INFO SparkSqlParser: Parsing command: select * from emp
17/12/22 16:27:33 INFO HiveMetaStore: 0: get_table : db=default tbl=emp
17/12/22 16:27:33 INFO audit: ugi=hadoop ip=unknown-ip-addr cmd=get_table : db=default tbl=emp
17/12/22 16:27:33 INFO CatalystSqlParser: Parsing command: int
17/12/22 16:27:33 INFO CatalystSqlParser: Parsing command: string
17/12/22 16:27:33 INFO CatalystSqlParser: Parsing command: string
17/12/22 16:27:33 INFO CatalystSqlParser: Parsing command: int
17/12/22 16:27:33 INFO CatalystSqlParser: Parsing command: string
17/12/22 16:27:33 INFO CatalystSqlParser: Parsing command: double
17/12/22 16:27:33 INFO CatalystSqlParser: Parsing command: double
17/12/22 16:27:33 INFO CatalystSqlParser: Parsing command: int
17/12/22 16:27:33 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 250.3 KB, free 366.1 MB)
17/12/22 16:27:33 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.3 KB, free 366.0 MB)
17/12/22 16:27:33 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.66.51:45897 (size: 22.3 KB, free: 366.3 MB)
17/12/22 16:27:33 INFO SparkContext: Created broadcast 0 from
17/12/22 16:27:33 INFO FileInputFormat: Total input paths to process : 1
17/12/22 16:27:33 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
17/12/22 16:27:33 INFO DAGScheduler: Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions
17/12/22 16:27:33 INFO DAGScheduler: Final stage: ResultStage 0 (processCmd at CliDriver.java:376)
17/12/22 16:27:33 INFO DAGScheduler: Parents of final stage: List()
17/12/22 16:27:33 INFO DAGScheduler: Missing parents: List()
17/12/22 16:27:33 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at processCmd at CliDriver.java:376), which has no missing parents
17/12/22 16:27:33 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 9.1 KB, free 366.0 MB)
17/12/22 16:27:33 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.9 KB, free 366.0 MB)
17/12/22 16:27:33 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.66.51:45897 (size: 4.9 KB, free: 366.3 MB)
17/12/22 16:27:33 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/12/22 16:27:33 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0))
17/12/22 16:27:33 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/12/22 16:27:34 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 4872 bytes)
17/12/22 16:27:34 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/12/22 16:27:34 INFO HadoopRDD: Input split: hdfs://hadoop000:8020/user/hive/warehouse/emp/emp.txt:0+700
17/12/22 16:27:34 INFO CodeGenerator: Code generated in 17.792327 ms
17/12/22 16:27:34 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1829 bytes result sent to driver
17/12/22 16:27:34 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 131 ms on localhost (executor driver) (1/1)
17/12/22 16:27:34 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/12/22 16:27:34 INFO DAGScheduler: ResultStage 0 (processCmd at CliDriver.java:376) finished in 0.148 s
17/12/22 16:27:34 INFO DAGScheduler: Job 0 finished: processCmd at CliDriver.java:376, took 0.210277 s
7369  SMITH   CLERK     7902  1980-12-17  800.0    NULL    20
7499  ALLEN   SALESMAN  7698  1981-2-20   1600.0   300.0   30
7521  WARD    SALESMAN  7698  1981-2-22   1250.0   500.0   30
7566  JONES   MANAGER   7839  1981-4-2    2975.0   NULL    20
7654  MARTIN  SALESMAN  7698  1981-9-28   1250.0   1400.0  30
7698  BLAKE   MANAGER   7839  1981-5-1    2850.0   NULL    30
7782  CLARK   MANAGER   7839  1981-6-9    2450.0   NULL    10
7788  SCOTT   ANALYST   7566  1987-4-19   3000.0   NULL    20
7839  KING    PRESIDENT NULL  1981-11-17  5000.0   NULL    10
7844  TURNER  SALESMAN  7698  1981-9-8    1500.0   0.0     30
7876  ADAMS   CLERK     7788  1987-5-23   1100.0   NULL    20
7900  JAMES   CLERK     7698  1981-12-3   950.0    NULL    30
7902  FORD    ANALYST   7566  1981-12-3   3000.0   NULL    20
7934  MILLER  CLERK     7782  1982-1-23   1300.0   NULL    10
8888  HIVE    PROGRAM   7839  1988-1-23   10300.0  NULL    NULL
Time taken: 1.13 seconds, Fetched 15 row(s)
17/12/22 16:27:34 INFO CliDriver: Time taken: 1.13 seconds, Fetched 15 row(s)
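The CLI can also run a single statement non-interactively and exit, which is convenient for scripting. A hedged example, assuming spark-sql accepts the Hive-style -e flag (it inherits the Hive CLI options):

./bin/spark-sql --master local[2] -e "select * from emp"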