Operating on Hive Table Data with SparkSQL
Source: Internet · Editor: 程序博客网 · Date: 2024/05/02 17:56
- Start Hadoop: ./sbin/start-all.sh
- Start the spark-shell: ./bin/spark-shell --master local[2]
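The spark-shell already provides a Hive-enabled `spark` session when Spark is built with Hive support and hive-site.xml is on the classpath. In a standalone application you have to create the session yourself; a minimal sketch (the app name is illustrative, not from the original):

```scala
import org.apache.spark.sql.SparkSession

// Build a SparkSession with Hive support enabled, mirroring what
// spark-shell does automatically. enableHiveSupport() is required
// before any Hive tables can be read or written.
val spark = SparkSession.builder()
  .appName("SparkSQLHiveExample")   // hypothetical app name
  .master("local[2]")
  .enableHiveSupport()
  .getOrCreate()
```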
scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|     dept|      false|
| default|      emp|      false|
+--------+---------+-----------+
scala> spark.sql("select * from emp").show
+-----+------+---------+----+----------+-------+------+------+
|empno| ename|      job| mgr|  hiredate|    sal|  comm|deptno|
+-----+------+---------+----+----------+-------+------+------+
| 7369| SMITH|    CLERK|7902|1980-12-17|  800.0|  null|    20|
| 7499| ALLEN| SALESMAN|7698| 1981-2-20| 1600.0| 300.0|    30|
| 7521|  WARD| SALESMAN|7698| 1981-2-22| 1250.0| 500.0|    30|
| 7566| JONES|  MANAGER|7839|  1981-4-2| 2975.0|  null|    20|
| 7654|MARTIN| SALESMAN|7698| 1981-9-28| 1250.0|1400.0|    30|
| 7698| BLAKE|  MANAGER|7839|  1981-5-1| 2850.0|  null|    30|
| 7782| CLARK|  MANAGER|7839|  1981-6-9| 2450.0|  null|    10|
| 7788| SCOTT|  ANALYST|7566| 1987-4-19| 3000.0|  null|    20|
| 7839|  KING|PRESIDENT|null|1981-11-17| 5000.0|  null|    10|
| 7844|TURNER| SALESMAN|7698|  1981-9-8| 1500.0|   0.0|    30|
| 7876| ADAMS|    CLERK|7788| 1987-5-23| 1100.0|  null|    20|
| 7900| JAMES|    CLERK|7698| 1981-12-3|  950.0|  null|    30|
| 7902|  FORD|  ANALYST|7566| 1981-12-3| 3000.0|  null|    20|
| 7934|MILLER|    CLERK|7782| 1982-1-23| 1300.0|  null|    10|
| 8888|  HIVE|  PROGRAM|7839| 1988-1-23|10300.0|  null|  null|
+-----+------+---------+----+----------+-------+------+------+

scala> spark.sql("select deptno,count(1) from emp group by deptno").show
+------+--------+
|deptno|count(1)|
+------+--------+
|  null|       1|
|    20|       5|
|    10|       3|
|    30|       6|
+------+--------+
scala> spark.sql("select deptno,count(1) from emp group by deptno").filter("deptno is not null").show
+------+--------+
|deptno|count(1)|
+------+--------+
|    20|       5|
|    10|       3|
|    30|       6|
+------+--------+
Saving this result directly with saveAsTable fails, because the auto-generated column name count(1) contains characters that are illegal in a table column name:

scala> spark.sql("select deptno,count(1) from emp group by deptno").filter("deptno is not null").write.saveAsTable("hive_table")
org.apache.spark.sql.AnalysisException: Attribute name "count(1)" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
scala> spark.sql("select deptno,count(1) as mount from emp group by deptno").filter("deptno is not null").write.saveAsTable("hive_table")
Warning: fs.defaultFS is not set when running "chgrp" command.
Warning: fs.defaultFS is not set when running "chmod" command.

scala> spark.sql("show tables")
res8: org.apache.spark.sql.DataFrame = [database: string, tableName: string ... 1 more field]

scala> spark.sql("show tables").show
+--------+----------+-----------+
|database| tableName|isTemporary|
+--------+----------+-----------+
| default|      dept|      false|
| default|       emp|      false|
| default|hive_table|      false|
+--------+----------+-----------+
scala> spark.sql("select * from hive_table").show
+------+-----+
|deptno|mount|
+------+-----+
|    20|    5|
|    10|    3|
|    30|    6|
+------+-----+

scala> spark.table("hive_table").show
+------+-----+
|deptno|mount|
+------+-----+
|    20|    5|
|    10|    3|
|    30|    6|
+------+-----+
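Note that saveAsTable uses the ErrorIfExists save mode by default, so re-running the command above would fail once the table exists. A sketch of writing with an explicit save mode (same query as above; mode choice is illustrative):

```scala
// Overwrite the target table instead of failing when it already exists.
// Other modes: "append", "ignore", "errorifexists" (the default).
spark.sql("select deptno, count(1) as mount from emp group by deptno")
  .filter("deptno is not null")
  .write
  .mode("overwrite")
  .saveAsTable("hive_table")
```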
spark.sqlContext.setConf("spark.sql.shuffle.partitions", "10")

In production, always tune spark.sql.shuffle.partitions: the default of 200 shuffle partitions is rarely the right number for your actual data volume and cluster size.
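The setting can be changed in several equivalent ways; a sketch (the value 10 is just the example used above, not a recommendation):

```scala
// At runtime, via the session's conf (the modern equivalent of
// sqlContext.setConf):
spark.conf.set("spark.sql.shuffle.partitions", "10")

// Via a SQL SET statement:
spark.sql("SET spark.sql.shuffle.partitions=10")

// Or at launch time:
// ./bin/spark-shell --master local[2] --conf spark.sql.shuffle.partitions=10
```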