How to Use the Spark SQL JDBC Server
Introduction
Spark SQL provides JDBC connectivity, which is useful for connecting business intelligence (BI) tools to a Spark cluster and for sharing a cluster across multiple users. The JDBC server runs as a standalone Spark driver program that can be shared by multiple clients. Any client can cache tables in memory and query them, and the cluster resources and cached data are shared among all of them.
Spark SQL's JDBC server corresponds to HiveServer2 in Hive. It is also known as the "Thrift server" because it uses the Thrift communication protocol. Note that the JDBC server requires that Spark be built with Hive support.
Environment
Cluster: CDH 5.3.0
Component versions:
Spark: 1.2.0-cdh5.3.0
Hive: 0.13.1-cdh5.3.0
Hadoop: 2.5.0-cdh5.3.0
Starting the JDBC Server
cd /etc/spark/conf
ln -s /etc/hive/conf/hive-site.xml hive-site.xml
cd /opt/cloudera/parcels/CDH/lib/spark/
chmod -R 777 logs/
cd /opt/cloudera/parcels/CDH/lib/spark/sbin
./start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=10008
Connecting to the JDBC server with Beeline
cd /opt/cloudera/parcels/CDH/lib/spark/bin
beeline -u jdbc:hive2://hadoop04:10000
Note that the port in the JDBC URL must match the hive.server2.thrift.port the server was started with; the transcript below is from a session against a server listening on the default port 10000.
[root@hadoop04 bin]# beeline -u jdbc:hive2://hadoop04:10000
scan complete in 2ms
Connecting to jdbc:hive2://hadoop04:10000
Connected to: Spark SQL (version 1.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.3.0 by Apache Hive
0: jdbc:hive2://hadoop04:10000>
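The same connection Beeline makes above can also be made programmatically through the Hive JDBC driver. The following is a minimal sketch, assuming the hive-jdbc jar (0.13.1-cdh5.3.0 in this environment) and its dependencies are on the classpath; the class name and the jdbcUrl helper are illustrative, while hadoop04:10000 follows the transcript above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftServerClient {
    // Illustrative helper: builds the same URL that Beeline is given above
    static String jdbcUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port;
    }

    public static void main(String[] args) throws Exception {
        // Driver class shipped in the hive-jdbc jar
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // hadoop04:10000 matches the Beeline transcript; empty user/password here
        try (Connection conn = DriverManager.getConnection(jdbcUrl("hadoop04", 10000), "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show tables")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // one table name per row
            }
        }
    }
}
```

Any HiveQL statement shown in the Beeline section below can be sent through such a connection in the same way.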
Working with Beeline
Within the Beeline client, you can use standard HiveQL commands to create, list, and query tables. The full details of HiveQL are in the Hive Language Manual; here we show a few common operations.
CREATE TABLE IF NOT EXISTS mytable (key INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
create table mytable(name string, addr string, status string) row format delimited fields terminated by '#';
-- load a local file
load data local inpath '/external/tmp/data.txt' into table mytable;
-- load a file from HDFS
load data inpath 'hdfs://ju51nn/external/tmp/data.txt' into table mytable;
describe mytable;
explain select * from mytable where name = '张三';
select * from mytable where name = '张三';
cache table mytable;
select count(*) total,count(distinct addr) num1,count(distinct status) num2 from mytable where addr='gz';
uncache table mytable;
Sample data used (fields separated by '#'):
张三#广州#学生
李四#贵州#教师
王五#武汉#讲师
赵六#成都#学生
lisa#广州#学生
lily#gz#studene
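Because mytable is declared with row format delimited fields terminated by '#', each of the lines above splits into the three columns (name, addr, status). As a plain-Java sketch over an in-memory copy of the rows (not a query against the server; the class and method names are illustrative), the split and the same aggregates as the count query above look like this:

```java
import java.util.Arrays;
import java.util.List;

public class DelimitedRows {
    // Same shape as: select count(*), count(distinct addr), count(distinct status) from mytable
    static long[] aggregates(List<String> lines) {
        long total = lines.size();
        // split("#") mirrors FIELDS TERMINATED BY '#': index 1 is addr, index 2 is status
        long addrs = lines.stream().map(l -> l.split("#")[1]).distinct().count();
        long statuses = lines.stream().map(l -> l.split("#")[2]).distinct().count();
        return new long[] { total, addrs, statuses };
    }

    public static void main(String[] args) {
        // The sample records above, exactly as they would appear in data.txt
        List<String> lines = Arrays.asList(
                "张三#广州#学生", "李四#贵州#教师", "王五#武汉#讲师",
                "赵六#成都#学生", "lisa#广州#学生", "lily#gz#studene");
        long[] a = aggregates(lines);
        System.out.println(a[0] + " " + a[1] + " " + a[2]); // prints 6 5 4
    }
}
```

Note that the count query in the Beeline section filters on addr='gz', which matches only the lily row here, since '广州' and 'gz' are distinct values in this column.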
Standalone Spark SQL Shell
Spark SQL also supports a simple shell you can use as a single process: spark-sql
It is intended mainly for local development; in a shared cluster environment, use the JDBC server instead.
cd /opt/cloudera/parcels/CDH/lib/spark/bin
./spark-sql