Hive Installation, Configuration, and Exercises
Source: Internet · Editor: 程序博客网 · Published: 2024/05/22 13:40
Contents
I. Installing Hive
  Hive configuration steps
  - Step 1
  - Step 2
  - Step 3
  - Step 4
  - Step 5
  Starting Hive
  - Start the metastore service
  - Start the hiveserver service
  - Start the Hive client
  - Query the metadata in MySQL
II. Basic use of Hive
  1. Enter the Hive console  2. Create a table (create the new table; load t_hive.txt into t_hive)  3. View tables and data (list tables; match table names with a regex; view table data; view table structure)  4. Alter a table (add a column; rename a table)  5. Drop a table
III. Hive interactive mode
IV. Importing data
  1. Load data from the local file system (LOCAL); find the imported data in HDFS; load data from HDFS
V. Exporting data
  1. Copy from HDFS to another HDFS location  2. Export to the local file system via Hive  3. Check the local file system
VI. Querying with HiveQL
  1. Ordinary queries: sorting, column aliases, nested subqueries  2. Join queries: JOIN  3. Aggregate query 1: count, avg  4. Aggregate query 2: count, distinct  5. Aggregate query 3: GROUP BY, HAVING
VII. Hive views
  - Delete a view
VIII. Hive partitioned tables
  1. Create the data  2. Create the table  3. Create the partitioned table  4. Load the data  5. View the partitions  6. Query the data
Foreword
Hive is an interface layer on top of Hadoop. With its SQL-like syntax, Hive lets data analysts get productive quickly, keeps the Java side of the world light, and brings Hadoop to people other than programmers.
Starting with Hive, analysts too can work with big data.
I. Installing Hive
System environment
With a working Hadoop environment in place, Hive can be installed on the NameNode machine (c1).
For setting up Hadoop, see the "Running Hadoop in the Cloud" series of articles and "RHadoop in Practice, Part 1: Setting Up a Hadoop Environment".
Download: apache-hive-1.2.2-bin.tar.gz
Extract it, for example under /home/cos/ (the target directory is up to you).
Hive configuration

1. Step 1

(1) Extract the archive:

tar -zxvf apache-hive-1.2.2-bin.tar.gz

(2) Add the environment variables:

[user@user ~]$ vim .bashrc

Add the following:

export HIVE_HOME=<path where you extracted Hive>
export PATH=$PATH:$HIVE_HOME/bin

Before installing Hive, install MySQL first. Hive defaults to the embedded Derby metastore, but Derby allows only a single session, so here we use MySQL as the metastore database instead.

Check whether MySQL is already installed:

netstat -tap | grep mysql

If this prints anything, MySQL is already installed; otherwise install it.
2. Step 2

Install MySQL (skip this step if it is already installed):

[user@user home]# wget http://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm
[user@user home]# rpm -ivh mysql-community-release-el7-5.noarch.rpm
[user@user home]# yum install mysql-community-server
Loaded plugins: fastestmirror, langpacks
base | 3.6 kB 00:00
extras | 3.4 kB 00:00
mysql-connectors-community | 2.5 kB 00:00
mysql-tools-community | 2.5 kB 00:00
mysql56-community | 2.5 kB 00:00
updates | 3.4 kB 00:00
(1/4): mysql-connectors-community/x86_64/primary_db | 14 kB 00:03
(2/4): mysql-tools-community/x86_64/primary_db | 33 kB 00:03
(3/4): mysql56-community/x86_64/primary_db | 168 kB 00:04
(4/4): updates/7/x86_64/primary_db | 4.8 MB 00:14
Loading mirror speeds from cached hostfile
* base: mirrors.aliyun.com
* extras: mirrors.aliyun.com
* updates: mirrors.aliyun.com
Resolving dependencies
--> Checking transaction
......
......
Verifying : mysql-community-libs-5.6.36-2.el7.x86_64 11/11
Installed:
  mysql-community-server.x86_64 0:5.6.36-2.el7
Installed as dependencies:
  mysql-community-client.x86_64 0:5.6.36-2.el7
  mysql-community-common.x86_64 0:5.6.36-2.el7
  mysql-community-libs.x86_64 0:5.6.36-2.el7
  perl-Compress-Raw-Bzip2.x86_64 0:2.061-3.el7
  perl-Compress-Raw-Zlib.x86_64 1:2.061-4.el7
  perl-DBI.x86_64 0:1.627-4.el7
  perl-Data-Dumper.x86_64 0:2.145-3.el7
  perl-IO-Compress.noarch 0:2.061-2.el7
  perl-Net-Daemon.noarch 0:0.48-5.el7
  perl-PlRPC.noarch 0:0.2020-14.el7
Complete!
3. Step 3

Start the MySQL service and log in:

[user@localhost ~]$ service mysqld start
[user@localhost ~]$ mysql -u root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.6.36 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed

Set a password for the root user:

mysql> update user set password = password('123456') where user='root';
Query OK, 4 rows affected (0.00 sec)
Rows matched: 4 Changed: 4 Warnings: 0
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye
[user@localhost ~]$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.6.36 MySQL Community Server (GPL)

Create the hive user and database:

mysql> use mysql;
mysql> GRANT USAGE ON *.* TO 'hive'@'localhost' IDENTIFIED BY 'hive' WITH GRANT OPTION;
mysql> grant all on hive.* to hive@'%' identified by 'hive';
mysql> grant all on hive.* to hive@'localhost' identified by 'hive';
mysql> create database hive;
mysql> flush privileges;
mysql> ALTER DATABASE hive CHARACTER SET latin1;
4. Step 4

Configure the files under Hive's conf directory:

[user@localhost conf]$ sudo cp hive-default.xml.template hive-site.xml
[sudo] password for user:
[user@localhost conf]$ ls
beeline-log4j.properties.template    hive-log4j.properties.template
hive-default.xml.template            hive-site.xml
hive-env.sh.template                 ivysettings.xml
hive-exec-log4j.properties.template
[user@localhost conf]$ sudo vim hive-site.xml

The main properties to change:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- "hive" is the database you created above -->
  <value>jdbc:mysql://localhost:3306/hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <!-- JDBC driver -->
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <!-- database password -->
  <value>hive</value>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <!-- database user name -->
  <value>hive</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <!-- a directory of your choice -->
  <value>/home/hdpsrc/hive/iotmp</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <!-- a directory of your choice -->
  <value>/home/hdpsrc/hive/iotmp</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <!-- a directory of your choice -->
  <value>/home/hdpsrc/hive/iotmp</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <!-- create the operation_logs directory yourself -->
  <value>/path/of/your/choice/operation_logs</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>

Then set up the log4j configuration:

[user@localhost conf]$ sudo cp hive-log4j.properties.template hive-log4j.properties
[user@localhost conf]$ sudo vim hive-log4j.properties
#log4j.appender.EventCounter=org.apache.hadoop.hive.shims.HiveEventCounter
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter

After the files above are configured, copy the mysql-connector jar into Hive's lib directory. This step must not be skipped.
5. Step 5

Start Hadoop:

[user@localhost hadoop-2.7.3]$ ./sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: namenode running as process 3931. Stop it first.
localhost: datanode running as process 4055. Stop it first.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 4255. Stop it first.
starting yarn daemons
resourcemanager running as process 4451. Stop it first.
localhost: nodemanager running as process 4566. Stop it first.
[user@localhost hadoop-2.7.3]$ jps
4451 ResourceManager
4566 NodeManager
4055 DataNode
3931 NameNode
10061 Jps
4255 SecondaryNameNode
[user@localhost apache-hive-1.2.2-bin]$ hive
hive> show databases;
OK
default
Time taken: 1.926 seconds, Fetched: 1 row(s)

Problems you may run into:

1. While installing MySQL: "/var/run/yum.pid is locked; another program with PID 1610 is running."
Fix: rm -f /var/run/yum.pid
After deleting the file, yum works again.

2. After Hadoop is up, typing hive throws a RuntimeException.
Try a different version of the mysql-connector jar. Dragging the jar from Windows straight into the VM can corrupt it; download the jar inside the VM instead.

Next, create the directories on HDFS:

$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/warehouse    # if this errors, use: hadoop fs -mkdir -p /user/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp           # grant permissions
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/warehouse
Starting Hive

Start the metastore service:

~ bin/hive --service metastore &

Output: Starting Hive Metastore Server...

Start the hiveserver service:

~ bin/hive --service hiveserver2 &

Output: Starting Hive Thrift Server...

Start the Hive client:

~ bin/hive shell
hive> show tables;
OK

Query the metadata in the MySQL database:

~ mysql -u root -p
mysql> use hive;
Database changed
mysql> show tables;
+-------------------------+
| Tables_in_hive          |
+-------------------------+
| BUCKETING_COLS          |
| CDS                     |
| COLUMNS_V2              |
| DATABASE_PARAMS         |
| DBS                     |
| IDXS                    |
| INDEX_PARAMS            |
| PARTITIONS              |
| PARTITION_KEYS          |
| PARTITION_KEY_VALS      |
| PARTITION_PARAMS        |
| PART_COL_PRIVS          |
| PART_PRIVS              |
| SDS                     |
| SD_PARAMS               |
| SEQUENCE_TABLE          |
| SERDES                  |
| SERDE_PARAMS            |
| SORT_COLS               |
| TABLE_PARAMS            |
| TBLS                    |
| TBL_COL_PRIVS           |
| TBL_PRIVS               |
+-------------------------+
23 rows in set (0.00 sec)

Hive is now installed successfully; the rest of this article is a usage guide.
II. Basic use of Hive

1. Enter the Hive console

~ cd /home/apache-hive-1.2.2-bin/
~ bin/hive shell

Output: Logging initialized using configuration in ...
2. Create a table

Create the data file (fields separated by tabs):

sudo vim /home/hive/t_hive.txt    # create t_hive.txt; the location is up to you

Enter the data, with a tab between the numbers (pasting it in directly and then uploading to Hive can cause problems):

16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Create the new table:

hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.489 seconds

Load t_hive.txt into the t_hive table:

hive> LOAD DATA LOCAL INPATH '/home/hive/t_hive.txt' OVERWRITE INTO TABLE t_hive;
Loading data to table default.t_hive
OK
Time taken: 0.397 seconds

3. View tables and data

List tables:

hive> show tables;
OK
t_hive
Time taken: 0.099 seconds

Match table names with a regex:

hive> show tables '*t*';
OK
t_hive
Time taken: 0.065 seconds

View table data:

hive> select * from t_hive;
OK
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Time taken: 0.264 seconds

View the table structure:

hive> desc t_hive;
OK
a int
b int
c int
Time taken: 0.1 seconds
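As a side note, the way Hive maps each tab-delimited line to the declared columns can be sketched in plain Python. This only illustrates the delimiter handling declared by ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; it is not Hive's actual implementation:

```python
# Illustrative sketch (not Hive code): split tab-delimited text lines
# into the (a, b, c) int columns declared for the t_hive table.
lines = [
    "16\t2\t3",
    "61\t12\t13",
    "41\t2\t31",
]

def parse_row(line, delimiter="\t"):
    """Split one line on the field delimiter and cast each field to int."""
    a, b, c = (int(field) for field in line.split(delimiter))
    return a, b, c

rows = [parse_row(line) for line in lines]
print(rows)  # [(16, 2, 3), (61, 12, 13), (41, 2, 31)]
```

A line whose fields are separated by anything other than the declared delimiter would not split into three columns, which is why pasting the data with spaces instead of tabs causes trouble.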
4. Alter a table

Add a column:

hive> ALTER TABLE t_hive ADD COLUMNS (new_col String);
OK
Time taken: 0.186 seconds
hive> desc t_hive;
OK
a int
b int
c int
new_col string
Time taken: 0.086 seconds

Rename the table:

hive> ALTER TABLE t_hive RENAME TO t_hadoop;
OK
Time taken: 0.45 seconds
hive> show tables;
OK
t_hadoop
Time taken: 0.07 seconds

5. Drop a table

hive> DROP TABLE t_hadoop;
OK
Time taken: 0.767 seconds
hive> show tables;
OK
Time taken: 0.064 seconds
III. Hive interactive mode

• quit, exit: leave the interactive shell
• reset: reset configuration to the defaults
• set <key>=<value>: change the value of a particular variable (a misspelled variable name does not raise an error)
• set: print the Hive configuration variables overridden by the user
• set -v: print all Hadoop and Hive configuration variables
• add FILE[S] *, add JAR[S] *, add ARCHIVE[S] *: add one or more files, jars, or archives to the distributed cache
• list FILE[S], list JAR[S], list ARCHIVE[S]: print the resources already added to the distributed cache
• list FILE[S] *, list JAR[S] *, list ARCHIVE[S] *: check whether the given resources have been added to the distributed cache
• delete FILE[S] *, delete JAR[S] *, delete ARCHIVE[S] *: remove the given resources from the distributed cache
• ! <command>: run a shell command from the Hive shell
• dfs <dfs command>: run a dfs command from the Hive shell
• <query string>: run a Hive query and print the result to standard output
• source FILE <filepath>: run a Hive script file in the CLI
IV. Importing data

Still using t_hive as the example.

Create the table structure:

hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

1. Load data from the local file system (LOCAL)

hive> LOAD DATA LOCAL INPATH '/home/hive/t_hive.txt' OVERWRITE INTO TABLE t_hive;
Loading data to table default.t_hive
OK
Time taken: 0.612 seconds

Find the data we just imported in HDFS:

[user@localhost hadoop-2.7.3]$ cd bin/
[user@localhost bin]$ hadoop fs -cat /user/warehouse/t_hive/t_hive.txt
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Load data from HDFS

Create table t_hive2:

hive> CREATE TABLE t_hive2 (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Load the data from HDFS:

hive> LOAD DATA INPATH '/user/warehouse/t_hive/t_hive.txt' OVERWRITE INTO TABLE t_hive2;
Loading data to table default.t_hive2
Table default.t_hive2 stats: [numFiles=1, numRows=0, totalSize=68, rawDataSize=0]
OK
Time taken: 2.649 seconds
hive>
View the data:

hive> select * from t_hive2;
OK
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Time taken: 0.287 seconds

Import data from another table:

hive> INSERT OVERWRITE TABLE t_hive2 SELECT * FROM t_hive;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0002, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0002
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:32:41,979 Stage-1 map = 0%, reduce = 0%
2013-07-16 10:32:48,034 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:49,050 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:50,068 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:51,082 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:52,093 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:53,102 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:54,112 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.03 sec
MapReduce Total cumulative CPU time: 1 seconds 30 msec
Ended Job = job_201307131407_0002
Ended Job = -314818888, job is filtered out (removed at runtime).
Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-32-31_323_5732404975764014154/-ext-10000
Loading data to table default.t_hive2
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2
Table default.t_hive2 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to t_hive2
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.03 sec HDFS Read: 273 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 30 msec
OK
Time taken: 23.227 seconds
hive> select * from t_hive2;
OK
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Time taken: 0.134 seconds
Create a table and import data from another table

First drop the table:

hive> DROP TABLE t_hive;

Then create it from the other table's data:

hive> CREATE TABLE t_hive AS SELECT * FROM t_hive2;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0003, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0003
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:36:48,612 Stage-1 map = 0%, reduce = 0%
2013-07-16 10:36:54,648 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:55,657 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:56,666 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:57,673 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:58,683 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:59,691 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.13 sec
MapReduce Total cumulative CPU time: 1 seconds 130 msec
Ended Job = job_201307131407_0003
Ended Job = -670956236, job is filtered out (removed at runtime).
Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10001
Moving data to: hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
Table default.t_hive stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10000
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.13 sec HDFS Read: 272 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 130 msec
OK
Time taken: 20.13 seconds
hive> select * from t_hive;
OK
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Time taken: 0.109 seconds

Copy only the table structure, without the data:

hive> CREATE TABLE t_hive3 LIKE t_hive;
hive> select * from t_hive3;
OK
Time taken: 0.077 seconds
Import data from MySQL

This will be covered when we introduce Sqoop.
V. Exporting data

1. Copy from HDFS to another HDFS location

~ hadoop fs -cp /user/warehouse/t_hive /
~ hadoop fs -ls /t_hive
Found 1 items
-rw-r--r-- 1 cos supergroup 56 2013-07-16 10:41 /t_hive/000000_0
~ hadoop fs -cat /t_hive/000000_0
1623
611213
41231
17213
71231
11234
11234

(The columns look concatenated because this copy of t_hive was rebuilt with CREATE TABLE ... AS SELECT, which uses Hive's default \001 field delimiter; that character is invisible in the terminal.)

2. Export to the local file system via Hive

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/t_hive' SELECT * FROM t_hive;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0005, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0005
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:46:24,774 Stage-1 map = 0%, reduce = 0%
2013-07-16 10:46:30,823 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:31,833 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:32,844 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:33,856 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:34,865 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:35,873 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:36,884 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.87 sec
MapReduce Total cumulative CPU time: 870 msec
Ended Job = job_201307131407_0005
Copying data to local directory /tmp/t_hive
Copying data to local directory /tmp/t_hive
7 Rows loaded to /tmp/t_hive
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 0.87 sec HDFS Read: 271 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 870 msec
OK
Time taken: 23.369 seconds

3. Check the local file system

hive> ! cat /tmp/t_hive/000000_0;
1623
611213
41231
17213
71231
11234
11234
VI. Querying with HiveQL

Note: the map/reduce log output is omitted from the snippets below.

1. Ordinary query: sorting, column aliases, nested subqueries
hive> FROM (
SELECT b,c as c2 FROM t_hive
) t
SELECT t.b, t.c2
WHERE b>2
LIMIT 2;
12 13
21 3
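To see what the nested query computes, here is the equivalent computation in plain Python over an in-memory copy of the seven t_hive rows (an illustration of the semantics, not how Hive executes it):

```python
# In-memory copy of the t_hive sample data from this tutorial: (a, b, c) rows.
t_hive = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]

# Inner query: SELECT b, c AS c2 FROM t_hive
inner = [(b, c) for (a, b, c) in t_hive]

# Outer query: SELECT t.b, t.c2 WHERE b > 2 LIMIT 2
result = [(b, c2) for (b, c2) in inner if b > 2][:2]
print(result)  # [(12, 13), (21, 3)]
```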
2. Join query: JOIN
hive> SELECT t1.a,t1.b,t2.a,t2.b
FROM t_hive t1 JOIN t_hive2 t2 on t1.a=t2.a
WHERE t1.c>10;
1 12 1 12
11 2 11 2
41 2 41 2
61 12 61 12
71 2 71 2
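The join can be replayed in plain Python over in-memory copies of the two tables, which hold the same seven rows at this point. This is just an illustration of the join condition and filter; the result is sorted for a stable order:

```python
# In-memory copies of the sample data: (a, b, c) rows in both tables.
t_hive  = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
           (71, 2, 31), (1, 12, 34), (11, 2, 34)]
t_hive2 = list(t_hive)  # the same rows were loaded into t_hive2

# SELECT t1.a, t1.b, t2.a, t2.b FROM t_hive t1 JOIN t_hive2 t2
#   ON t1.a = t2.a WHERE t1.c > 10
result = sorted(
    (a1, b1, a2, b2)
    for (a1, b1, c1) in t_hive
    for (a2, b2, c2) in t_hive2
    if a1 == a2 and c1 > 10
)
print(result)  # [(1, 12, 1, 12), (11, 2, 11, 2), (41, 2, 41, 2), (61, 12, 61, 12), (71, 2, 71, 2)]
```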
3. Aggregate query 1: count, avg
hive> SELECT count(*), avg(a) FROM t_hive;
7 31.142857142857142
4. Aggregate query 2: count, distinct
hive> SELECT count(DISTINCT b) FROM t_hive;
3
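The aggregate results above can be checked against the seven sample rows in plain Python (illustrative; the list mirrors the t_hive data from this tutorial):

```python
# In-memory copy of the t_hive sample data: (a, b, c) rows.
t_hive = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]

cnt = len(t_hive)                              # count(*)
avg_a = sum(a for (a, _, _) in t_hive) / cnt   # avg(a) = 218 / 7
distinct_b = len({b for (_, b, _) in t_hive})  # count(DISTINCT b): b takes {2, 12, 21}
print(cnt, avg_a, distinct_b)
```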
5. Aggregate query 3: GROUP BY, HAVING

GROUP BY

hive> SELECT avg(a),b,sum(c) FROM t_hive GROUP BY b,c;
16.0 2 3
56.0 2 62
11.0 2 34
61.0 12 13
1.0 12 34
17.0 21 3
HAVING

hive> SELECT avg(a),b,sum(c) FROM t_hive GROUP BY b,c HAVING sum(c)>30;
56.0 2 62
11.0 2 34
1.0 12 34
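The GROUP BY/HAVING pair can likewise be checked in plain Python (an illustration of the semantics; the result is sorted here for a stable order, whereas Hive's output order depends on the shuffle):

```python
from collections import defaultdict

# In-memory copy of the t_hive sample data: (a, b, c) rows.
t_hive = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]

# GROUP BY b, c: collect the a values of each (b, c) group.
groups = defaultdict(list)
for (a, b, c) in t_hive:
    groups[(b, c)].append(a)

# SELECT avg(a), b, sum(c) ... HAVING sum(c) > 30
# Within a (b, c) group, sum(c) is just c times the group size.
result = sorted(
    (sum(a_vals) / len(a_vals), b, c * len(a_vals))
    for (b, c), a_vals in groups.items()
    if c * len(a_vals) > 30
)
print(result)  # [(1.0, 12, 34), (11.0, 2, 34), (56.0, 2, 62)]
```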
VII. Hive views

A view in Hive is the same concept as a view in a relational database; again we use t_hive as the example.
hive> CREATE VIEW v_hive AS SELECT a,b FROM t_hive where c>30;
hive> select * from v_hive;
41 2
71 2
1 12
11 2
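A view stores no data; it is a named query that is re-evaluated when queried. In plain Python terms, v_hive behaves like a function that re-runs its defining filter each time (an illustrative sketch, not Hive's mechanism):

```python
# In-memory copy of the t_hive sample data: (a, b, c) rows.
t_hive = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]

def v_hive():
    """The view body: SELECT a, b FROM t_hive WHERE c > 30."""
    return [(a, b) for (a, b, c) in t_hive if c > 30]

print(v_hive())  # [(41, 2), (71, 2), (1, 12), (11, 2)]
```

Because the filter runs against whatever t_hive currently holds, changing the base table changes what the view returns, with no data copied.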
Delete the view
hive> DROP VIEW IF EXISTS v_hive;
OK
Time taken: 0.495 seconds
VIII. Hive partitioned tables

Partitioned tables are a basic database concept, and with small data volumes we often have no need for them. Hive, however, is OLAP data-warehouse software that deals with very large data volumes, so in this setting partitioned tables matter a great deal.

Below we define a new table structure: t_hft.

1. Create the data
~ sudo vim /home/cos/demo/t_hft_20130627.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45
~ vi /home/cos/demo/t_hft_20130628.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45
2. Create the table
DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft(
SecurityID STRING,
tradeTime STRING,
PreClosePx DOUBLE
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
3. Create the partitioned table

Based on the business need, the partitioning is designed by trading day and stock ID.
DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft(
SecurityID STRING,
tradeTime STRING,
PreClosePx DOUBLE
) PARTITIONED BY (tradeDate INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
4. Load the data
20130627
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130627.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130627);
Copying data from file:/home/cos/demo/t_hft_20130627.csv
Copying file: file:/home/cos/demo/t_hft_20130627.csv
Loading data to table default.t_hft partition (tradedate=20130627)
20130628
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130628.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130628);
Copying data from file:/home/cos/demo/t_hft_20130628.csv
Copying file: file:/home/cos/demo/t_hft_20130628.csv
Loading data to table default.t_hft partition (tradedate=20130628)
5. View the partitions
hive> SHOW PARTITIONS t_hft;
tradedate=20130627
tradedate=20130628
Time taken: 0.082 seconds
6. Query the data

hive> select * from t_hft where securityid='000001';
000001 092023 9.76 20130627
000001 092008 9.7 20130627
000001 092059 9.45 20130627
000001 092023 9.76 20130628
000001 092008 9.7 20130628
000001 092059 9.45 20130628
hive> select * from t_hft where tradedate=20130627 and PreClosePx<9;
000002 091947 8.99 20130627
000005 091514 2.2 20130627
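Why the partition column helps: each partition is stored as its own directory, so a filter on tradedate only has to read the files of the matching partition. A plain-Python sketch of this pruning idea (the dictionary layout and the row subset are illustrative, not Hive's actual storage format):

```python
# Sketch of partition pruning: rows grouped by the partition column,
# mirroring a subset of the t_hft example data (illustrative only).
partitions = {
    20130627: [("000001", "092023", 9.76), ("000002", "091947", 8.99),
               ("000005", "091514", 2.2)],
    20130628: [("000001", "092023", 9.76), ("000004", "092002", 9.79)],
}

def query(tradedate, max_px):
    """Filter on tradedate touches only one partition; other days' data
    is never scanned at all."""
    return [row for row in partitions[tradedate] if row[2] < max_px]

print(query(20130627, 9))  # [('000002', '091947', 8.99), ('000005', '091514', 2.2)]
```

With an unpartitioned table, the same WHERE clause would have to scan every row of every day.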
That covers the basic use of Hive; these are the day-to-day operations. In later posts I will go on to HiveQL optimization and Hive operations.
Problems you may encounter

1. hive --service hiveserver &

Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hive.service.HiveServer

This Hive release no longer ships the old HiveServer; start HiveServer2 instead, as shown earlier: hive --service hiveserver2 &
2. This is the original table structure:
hive> desc t_hive;
OK
a int
b int
c int
Time taken: 3.201 seconds, Fetched: 3 row(s)
Trying to add a new column then fails:
hive> ALTER TABLE t_hive ADD COLUMNS (new_col String);
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table.
For direct MetaStore DB connections, we don’t support retries at the client level.
The error in hive.log:
2017-05-19 14:53:41,036 ERROR [main]: ql.Driver (SessionState.java:printError(960)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. For direct MetaStore DB connections, we don’t support retries at the client level.
2017-05-19 14:53:41,036 INFO [main]: metadata.Hive (Hive.java:logDumpPhase(3294)) - Dumping metastore api call timing information for : execution phase
2017-05-19 14:53:41,037 INFO [main]: metadata.Hive (Hive.java:dumpAndClearMetaCallTiming(3285)) - Total time spent in this metastore function was greater than 1000ms : getTable_(String, String, )=2108
2017-05-19 14:53:41,037 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) -
2017-05-19 14:53:41,037 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) -
2017-05-19 14:53:41,037 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) -
2017-05-19 14:53:41,037 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) -
2017-05-19 14:53:41,037 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) -
Fix: replace the MySQL driver in Hive's lib directory with a different version; 5.1.41 worked for me.