HIVE Series (4): Hive Internal and External Tables, HQL Query Statistics
Internal (managed) table: the data files live under Hive's warehouse directory; dropping the table deletes both the metadata and the data files.
External table: the data files stay at a user-specified HDFS location; dropping the table removes only Hive's metadata, and the data files in HDFS remain.
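The distinction can be sketched with two definitions (the *_demo table names are illustrative; the external LOCATION is the one used later in this article):

```sql
-- Internal (managed) table: files live under the warehouse directory,
-- e.g. /user/hive/warehouse/hive.db/<table>; DROP TABLE removes them too.
CREATE TABLE SogouQ2_demo (ID STRING, WORD STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- External table: Hive stores only metadata; DROP TABLE leaves the
-- files at LOCATION untouched.
CREATE EXTERNAL TABLE SogouQ1_demo (ID STRING, WORD STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/library/HiveSogou/SogouQ1External';
```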
Source data (a Sogou search-log sample; each record has six tab-separated fields: timestamp, user session ID, query word, rank of the URL in the result list, the user's click sequence number, and the clicked URL):
20111230000005 57375476989eea12893c0c3811607bcf 奇艺高清 1 1 http://www.qiyi.com/
20111230000005 66c5bb7774e31d0a22278249b26bc83a 凡人修仙传 3 1 http://www.booksky.org/BookDetail.aspx?BookID=1050804&Level=1
20111230000007 b97920521c78de70ac38e3713f524b50 本本联盟 1 1 http://www.bblianmeng.com/
20111230000008 6961d0c97fe93701fc9c0d861d096cd9 华南师范大学图书馆 1 1 http://lib.scnu.edu.cn/
20111230000008 f2f5a21c764aebde1e8afcc2871e086f 在线代理 2 1 http://proxyie.cn/
20111230000009 96994a0480e7e1edcaef67b20d8816b7 伟大导演 1 1 http://movie.douban.com/review/1128960/
20111230000009 698956eb07815439fe5f46e9a4503997 youku 1 1 http://www.youku.com/
20111230000009 599cd26984f72ee68b2b6ebefccf6aed 安徽合肥365房产网 1 1 http://hf.house365.com/
20111230000010 f577230df7b6c532837cd16ab731f874 哈萨克网址大全 1 1 http://www.kz321.com/
20111230000010 285f88780dd0659f5fc8acc7cc4949f2 IQ数码 1 1 http://www.iqshuma.com/
20111230000010 f4ba3f337efb1cc469fcd0b34feff9fb 推荐待机时间长的手机 1 1 http://mobile.zol.com.cn/148/1487938.html
20111230000010 3d1acc7235374d531de1ca885df5e711 满江红 1 1 http://baike.baidu.com/view/6500.htm
20111230000010 dbce4101683913365648eba6a85b6273 光标下载 1 1 http://zhidao.baidu.com/question/38626533
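Before loading, it can help to sanity-check that every record really splits into the six tab-separated fields the table expects. A small sketch (the field names follow this article's table schema, not any official Sogou spec):

```python
from typing import NamedTuple

class SogouRecord(NamedTuple):
    ts: str          # timestamp, e.g. 20111230000005
    session: str     # user session ID (hash)
    word: str        # query word
    s_seq: int       # rank of the URL in the result list
    c_seq: int       # user's click sequence number
    website: str     # clicked URL

def parse_line(line: str) -> SogouRecord:
    """Split one log line into the six fields the Hive table expects."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(parts)}")
    ts, session, word, s_seq, c_seq, website = parts
    return SogouRecord(ts, session, word, int(s_seq), int(c_seq), website)

# Usage on the first sample record above:
rec = parse_line("20111230000005\t57375476989eea12893c0c3811607bcf\t奇艺高清\t1\t1\thttp://www.qiyi.com/")
```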
[root@master HiveTBdata]# service mysql start
Starting MySQL SUCCESS!
[root@master HiveTBdata]# service mysql start
Starting MySQL.... ERROR! The server quit without updating PID file (/var/lib/mysql/master.pid).

The second start fails because a mysqld left over from the earlier attempt is still running and holding the data directory; the fix is to find the stale process, kill it, and start the service again:
[root@master HiveTBdata]# ps -ef|grep mysqld
root  1964     1  0 07:04 ?     00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/lib/mysql/master.pid
mysql 2057  1964  0 07:04 ?     00:00:04 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/lib/mysql/master.err --pid-file=/var/lib/mysql/master.pid
root  3750  2368  0 08:03 pts/0 00:00:00 grep mysqld
[root@master HiveTBdata]# kill -9 2057
[root@master HiveTBdata]# kill -9 2057
-bash: kill: (2057) - No such process
[root@master HiveTBdata]# ps -ef|grep mysqld
root 3754 2368 0 08:03 pts/0 00:00:00 grep mysqld
[root@master HiveTBdata]# service mysql start
Starting MySQL..... SUCCESS!
[root@master HiveTBdata]# mysql -uhadoop -phadoop
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.31 MySQL Community Server (GPL)
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
[root@master HiveTBdata]# hive
Logging initialized using configuration in jar:file:/usr/local/apache-hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
Java HotSpot(TM) Client VM warning: You have loaded library /tmp/libnetty-transport-native-epoll6516087966858087671.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
hive>
hive> create database hive;
OK
Time taken: 0.441 seconds
hive> show databases;
OK
default
hive
Time taken: 0.03 seconds, Fetched: 2 row(s)
hive>
hive> create table SogouQ2(ID STRING, WEBSESSION STRING, WORD STRING, S_SEQ INT, C_SEQ INT, WEBSITE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\T' LINES TERMINATED BY '\n';

Note: the delimiter here is the literal string '\T' (capital T), not the tab escape '\t'; this typo matters for the query results below. The table's data directory in the warehouse is /user/hive/warehouse/hive.db/sogouq2.

Load the data:
LOAD DATA INPATH '/library/HiveSogou/SogouQ2.txt' INTO TABLE SogouQ2;
hive> LOAD DATA INPATH '/library/HiveSogou/SogouQ2.txt' INTO TABLE SogouQ2;
Loading data to table hive.sogouq2
Table hive.sogouq2 stats: [numFiles=1, totalSize=217441417]
OK
Time taken: 1.211 seconds
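Worth noting: with LOAD DATA INPATH, Hive moves the HDFS file into the table's warehouse directory rather than copying it, so /library/HiveSogou/SogouQ2.txt disappears from its original location. The LOCAL variant copies from the local filesystem instead:

```sql
-- Moves the HDFS file into /user/hive/warehouse/hive.db/sogouq2:
LOAD DATA INPATH '/library/HiveSogou/SogouQ2.txt' INTO TABLE SogouQ2;

-- Copies from the local filesystem (this local path is hypothetical):
LOAD DATA LOCAL INPATH '/root/HiveTBdata/SogouQ2.txt' INTO TABLE SogouQ2;
```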
hive>
Query:
select count(*) from SogouQ2;
hive> select count(*) from SogouQ2;
Query ID = root_20160304083450_18e62b4b-bc87-45fd-bddf-d3d496fa1511
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 08:34:52,005 Stage-1 map = 0%, reduce = 0%
2016-03-04 08:34:56,044 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1005283927_0003
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 869782066 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2000000
Time taken: 6.028 seconds, Fetched: 1 row(s)
select count(*) from SOGOUQ2 where WEBSITE like '%baidu%';
hive> select count(*) from SogouQ2 where WEBSITE like '%baidu%';
Query ID = root_20160304084529_8e9809cd-b86c-4a5d-85f2-a9338c03af31
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 08:45:31,264 Stage-1 map = 0%, reduce = 0%
2016-03-04 08:45:36,278 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1004743092_0006
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 2174455144 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken: 6.546 seconds, Fetched: 1 row(s)
hive>
select count(*) from SOGOUQ2 where S_SEQ =1 and C_SEQ =1 and WEBSITE like '%baidu%';
hive> select count(*) from SOGOUQ2 where S_SEQ =1 and C_SEQ =1 and WEBSITE like '%baidu%';
Query ID = root_20160304084931_135abca3-7a3b-4829-91c0-e69a2f3a142c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 08:49:33,117 Stage-1 map = 0%, reduce = 0%
2016-03-04 08:49:38,134 Stage-1 map = 41%, reduce = 0%
2016-03-04 08:49:39,147 Stage-1 map = 100%, reduce = 0%
2016-03-04 08:49:40,151 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local562971879_0010
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 3914019248 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
0
Time taken: 8.564 seconds, Fetched: 1 row(s)
hive>
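The zero count above, and the implausibly low 2 matches for '%baidu%' across two million rows, are most likely symptoms of the CREATE TABLE typo: the delimiter was the literal '\T' rather than a real tab, so lines are not split into columns and S_SEQ/C_SEQ parse as NULL. A corrected definition would be:

```sql
CREATE TABLE SogouQ2 (
  ID STRING, WEBSESSION STRING, WORD STRING,
  S_SEQ INT, C_SEQ INT, WEBSITE STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'   -- lowercase \t, i.e. an actual tab character
LINES TERMINATED BY '\n';
```

For comparison, the external table below is created with '\t', and the same predicate there returns 388101 rows.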
External table:
hive> create external table SOGOUQ1 (ID STRING,WEBSESSION STRING,WORD STRING,S_SEQ INT,C_SEQ INT,WEBSITE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/library/HiveSogou/SogouQ1External';
OK
Time taken: 0.435 seconds
hive>
Load the data: hadoop dfs -put SogouQ1.txt /library/HiveSogou/SogouQ1External (as the warning below notes, hadoop dfs is deprecated; hdfs dfs is the current command).
[root@master IMFdatatest]#cd HiveSogou
[root@master HiveSogou]#ls
SogouQ1.txt SogouQ2.txt
[root@master HiveSogou]#hadoop dfs -put SogouQ1.txt /library/HiveSogou/SogouQ1External
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
16/03/04 09:08:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@master HiveSogou]#
hive> select count(*) from SOGOUQ1;
Query ID = root_20160304091013_83043772-6a81-4240-b15b-b92301674523
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 09:10:22,626 Stage-1 map = 0%, reduce = 0%
2016-03-04 09:10:51,980 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1673398869_0013
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 5001302448 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1000000
Time taken: 38.134 seconds, Fetched: 1 row(s)
hive>
hive> select count(*) fromsogouq1 where S_SEQ=1 and C_SEQ =1;
FAILED: SemanticException [Error 10004]: Line 1:34 Invalid table alias or column reference 'S_SEQ': (possible column names are: )

The missing space after "from" means Hive never sees a FROM clause, so no columns are in scope; retrying with the space fixes it:
hive> select count(*) from sogouq1 where S_SEQ=1 and C_SEQ =1;
Query ID = root_20160304091447_586284d5-31a6-4b17-b8b0-19643508c95a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 09:14:49,995 Stage-1 map = 0%, reduce = 0%
2016-03-04 09:14:53,004 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1208104712_0014
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 5218803596 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
388101
Time taken: 5.293 seconds, Fetched: 1 row(s)
hive>
hive> select word,count(word) as countword from SogouQ1 group by word order by countword desc limit 5;
Query ID = root_20160304091906_9d82d19a-9b2a-4a75-bfbd-1b48004a3731
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 09:19:09,756 Stage-1 map = 0%, reduce = 0%
2016-03-04 09:19:15,772 Stage-1 map = 67%, reduce = 0%
2016-03-04 09:19:17,785 Stage-1 map = 100%, reduce = 0%
2016-03-04 09:19:24,058 Stage-1 map = 100%, reduce = 90%
2016-03-04 09:19:27,185 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local10598871_0015
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 09:19:29,031 Stage-2 map = 0%, reduce = 0%
2016-03-04 09:19:33,057 Stage-2 map = 100%, reduce = 0%
2016-03-04 09:19:37,151 Stage-2 map = 100%, reduce = 100%
Ended Job = job_local524290706_0016
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 5436304744 HDFS Write: 0 SUCCESS
Stage-Stage-2: HDFS Read: 5436304744 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
锟劫讹拷 7564
baidu 3652
锟斤拷锟斤拷锟斤拷锟斤拷 2786
锟斤拷锟斤拷锟斤拷锟截筹拷锟斤拷锟斤拷锟侥革拷锟斤拷 2388
4399小锟斤拷戏 2119
Time taken: 30.321 seconds, Fetched: 5 row(s)

The 锟斤拷 runs are mojibake: the raw Sogou file is GBK-encoded and the query words have been decoded as if they were UTF-8, so the counts are usable but the words are not. Converting the file to UTF-8 before loading avoids this.
hive>
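A small sketch of re-encoding a GBK log file as UTF-8 before loading it into Hive (the file paths are hypothetical), which should keep query words like the ones above readable:

```python
def gbk_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a GBK text file as UTF-8, line by line.
    errors="replace" keeps going past any bytes that are not valid GBK."""
    with open(src_path, "r", encoding="gbk", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)

# Usage (hypothetical paths):
# gbk_to_utf8("SogouQ1.gbk.txt", "SogouQ1.utf8.txt")
```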