HIVE Series (4): HQL query statistics with Hive managed (internal) and external tables


Managed (internal) table: its data lives in Hive's warehouse directory; when the table is dropped, the data files are deleted along with the metadata.

External table: its data stays at a location you choose on HDFS; when the table is dropped, only the Hive metadata is removed and the HDFS data files remain.
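
Once both tables below exist (the managed SogouQ2 and the external SOGOUQ1), a quick way to confirm which kind a given table is. This is a minimal sketch; the table names come from later in this post:

hive> DESCRIBE FORMATTED SogouQ2;   -- output contains "Table Type: MANAGED_TABLE"
hive> DESCRIBE FORMATTED SOGOUQ1;   -- output contains "Table Type: EXTERNAL_TABLE"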

 

Source data (each line: timestamp, user session ID, search word, result rank, click order, clicked URL, matching the columns defined below):

20111230000005 57375476989eea12893c0c3811607bcf 奇艺高清 1 1 http://www.qiyi.com/
20111230000005 66c5bb7774e31d0a22278249b26bc83a 凡人修仙传 3 1 http://www.booksky.org/BookDetail.aspx?BookID=1050804&Level=1
20111230000007 b97920521c78de70ac38e3713f524b50 本本联盟 1 1 http://www.bblianmeng.com/
20111230000008 6961d0c97fe93701fc9c0d861d096cd9 华南师范大学图书馆 1 1 http://lib.scnu.edu.cn/
20111230000008 f2f5a21c764aebde1e8afcc2871e086f 在线代理 2 1 http://proxyie.cn/
20111230000009 96994a0480e7e1edcaef67b20d8816b7 伟大导演 1 1 http://movie.douban.com/review/1128960/
20111230000009 698956eb07815439fe5f46e9a4503997 youku 1 1 http://www.youku.com/
20111230000009 599cd26984f72ee68b2b6ebefccf6aed 安徽合肥365房产网 1 1 http://hf.house365.com/
20111230000010 f577230df7b6c532837cd16ab731f874 哈萨克网址大全 1 1 http://www.kz321.com/
20111230000010 285f88780dd0659f5fc8acc7cc4949f2 IQ数码 1 1 http://www.iqshuma.com/
20111230000010 f4ba3f337efb1cc469fcd0b34feff9fb 推荐待机时间长的手机 1 1 http://mobile.zol.com.cn/148/1487938.html
20111230000010 3d1acc7235374d531de1ca885df5e711 满江红 1 1 http://baike.baidu.com/view/6500.htm
20111230000010 dbce4101683913365648eba6a85b6273 光标下载 1 1 http://zhidao.baidu.com/question/38626533

 

 

1. Start MySQL (normal case):

[root@master HiveTBdata]# service mysql start
Starting MySQL SUCCESS!

2. Start MySQL (failure case: a leftover mysqld process blocks the restart):

[root@master HiveTBdata]# service mysql start
Starting MySQL.................................................................................................... ERROR! The server quit without updating PID file (/var/lib/mysql/master.pid).
[root@master HiveTBdata]# ps -ef|grep mysqld
root      1964     1  0 07:04 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/lib/mysql/master.pid
mysql     2057  1964  0 07:04 ?        00:00:04 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/lib/mysql/master.err --pid-file=/var/lib/mysql/master.pid
root      3750  2368  0 08:03 pts/0    00:00:00 grep mysqld

Kill the old mysqld process and start the service again:

[root@master HiveTBdata]# kill -9 2057
[root@master HiveTBdata]# kill -9 2057
-bash: kill: (2057) - No such process
[root@master HiveTBdata]# ps -ef|grep mysqld
root      3754  2368  0 08:03 pts/0    00:00:00 grep mysqld
[root@master HiveTBdata]# service mysql start
Starting MySQL..... SUCCESS!
[root@master HiveTBdata]# mysql -uhadoop -phadoop
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.31 MySQL Community Server (GPL)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
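
If the PID-file error comes back, the MySQL error log named in the ps output above (the --log-error option) is the first place to look; a minimal check:

[root@master HiveTBdata]# tail -n 50 /var/lib/mysql/master.err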


3. Start Hive:

[root@master HiveTBdata]# hive

Logging initialized using configuration in jar:file:/usr/local/apache-hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
Java HotSpot(TM) Client VM warning: You have loaded library /tmp/libnetty-transport-native-epoll6516087966858087671.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
hive>


hive> create database hive;
OK
Time taken: 0.441 seconds
hive> show databases;
OK
default
hive
Time taken: 0.03 seconds, Fetched: 2 row(s)
hive>

hive> create table SogouQ2(ID STRING,WEBSESSION STRING,WORD STRING,S_SEQ INT,C_SEQ INT,WEBSITE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';

The managed table's data directory in the warehouse: /user/hive/warehouse/hive.db/sogouq2

Load the data into the managed table:


LOAD DATA INPATH '/library/HiveSogou/SogouQ2.txt' INTO TABLE SogouQ2;
hive> LOAD DATA INPATH '/library/HiveSogou/SogouQ2.txt' INTO TABLE SogouQ2;
Loading data to table hive.sogouq2
Table hive.sogouq2 stats: [numFiles=1, totalSize=217441417]
OK
Time taken: 1.211 seconds
hive> 
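
LOAD DATA INPATH (without LOCAL) moves the file from its original HDFS path into the table's warehouse directory rather than copying it. A quick check, as a sketch using the paths above:

[root@master HiveTBdata]# hadoop fs -ls /user/hive/warehouse/hive.db/sogouq2
[root@master HiveTBdata]# hadoop fs -ls /library/HiveSogou/    # SogouQ2.txt is no longer here after the load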

 

 

Queries:
select count(*) from SogouQ2;
hive> select count(*) from SogouQ2;
Query ID = root_20160304083450_18e62b4b-bc87-45fd-bddf-d3d496fa1511
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 08:34:52,005 Stage-1 map = 0%,  reduce = 0%
2016-03-04 08:34:56,044 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1005283927_0003
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 869782066 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2000000
Time taken: 6.028 seconds, Fetched: 1 row(s)


select count(*) from SOGOUQ2 where WEBSITE like '%baidu%';
 
hive> select count(*) from SogouQ2 where WEBSITE like '%baidu%';
Query ID = root_20160304084529_8e9809cd-b86c-4a5d-85f2-a9338c03af31
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 08:45:31,264 Stage-1 map = 0%,  reduce = 0%
2016-03-04 08:45:36,278 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1004743092_0006
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 2174455144 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken: 6.546 seconds, Fetched: 1 row(s)
hive>

select count(*) from SOGOUQ2 where  S_SEQ =1 and C_SEQ =1 and WEBSITE like '%baidu%';


hive> select count(*) from SOGOUQ2 where  S_SEQ =1 and C_SEQ =1 and WEBSITE like '%baidu%';
Query ID = root_20160304084931_135abca3-7a3b-4829-91c0-e69a2f3a142c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 08:49:33,117 Stage-1 map = 0%,  reduce = 0%
2016-03-04 08:49:38,134 Stage-1 map = 41%,  reduce = 0%
2016-03-04 08:49:39,147 Stage-1 map = 100%,  reduce = 0%
2016-03-04 08:49:40,151 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local562971879_0010
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 3914019248 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
0
Time taken: 8.564 seconds, Fetched: 1 row(s)
hive>
External table:
hive> create external table SOGOUQ1 (ID STRING,WEBSESSION STRING,WORD STRING,S_SEQ INT,C_SEQ INT,WEBSITE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/library/HiveSogou/SogouQ1External';
OK
Time taken: 0.435 seconds
hive>

Load the data by putting the file directly into the table's LOCATION: hadoop dfs -put SogouQ1.txt /library/HiveSogou/SogouQ1External

[root@master IMFdatatest]#cd HiveSogou
[root@master HiveSogou]#ls
SogouQ1.txt  SogouQ2.txt
[root@master HiveSogou]#hadoop dfs -put SogouQ1.txt   /library/HiveSogou/SogouQ1External
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
16/03/04 09:08:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@master HiveSogou]#
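
Because the external table's LOCATION already points at /library/HiveSogou/SogouQ1External, the file becomes queryable as soon as it lands there; no LOAD statement is needed. A quick check, as a sketch (using hdfs dfs, as the DEPRECATED warning above suggests):

[root@master HiveSogou]# hdfs dfs -ls /library/HiveSogou/SogouQ1External
[root@master HiveSogou]# hive -e "select * from SOGOUQ1 limit 3;"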


hive> select count(*) from SOGOUQ1;
Query ID = root_20160304091013_83043772-6a81-4240-b15b-b92301674523
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 09:10:22,626 Stage-1 map = 0%,  reduce = 0%
2016-03-04 09:10:51,980 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1673398869_0013
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 5001302448 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1000000
Time taken: 38.134 seconds, Fetched: 1 row(s)
hive>

hive> select count(*) fromsogouq1 where S_SEQ=1 and C_SEQ =1;
FAILED: SemanticException [Error 10004]: Line 1:34 Invalid table alias or column reference 'S_SEQ': (possible column names are: )

(The first attempt fails because the missing space makes Hive parse fromsogouq1 as a column alias rather than a FROM clause, so the column S_SEQ cannot be resolved.)

hive> select count(*) from sogouq1 where S_SEQ=1 and C_SEQ =1;
Query ID = root_20160304091447_586284d5-31a6-4b17-b8b0-19643508c95a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 09:14:49,995 Stage-1 map = 0%,  reduce = 0%
2016-03-04 09:14:53,004 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1208104712_0014
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 5218803596 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
388101
Time taken: 5.293 seconds, Fetched: 1 row(s)
hive>
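
The same statistic can also be expressed as a share of all records in a single pass; a minimal sketch:

hive> select sum(case when S_SEQ=1 and C_SEQ=1 then 1 else 0 end) / count(*) from SOGOUQ1;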

 
hive> select word,count(word) as countword from SogouQ1 group by word order by countword  desc limit 5
    > ;
Query ID = root_20160304091906_9d82d19a-9b2a-4a75-bfbd-1b48004a3731
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 09:19:09,756 Stage-1 map = 0%,  reduce = 0%
2016-03-04 09:19:15,772 Stage-1 map = 67%,  reduce = 0%
2016-03-04 09:19:17,785 Stage-1 map = 100%,  reduce = 0%
2016-03-04 09:19:24,058 Stage-1 map = 100%,  reduce = 90%
2016-03-04 09:19:27,185 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local10598871_0015
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-03-04 09:19:29,031 Stage-2 map = 0%,  reduce = 0%
2016-03-04 09:19:33,057 Stage-2 map = 100%,  reduce = 0%
2016-03-04 09:19:37,151 Stage-2 map = 100%,  reduce = 100%
Ended Job = job_local524290706_0016
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 5436304744 HDFS Write: 0 SUCCESS
Stage-Stage-2:  HDFS Read: 5436304744 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
锟劫讹拷        7564
baidu   3652
锟斤拷锟斤拷锟斤拷锟斤拷        2786
锟斤拷锟斤拷锟斤拷锟截筹拷锟斤拷锟斤拷锟侥革拷锟斤拷    2388
4399小锟斤拷戏  2119
Time taken: 30.321 seconds, Fetched: 5 row(s)
hive>

(The Chinese search words come out as mojibake, almost certainly because the Sogou log files are GBK-encoded while Hive and the terminal treat them as UTF-8; only ASCII terms such as baidu survive intact.)
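
If the source files are indeed GBK-encoded (an assumption, not verified here), converting them to UTF-8 before loading should make the keywords readable; a minimal sketch with a hypothetical output file name:

[root@master HiveSogou]# iconv -f GBK -t UTF-8 SogouQ1.txt > SogouQ1_utf8.txt    # SogouQ1_utf8.txt is a hypothetical name
[root@master HiveSogou]# hdfs dfs -put SogouQ1_utf8.txt /library/HiveSogou/SogouQ1External
[root@master HiveSogou]# hdfs dfs -rm /library/HiveSogou/SogouQ1External/SogouQ1.txt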

 

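Finally, to see the difference stated at the top of this post in action: dropping the managed table deletes its warehouse data, while dropping the external table removes only the Hive metadata. A sketch reusing the tables and paths above:

hive> drop table SogouQ2;
hive> drop table SOGOUQ1;
[root@master HiveSogou]# hdfs dfs -ls /user/hive/warehouse/hive.db/sogouq2      # the managed table's data directory is gone
[root@master HiveSogou]# hdfs dfs -ls /library/HiveSogou/SogouQ1External        # SogouQ1.txt is still there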