HiveQL Partitioned Tables

1. Creating a partitioned table

hive> create table part (str string)

    > partitioned by (data string);
OK
Time taken: 0.158 seconds
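
A table can also be partitioned by more than one column, which produces nested partition directories. A minimal sketch (the table and column names here are illustrative, not from the example above):

```sql
-- Sketch: a two-level partition scheme. Data is laid out on HDFS as
-- .../logs/dt=<value>/country=<value>/
CREATE TABLE logs (line STRING)
PARTITIONED BY (dt STRING, country STRING);
```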

2. Explicitly specifying the partition value when loading data
hive> load data local inpath 'file1'
    > into table part
    > partition (data='2013-11-7');
Copying data from file:/home/grid/hive-0.11.0/file1
Copying file: file:/home/grid/hive-0.11.0/file1
Loading data to table default.part partition (data=2013-11-7)
Partition default.part{data=2013-11-7} stats: [num_files: 1, num_rows: 0, total_size: 0, raw_data_size: 0]
Table default.part stats: [num_partitions: 1, num_files: 1, num_rows: 0, total_size: 0, raw_data_size: 0]
OK
Time taken: 0.649 seconds
hive> load data local inpath 'file2'              
    > into table part
    >  partition (data='2013-11-8');
Copying data from file:/home/grid/hive-0.11.0/file2
Copying file: file:/home/grid/hive-0.11.0/file2
Loading data to table default.part partition (data=2013-11-8)
Partition default.part{data=2013-11-8} stats: [num_files: 1, num_rows: 0, total_size: 0, raw_data_size: 0]
Table default.part stats: [num_partitions: 2, num_files: 2, num_rows: 0, total_size: 0, raw_data_size: 0]
OK
Time taken: 0.499 seconds
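
Besides load data with a static partition spec, rows can also be written with an insert, letting Hive derive the partition value from the data itself (dynamic partitioning). A hedged sketch, assuming a hypothetical staging table with matching columns:

```sql
-- Sketch: dynamic partition insert. `staging(str, data)` is an assumed
-- source table, not part of the original example.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE part PARTITION (data)
SELECT str, data FROM staging;
```

The partition column must come last in the select list; each distinct value of data becomes its own partition directory.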

3. Viewing how partitions are stored in the filesystem

[grid@h1 hive-0.11.0]$ hadoop dfs -ls /user/hive/warehouse/part
Found 2 items
drwxr-xr-x   - grid supergroup          0 2013-08-29 08:19 /user/hive/warehouse/part/data=2013-11-7
drwxr-xr-x   - grid supergroup          0 2013-08-29 08:19 /user/hive/warehouse/part/data=2013-11-8
[grid@h1 hive-0.11.0]$ hadoop dfs -ls /user/hive/warehouse/part/data=2013-11-7
Found 1 items
-rw-r--r--   2 grid supergroup          0 2013-08-29 08:19 /user/hive/warehouse/part/data=2013-11-7/file1
[grid@h1 hive-0.11.0]$ hadoop dfs -ls /user/hive/warehouse/part/data=2013-11-8
Found 1 items
-rw-r--r--   2 grid supergroup          0 2013-08-29 08:19 /user/hive/warehouse/part/data=2013-11-8/file2
[grid@h1 hive-0.11.0]$
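
As the listing shows, each partition is simply an HDFS subdirectory named column=value. Because the mapping is just a directory convention, a directory that already contains data can be registered as a partition after the fact. A sketch (the partition value and path here are illustrative):

```sql
-- Sketch: attach an existing HDFS directory to the table as a new
-- partition, instead of loading data through Hive.
ALTER TABLE part ADD PARTITION (data='2013-11-9')
LOCATION '/user/hive/warehouse/part/data=2013-11-9';
```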

4. Listing a table's partitions with show partitions
hive> show partitions part;
OK
data=2013-11-7
data=2013-11-8
Time taken: 0.119 seconds, Fetched: 2 row(s)
hive>
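
For tables with many partitions, the listing can also be narrowed with a partition spec. A small sketch against the same table:

```sql
-- Sketch: restrict SHOW PARTITIONS to a single partition spec.
SHOW PARTITIONS part PARTITION (data='2013-11-7');
```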

Note: the columns defined in the partitioned by clause are not stored in the data files; their values live only in the directory names.
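
Even so, a partition column behaves like an ordinary column in queries; Hive fills its value in from the directory name. For example:

```sql
-- The `data` column is selectable even though no file contains it;
-- its value comes from the data=... directory each row was read from.
SELECT str, data FROM part;
```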

5. Selecting with a filter on a partition column: Hive prunes the input to the matching partitions
hive> select str from part where data='2013-11-7';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201308281304_0010, Tracking URL = http://h1:50030/jobdetails.jsp?jobid=job_201308281304_0010
Kill Command = /home/grid/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201308281304_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-08-29 08:30:27,427 Stage-1 map = 0%,  reduce = 0%
2013-08-29 08:30:28,445 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.32 sec
2013-08-29 08:30:29,452 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 0.32 sec
MapReduce Total cumulative CPU time: 320 msec
Ended Job = job_201308281304_0010
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.32 sec   HDFS Read: 225 HDFS Write: 11 SUCCESS
Total MapReduce CPU Time Spent: 320 msec
OK
file1 test
Time taken: 5.553 seconds, Fetched: 1 row(s)
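
The pruning can be verified directly: explain shows the query plan, and in its extended form the input paths, so only the data=2013-11-7 directory should appear as an input. A sketch:

```sql
-- Sketch: inspect the plan to confirm partition pruning. With the
-- EXTENDED keyword, Hive also lists the input paths it will scan.
EXPLAIN EXTENDED
SELECT str FROM part WHERE data='2013-11-7';
```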