(7-9) Creating the four kinds of Hive tables


--------------------------------------------- The four kinds of Hive tables ---------------------------------------------

----------------------------- 1. Managed tables -------------------------------------

1.1. All managed-table data is stored under the warehouse directory.
1.2. Dropping the table deletes both the metadata and the data.
1.3. Creating the table and loading the data can be done in a single statement.
Managed tables are also called internal or controlled tables.

// 1.1 Prepare the data to upload
[root@i-love-you ~]# more inner_table.dat
hadoop
hdfs
mapreduce
hive
hbase
[root@i-love-you ~]#

// 1.2 Create the managed table
hive> create table inner_table(key string);
OK
Time taken: 0.29 seconds

// 1.3 Load the data
hive> load data local inpath '/root/inner_table.dat' into table inner_table;
Loading data to table mydb.inner_table
Table mydb.inner_table stats: [numFiles=1, totalSize=33]
OK
Time taken: 4.917 seconds

// 1.4 Query all rows
hive> select * from inner_table;
OK
hadoop
hdfs
mapreduce
hive
hbase
Time taken: 1.086 seconds, Fetched: 5 row(s)

// 1.5 Count the rows; this launches a MapReduce job
hive> select count(*) from inner_table;
Query ID = root_20150428214949_d311da69-73d7-4278-8c1e-667a461aa287
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1430222047693_0001, Tracking URL = http://i-love-you:8088/proxy/application_1430222047693_0001/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1430222047693_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-04-28 21:51:05,875 Stage-1 map = 0%,  reduce = 0%
2015-04-28 21:51:59,258 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.94 sec
2015-04-28 21:52:42,454 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.94 sec
MapReduce Total cumulative CPU time: 6 seconds 940 msec
Ended Job = job_1430222047693_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.94 sec   HDFS Read: 267 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 940 msec
OK
5
Time taken: 189.992 seconds, Fetched: 1 row(s)
hive>

// 1.6 Drop the table; metadata and data are both removed (the data goes to the HDFS trash)
hive> drop table inner_table;
Moved: 'hdfs://192.168.1.10:9000/user/hive/warehouse/mydb.db/inner_table' to trash at: hdfs://192.168.1.10:9000/user/root/.Trash/Current
OK
Time taken: 21.272 seconds
hive> show tables;
OK
t1
t2
Time taken: 0.136 seconds, Fetched: 2 row(s)
hive>

----------------------------- 2. External tables -------------------------------------

An external table is a table created with the external keyword.
2.1. Dropping the table deletes only the metastore metadata; the table data in HDFS is kept.
2.2. Creating an external table does not move the data into the data warehouse: table creation and data availability happen together, because the table merely establishes a link to the external data; dropping the table removes only that link.
2.3. It can point at data already sitting in HDFS, and partitions can be created on it.
2.4. External and managed tables are organized identically in the metastore.

// Data to upload
[root@i-love-you ~]# more external_table.dat
hadoop  23
hbase   24
[root@i-love-you ~]#

// Create the external table
hive> create external table external_table(name string,age int) row format delimited fields terminated by "\t";
OK
Time taken: 0.754 seconds

// Inspect the table definition (the console truncates the output at the tab delimiter)
hive> desc extended external_table;
OK
name                    string
age                     int

Detailed Table Information      Table(tableName:external_table, dbName:mydb, owner:root, createTime:1430230920, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:age, type:int, comment:null)], location:hdfs://192.168.1.10:9000/user/hive/warehouse/mydb.db/external_table, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=        , field.delim=
Time taken: 0.208 seconds, Fetched: 4 row(s)
hive>

// Load data
hive> load data local inpath "/root/external_table.dat" into table external_table;
Loading data to table mydb.external_table
Table mydb.external_table stats: [numFiles=1, totalSize=19]
OK
Time taken: 0.837 seconds
hive>

// Query the data
hive> select * from external_table;
OK
hadoop  23
hbase   24
Time taken: 0.177 seconds, Fetched: 2 row(s)

// Drop the table
hive> drop table external_table;
OK
Time taken: 20.949 seconds
hive>

// The table is gone, but its data is still in HDFS: recreate the table and the rows can be queried again without reloading
hive> show tables;
OK
t1
t2
Time taken: 0.068 seconds, Fetched: 2 row(s)
hive> create external table external_table(name string,age int) row format delimited fields terminated by "\t";
OK
Time taken: 0.206 seconds
hive> select * from external_table;
OK
hadoop  23
hbase   24
Time taken: 0.154 seconds, Fetched: 2 row(s)
hive>
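Note 2.3 says an external table can point at data that already lives elsewhere in HDFS, yet the transcript above leaves the table at its default warehouse location. A minimal sketch of the explicit form, assuming tab-delimited files already sit under a hypothetical directory /data/external_demo (the path and table name are invented for illustration):

-- Point the table at an existing HDFS directory instead of the warehouse.
create external table external_demo(name string, age int)
row format delimited fields terminated by "\t"
location '/data/external_demo';

-- Dropping it removes only the metastore entry; the files under
-- /data/external_demo are left untouched.
drop table external_demo;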
----------------------------- 3. Partitioned tables -------------------------------------

3.1. A partition can be understood as a category: rows of different categories are stored in different directories.
3.2. The categorizing criterion is the partition column; there can be one or several.
3.3. Partitions exist to optimize queries: filter on the partition columns whenever possible; otherwise the whole table is scanned.

// Prepare the data to load
[root@baozi ~]# vim aa
[root@baozi ~]# more aa
hadoop  23
hive    24
hbase   25
hdfs    26
mapreduce       27
spark   28
[root@baozi ~]#

// Create the table
hive> create table t1(name string,age int) partitioned by (class string) row format delimited fields terminated by "\t";
OK
Time taken: 1.002 seconds

// Load the data
hive> load data local inpath "/root/aa" into table t1 partition(class="job1");
Loading data to table default.t1 partition (class=job1)
Partition default.t1{class=job1} stats: [numFiles=1, numRows=0, totalSize=58, rawDataSize=0]
OK
Time taken: 3.684 seconds
hive>

// Query one partition (the trailing NULL row is produced by a stray blank line in the input file)
hive> select * from t1 where class="job1";
OK
hadoop  23      job1
hive    24      job1
hbase   25      job1
hdfs    26      job1
mapreduce       27      job1
spark   28      job1
        NULL    job1
Time taken: 1.747 seconds, Fetched: 7 row(s)
hive>

// Add a partition
hive> alter table t1 add partition(class="job2");
OK
Time taken: 0.247 seconds
hive> show partitions t1;
OK
class=job1
class=job2
Time taken: 0.323 seconds, Fetched: 2 row(s)
hive>

// Drop a partition
hive> alter table t1 drop partition(class="job2");
Dropped the partition class=job2
OK
Time taken: 1.907 seconds
hive> show partitions t1;
OK
class=job1
Time taken: 0.143 seconds, Fetched: 1 row(s)
hive>

// A table with multiple partition columns
hive> create table t2(name string,age int) partitioned by (class string,city string) row format delimited fields terminated by "\t";
OK
Time taken: 0.117 seconds

// Inspect the table definition
hive> desc t2;
OK
name                    string
age                     int
class                   string
city                    string

# Partition Information
# col_name              data_type               comment

class                   string
city                    string
Time taken: 0.186 seconds, Fetched: 10 row(s)

// Load data
hive> load data local inpath "/root/aa" into table t2 partition(class="job1",city="bj");
Loading data to table default.t2 partition (class=job1, city=bj)
Partition default.t2{class=job1, city=bj} stats: [numFiles=1, numRows=0, totalSize=58, rawDataSize=0]
OK
Time taken: 0.73 seconds

// Query the data
hive> select * from t2;
OK
hadoop  23      job1    bj
hive    24      job1    bj
hbase   25      job1    bj
hdfs    26      job1    bj
mapreduce       27      job1    bj
spark   28      job1    bj
        NULL    job1    bj
Time taken: 0.264 seconds, Fetched: 7 row(s)
hive>
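The loads above hard-code the partition value with partition(class="job1"). Hive can also derive partition values from the query itself (dynamic partitioning). A hedged sketch, assuming a hypothetical unpartitioned staging table t_all(name string, age int, class string):

-- Allow partition values to be taken from the select list.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The last select column feeds the class partition column, so each
-- distinct class value lands in its own class=... directory automatically.
insert into table t1 partition(class)
select name, age, class from t_all;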
----------------------------- 4. Bucket tables -------------------------------------

A bucket table hashes the data on a chosen column and spreads the rows across separate files accordingly.
4.1. When data is loaded into a bucket table, the bucketing column is hashed and taken modulo the number of buckets; that decides which file each row lands in.
4.2. Physically, each bucket is one file in the table (or partition) directory.
4.3. The number of buckets (output files) a job produces equals the number of reduce tasks.

// Create the table
hive> create table bucket_table (id string,num int) clustered by (id) into 4 buckets;
OK
Time taken: 0.134 seconds

// Enable bucketing
hive> set hive.enforce.bucketing=true;

// Populate the bucket table; this launches a MapReduce job
hive> insert into table bucket_table select name,age from t2;
Query ID = root_20150429174242_56181777-f50c-472e-b9d5-d1ce6870d949
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 4
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1430296708269_0001, Tracking URL = http://baozi:8088/proxy/application_1430296708269_0001/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1430296708269_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
2015-04-29 17:43:39,453 Stage-1 map = 0%,  reduce = 0%
2015-04-29 17:44:39,732 Stage-1 map = 0%,  reduce = 0%
2015-04-29 17:45:39,891 Stage-1 map = 0%,  reduce = 0%
2015-04-29 17:46:07,371 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.16 sec
2015-04-29 17:47:08,351 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.75 sec
2015-04-29 17:48:30,177 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.75 sec
2015-04-29 17:49:30,097 Stage-1 map = 100%,  reduce = 58%, Cumulative CPU 4.95 sec
2015-04-29 17:49:44,561 Stage-1 map = 100%,  reduce = 60%, Cumulative CPU 10.85 sec
2015-04-29 17:49:49,900 Stage-1 map = 100%,  reduce = 88%, Cumulative CPU 15.02 sec
2015-04-29 17:50:16,669 Stage-1 map = 100%,  reduce = 89%, Cumulative CPU 16.2 sec
2015-04-29 17:50:19,006 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 16.67 sec
2015-04-29 17:51:20,269 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 21.87 sec
MapReduce Total cumulative CPU time: 21 seconds 870 msec
Ended Job = job_1430296708269_0001
Loading data to table default.bucket_table
Table default.bucket_table stats: [numFiles=4, numRows=7, totalSize=61, rawDataSize=54]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 4   Cumulative CPU: 21.87 sec   HDFS Read: 282 HDFS Write: 335 SUCCESS
Total MapReduce CPU Time Spent: 21 seconds 870 msec
OK
Time taken: 575.149 seconds
hive>

// Four bucket files appear in the table directory, one per reducer
[root@baozi ~]# hdfs dfs -ls /user/hive/warehouse/bucket_table
Found 4 items
-rw-r--r--   1 root supergroup         12 2015-04-29 17:50 /user/hive/warehouse/bucket_table/000000_0
-rw-r--r--   1 root supergroup          0 2015-04-29 17:49 /user/hive/warehouse/bucket_table/000001_0
-rw-r--r--   1 root supergroup         13 2015-04-29 17:50 /user/hive/warehouse/bucket_table/000002_0
-rw-r--r--   1 root supergroup         36 2015-04-29 17:50 /user/hive/warehouse/bucket_table/000003_0
[root@baozi ~]#

// Inspect the bucket data on HDFS (the field separator is a non-printing character)
[root@baozi ~]# hdfs dfs -text /user/hive/warehouse/bucket_table/000000_0
\N
hive24
[root@baozi ~]# hdfs dfs -text /user/hive/warehouse/bucket_table/000002_0
mapreduce27
[root@baozi ~]# hdfs dfs -text /user/hive/warehouse/bucket_table/000003_0
spark28
hdfs26
hbase25
hadoop23
[root@baozi ~]# hdfs dfs -text /user/hive/warehouse/bucket_table/000001_0
[root@baozi ~]#

// Query the bucket table
hive> select * from bucket_table;
OK
        NULL
hive    24
mapreduce       27
spark   28
hdfs    26
hbase   25
hadoop  23
Time taken: 1.136 seconds, Fetched: 7 row(s)
hive>

// Query the rows of one specific bucket
hive> select * from bucket_table tablesample(bucket 3 out of 4 on id);
OK
mapreduce       27
Time taken: 0.241 seconds, Fetched: 1 row(s)
hive>

// Use limit to peek at the first three rows (limit counts rows, not buckets)
hive> select * from bucket_table limit 3;
OK
        NULL
hive    24
mapreduce       27
Time taken: 0.072 seconds, Fetched: 3 row(s)
hive>
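The tablesample clause generalizes beyond picking out a single bucket: on a table bucketed into N buckets, tablesample(bucket x out of y on col) reads roughly N/y of the bucket files, starting at bucket x and stepping by y. A small sketch against the 4-bucket table above; the exact rows returned depend on how the ids hash:

-- Sample half the table: with 4 buckets, "bucket 1 out of 2" reads
-- buckets 1 and 3 (every second bucket, starting at the first),
-- i.e. 2 of the 4 files, without scanning the whole table.
select * from bucket_table tablesample(bucket 1 out of 2 on id);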

