Viewing the CREATE statement of a Hive table

Source: Internet. Editor: 程序博客网. Time: 2024/05/15 20:24

Reference link

How Hive determines the number of map tasks

Suppose the table ebay_order already exists in Hive. To see the statement that created it, run:

hive> show create table ebay_order;

OK
CREATE  TABLE `ebay_order`(
  `ebay_id` int,
  `ebay_ordersn` string,
  `ebay_orderqk` string,
  `ebay_paystatus` string,
  `recordnumber` string,
  `ebay_tid` string,
  `ebay_ptid` string,
  `ebay_orderid` string,
  `ebay_createdtime` int,
  `ebay_paidtime` string,
  `ebay_userid` string,
  `ebay_username` string,
  `ebay_usermail` string,
  `ebay_street` string,
  `ebay_street1` string,
  `ebay_city` string,
  `ebay_state` string,
  `ebay_couny` string,
  `ebay_countryname` string,
  `ebay_postcode` string,
  `ebay_phone` string,
  `ebay_currency` string,
  `ebay_total` double,
  `ebay_status` int,
  `ebay_user` string,
  `ebay_addtime` int,
  `ebay_shipfee` string,
  `ebay_combine` string,
  `market` string,
  `ebay_account` string,
  `ebay_note` string,
  `ebay_noteb` string,
  `is_reg` int,
  `ordertype` string,
  `status` string,
  `mailstatus` string,
  `templateid` string,
  `postive` string,
  `ebay_carrier` string,
  `ebay_carrierstyle` string,
  `ebay_warehouse` string,
  `ebay_markettime` string,
  `ebay_tracknumber` string,
  `ebay_site` string,
  `location` string,
  `ebaypaymentstatus` string,
  `paypalemailaddress` string,
  `shippedtime` string,
  `refundamount` int,
  `resendreason` string,
  `refundreason` string,
  `resendtime` int,
  `refundtime` int,
  `canceltime` int,
  `cancelreason` string,
  `ebay_feedback` string,
  `ebay_sdsn` string,
  `isprint` int,
  `ebay_ordertype` string,
  `profitstatus` int,
  `orderweight` double,
  `orderweight2` double,
  `ordershipfee` double,
  `ordercopst` double,
  `scantime` int,
  `ishide` int,
  `packingtype` string,
  `packinguser` string,
  `packagingstaff` string,
  `order_no` string,
  `ebay_phone1` string,
  `main_order` string,
  `is_main_order` boolean,
  `combine_package` int,
  `is_sendreplacement` boolean,
  `send_email` tinyint)
COMMENT 'Imported by sqoop on 2014/11/05 11:25:31'
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\u0001'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://cdhnamenode.com:8020/user/hive/warehouse/ebay_order'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='32',
  'numRows'='0',
  'rawDataSize'='0',
  'totalSize'='284489823',
  'transient_lastDdlTime'='1415157943')

The output above shows that the INPUTFORMAT of the ebay_order table is org.apache.hadoop.mapred.TextInputFormat.

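TextInputFormat extends FileInputFormat. FileInputFormat is an abstract class whose most important job is to give the various InputFormat implementations a common getSplits() method. The core of that method is the file-splitting algorithm and the host-selection algorithm.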

If hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat, the relevant parameters are:

hive> set mapred.min.split.size;
mapred.min.split.size=1
hive> set mapred.map.tasks;
mapred.map.tasks=2
hive> set dfs.blocksize;
dfs.blocksize=134217728

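In the parameters above, mapred.map.tasks is 2 and dfs.blocksize is 134217728 bytes (128 MB). This cluster runs the Hadoop shipped with CDH 5.2.0, where the property name has no dot between block and size (the older name dfs.block.size has one).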

Suppose a file is 200 MB. With the parameters above, the HiveInputFormat split algorithm proceeds as follows:

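1. The total file size is 200 MB, so goalSize = 200 MB / 2 = 100 MB and minSize = 1, giving splitSize = max{1, min{100 MB, 128 MB}} = 100 MB.

2. 200 MB / 100 MB = 2 > 1.1, so the first split is 100 MB.

3. The remaining 100 MB gives 100 MB / 100 MB = 1, which is not greater than 1.1, so the whole remainder becomes the second split of 100 MB.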
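
The three steps above can be reproduced with a short script. This is only a sketch of the calculation as described here, for a single non-empty splittable file. The function name compute_split_sizes and its parameter names are made up for illustration and are not part of Hive or Hadoop. The constant 1.1 is the slack factor used in step 2.

```python
SPLIT_SLOP = 1.1  # slack factor from step 2: only cut while remaining/splitSize > 1.1

MB = 1024 * 1024


def compute_split_sizes(file_size, num_map_tasks, min_split_size, block_size):
    """Return the split sizes in bytes for one file (illustrative helper, not a Hadoop API)."""
    # goalSize = total size / requested number of map tasks (mapred.map.tasks)
    goal_size = file_size // max(num_map_tasks, 1)
    # splitSize = max(minSize, min(goalSize, blockSize))
    split_size = max(min_split_size, min(goal_size, block_size))

    splits = []
    remaining = file_size
    # Cut full-size splits while more than 1.1 x splitSize is left
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    # Whatever is left (at most 1.1 x splitSize) becomes the last split
    if remaining > 0:
        splits.append(remaining)
    return splits


# The 200 MB example from the text: two splits of 100 MB each
print([s // MB for s in compute_split_sizes(200 * MB, 2, 1, 128 * MB)])
# A 130 MB file with one requested map task stays in a single split,
# because 130 / 128 is below the 1.1 slack factor
print([s // MB for s in compute_split_sizes(130 * MB, 1, 1, 128 * MB)])
```

The second call shows why the 1.1 factor matters: a file only slightly larger than the split size is not cut into one large split plus one tiny split.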

If hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat, the relevant parameters are:


hive> set mapred.min.split.size;
mapred.min.split.size=1
hive> set mapred.max.split.size;
mapred.max.split.size=256000000
hive> set mapred.min.split.size.per.rack;
mapred.min.split.size.per.rack=1
hive> set mapred.min.split.size.per.node;
mapred.min.split.size.per.node=1
hive> set dfs.blocksize;
dfs.blocksize=134217728


Calling Hive from Java

beeline
!connect jdbc:hive2://192.168.200.190:10000/default
select count(*) from ebay_account;
