Sqoop: Loading Data into Hive ORC Tables


Sqoop currently supports loading data from an RDBMS directly into Hive ORC tables. This can be done in two ways:

  • Create the Hive ORC table first, then use the sqoop import command to load data into it
  • Use sqoop import to create the Hive ORC table and load the data in one step

Below is a quick test of importing an Oracle table into a Hive ORC table with Sqoop, using each method in turn.
(1)

// Create the ORC test table from the Hive CLI
hive> create table hive_orc1(a int, b string, c string, d int, e string, f string) stored as orc;
OK
Time taken: 0.254 seconds
hive> show create table hive_orc1;
OK
CREATE TABLE `hive_orc1`(
  `a` int,
  `b` string,
  `c` string,
  `d` int,
  `e` string,
  `f` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://nameservice1/user/hive/warehouse/hive_orc1'
TBLPROPERTIES (
  'transient_lastDdlTime'='1490084866')

// Load data into the table with the sqoop import command
sqoop-import --connect jdbc:oracle:thin:@10.10.11.16:1521/esgyn \
  --username system --password 12345 \
  --table TEST_GROUPBY --split-by A \
  --hcatalog-table hive_orc1 --hive-overwrite \
  --null-string '\\N' --null-non-string '\\N' --verbose
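After the import completes, a quick check from the Hive CLI confirms that rows landed in the ORC table. This is a minimal sketch; it simply reads back whatever the import wrote from TEST_GROUPBY, so the output depends on your source data:

// Sanity check: read back the imported rows
hive> select count(*) from hive_orc1;
hive> select * from hive_orc1 limit 5;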

(2)

// Create the Hive ORC table and load data in a single sqoop import
sqoop-import --connect jdbc:oracle:thin:@10.10.11.16:1521/esgyn \
  --username system --password 12345 \
  --table TEST_GROUPBY --split-by A \
  --create-hcatalog-table --hcatalog-table hive_orc2 \
  --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'

// Check the Hive ORC table that was created
hive> show create table hive_orc2;
OK
CREATE TABLE `hive_orc2`(
  `a` decimal(38,0),
  `b` char(10),
  `c` char(10),
  `d` decimal(38,0),
  `e` char(10))
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://nameservice1/user/hive/warehouse/hive_orc2'
TBLPROPERTIES (
  'orc.compress'='SNAPPY',
  'transient_lastDdlTime'='1490086049')
Time taken: 0.066 seconds, Fetched: 17 row(s)
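To confirm that the SNAPPY setting actually took effect at the file level, Hive's ORC file dump utility can be run against one of the data files under the table's LOCATION shown above. This is only a sketch: the data-file name is a placeholder and will differ per run.

// List the files written by the import, then dump one of them
// (replace <data-file> with an actual file name from the listing)
hdfs dfs -ls /user/hive/warehouse/hive_orc2
hive --orcfiledump /user/hive/warehouse/hive_orc2/<data-file>
// The dump header should report SNAPPY compression, and a count in Hive
// should match the Oracle source table:
// hive> select count(*) from hive_orc2;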