Trafodion Bulk Load 初试

来源:互联网 发布:国家数据统计数据库 编辑:程序博客网 时间:2024/05/17 06:34

如在文章《Trafodion 数据加载介绍》中提到,Trafodion的Bulk Load与Tricke Load不同,Bulk Load主要是针对大数据量的且一般是批量装载的方式。Bulk Load通常使用LOAD语句实现。

下面介绍几种Bulk Load方式,

  • 从Trafodion表加载数据
  • 从HDFS文件加载数据(Hive外部表)
  • 从Hive表加载数据
  • 从外部数据库加载数据(Mysql to Hive)


从Trafodion表加载数据

(1)创建两个表结构相同的测试表

SQL>showddl test_tbl;CREATE TABLE TRAFODION.SEABASE.TEST_TBL  (    ID                               INT NO DEFAULT NOT NULL NOT DROPPABLE      SERIALIZED  , NAME                             VARCHAR(10) CHARACTER SET ISO88591 COLLATE      DEFAULT DEFAULT NULL SERIALIZED  );--- SQL operation complete.SQL>showddl test_tbl2;CREATE TABLE TRAFODION.SEABASE.TEST_TBL2  (    ID                               INT NO DEFAULT NOT NULL NOT DROPPABLE      SERIALIZED  , NAME                             VARCHAR(10) CHARACTER SET ISO88591 COLLATE      DEFAULT DEFAULT NULL SERIALIZED  );--- SQL operation complete.

(2)表1有几条数据,表2没有数据

SQL>select * from test_tbl;ID          NAME----------- ----------          1 LIU          2 zhang          3 LIU          4 wu          5 feng--- 5 row(s) selected.SQL>select * from test_tbl2;--- 0 row(s) selected.

(3)从表1加载数据到表2

SQL>load into test_tbl2 select * from test_tbl;UTIL_OUTPUT--------------------------------------------------------------------------------------------------------------------------------Task: LOAD             Status: Started    Object: TRAFODION.SEABASE.TEST_TBL2Task:  CLEANUP         Status: Started    Object: TRAFODION.SEABASE.TEST_TBL2Task:  CLEANUP         Status: Ended      Object: TRAFODION.SEABASE.TEST_TBL2Task:  PREPARATION     Status: Started    Object: TRAFODION.SEABASE.TEST_TBL2       Rows Processed: 5Task:  PREPARATION     Status: Ended      ET: 00:00:00.263Task:  COMPLETION      Status: Started    Object: TRAFODION.SEABASE.TEST_TBL2Task:  COMPLETION      Status: Ended      ET: 00:00:00.531--- SQL operation complete.SQL>select * from test_tbl2;ID          NAME----------- ----------          1 LIU          2 zhang          3 LIU          4 wu          5 feng--- 5 row(s) selected.


从HDFS文件加载数据(Hive 外部表)

(1)创建测试文件并利用scp或者其他方式上传到Hadoop Cluster上,内容如下

[centos@cent-1 tmp]$ cat test_tableLucy,20,FJason,22,MStephen,21,MLily,19,FVincent,25,M

(2)将测试文件导入HDFS中

[hdfs@cent-1 ~]$ hadoop fs -mkdir -p /hive/test/test_tbl[hdfs@cent-1 ~]$ hadoop fs -copyFromLocal /tmp/test_table /hive/test/test_tbl[hdfs@cent-1 ~]$ hadoop fs -ls /hive/test/test_tblFound 1 items-rw-r--r--   3 hdfs supergroup         57 2016-09-28 02:31 /hive/test/test_tbl/test_table

(3)在Trafodion中创建测试表

SQL>create table test_table ( name varchar(10), age int, sex varchar(1));--- SQL operation complete.SQL>showddl test_table;CREATE TABLE TRAFODION.SEABASE.TEST_TABLE  (    NAME                             VARCHAR(10) CHARACTER SET ISO88591 COLLATE      DEFAULT DEFAULT NULL SERIALIZED  , AGE                              INT DEFAULT NULL SERIALIZED  , SEX                              VARCHAR(1) CHARACTER SET ISO88591 COLLATE      DEFAULT DEFAULT NULL SERIALIZED  );--- SQL operation complete.


(4)在Hive中创建Hive外部表,指向前面对应的HDFS文件

[hdfs@cent-1 ~]$ hiveLogging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/hive-common-1.1.0-cdh5.4.8.jar!/hive-log4j.propertiesWARNING: Hive CLI is deprecated and migration to Beeline is recommended.hive> create external table test_tbl (name string,age  int,sex  string) row format delimited fields terminated by ',' location '/hive/test/test_tbl';OKTime taken: 0.085 secondshive> show tables;OKtest_tblTime taken: 0.364 seconds, Fetched: 1 row(s)

(5)加载数据到Trafodion表

SQL>load into test_table+>select * from hive.hive.test_tbl;UTIL_OUTPUT--------------------------------------------------------------------------------------------------------------------------------Task: LOAD             Status: Started    Object: TRAFODION.SEABASE.TEST_TABLETask:  CLEANUP         Status: Started    Object: TRAFODION.SEABASE.TEST_TABLETask:  CLEANUP         Status: Ended      Object: TRAFODION.SEABASE.TEST_TABLETask:  PREPARATION     Status: Started    Object: TRAFODION.SEABASE.TEST_TABLE       Rows Processed: 5Task:  PREPARATION     Status: Ended      ET: 00:00:01.272Task:  COMPLETION      Status: Started    Object: TRAFODION.SEABASE.TEST_TABLETask:  COMPLETION      Status: Ended      ET: 00:00:00.254--- SQL operation complete.SQL>select * from test_table;NAME       AGE         SEX---------- ----------- ----Lucy                20 FJason               22 MStephen             21 MLily                19 FVincent             25 M--- 5 row(s) selected.


从Hive表加载数据

(1)在Hive中创建测试表并插入几条数据

hive> describe hive_tbl;OKname                    stringage                     intsex                     stringTime taken: 0.107 seconds, Fetched: 3 row(s)hive> select * from hive_tbl;OKLucy    20      FJason   22      MStephen 21      MTime taken: 0.093 seconds, Fetched: 3 row(s)

(2)Trafodion中的表继续沿用上述测试表,先清空数据

SQL>purgedata test_table;--- SQL operation complete.SQL>showddl test_table;CREATE TABLE TRAFODION.SEABASE.TEST_TABLE  (    NAME                             VARCHAR(10) CHARACTER SET ISO88591 COLLATE      DEFAULT DEFAULT NULL SERIALIZED  , AGE                              INT DEFAULT NULL SERIALIZED  , SEX                              VARCHAR(1) CHARACTER SET ISO88591 COLLATE      DEFAULT DEFAULT NULL SERIALIZED  );--- SQL operation complete.

(3)从Hive加载数据到Trafodion

(注:在做从hive到trafodion数据加载的时候有几个参数可以配置,用来提升加载性能,如HIVE_MAX_STRING_LENGTH、ALLOW_INCOMPATIBLE_ASSIGNMENT

   典型的用法如下:

    CQD HIVE_MAX_STRING_LENGTH '1000'; --用于当列的最大长度为1KB时

    CQD ALLOW_INCOMPATIBLE_ASSIGNMENT 'ON'; --用于当有时间相关的列的时候

SQL>load into test_table select * from hive.hive.hive_tbl;UTIL_OUTPUT--------------------------------------------------------------------------------------------------------------------------------Task: LOAD             Status: Started    Object: TRAFODION.SEABASE.TEST_TABLETask:  CLEANUP         Status: Started    Object: TRAFODION.SEABASE.TEST_TABLETask:  CLEANUP         Status: Ended      Object: TRAFODION.SEABASE.TEST_TABLETask:  PREPARATION     Status: Started    Object: TRAFODION.SEABASE.TEST_TABLE       Rows Processed: 3Task:  PREPARATION     Status: Ended      ET: 00:00:00.390Task:  COMPLETION      Status: Started    Object: TRAFODION.SEABASE.TEST_TABLETask:  COMPLETION      Status: Ended      ET: 00:00:00.447--- SQL operation complete.SQL>select * from test_table;NAME       AGE         SEX---------- ----------- ----Lucy                20 FJason               22 MStephen             21 M--- 3 row(s) selected.


从外部数据库加载数据(Mysql to Hive)

(1)在集群上安装并启动Sqoop,Sqoop安装步骤请参照其官方文档 点击打开链接


(2)下载mysql connector Jar包,并放到/var/lib/sqoop目录下


(3)在某台机器上安装Mysql,安装Mysql步骤此处不作详细说明


(4)在mysql数据中创建测试表,并插入几条数据

mysql> use test;Database changedmysql> create table test_tbl(name varchar(10), age int, sex char(1));Query OK, 0 rows affected (0.00 sec)mysql> select * from test;ERROR 1146 (42S02): Table 'test.test' doesn't existmysql> select * from test_tbl;Empty set (0.00 sec)mysql> insert into test_tbl values('Wang wei', 30, 'M');Query OK, 1 row affected (0.00 sec)mysql> insert into test_tbl values('Li ping', 29, 'F');Query OK, 1 row affected (0.00 sec)mysql>mysql>mysql> select * from test_tbl;+----------+------+------+| name     | age  | sex  |+----------+------+------+| Wang wei |   30 | M    || Li ping  |   29 | F    |+----------+------+------+2 rows in set (0.00 sec)

(5)使用sqoop命令从mysql往hive导入数据

[hdfs@cent-2 ~]$  sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username centos --table test_tbl --split-by name --hive-import --create-hive-table --hive-table test_tabl_hiveWarning: /opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail....Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/hive-common-1.1.0-cdh5.4.8.jar!/hive-log4j.propertiesOKTime taken: 3.052 secondsLoading data to table default.test_tabl_hiveTable default.test_tabl_hive stats: [numFiles=4, totalSize=27]OKTime taken: 0.748 seconds

(6)从Trafodion或Hive Shell中查看数据是否导入成功

SQL>select * from hive.hive.test_tabl_hive;NAME                                                                                                                             AGE         SEX                                                          -------------------------------------------------------------------------------------------------------------------------------- ----------- --------------------------------------------------------------------------------------------------------------------------------Li ping                                                                                                                                   29 F                                                            Wang wei                                                                                                                                  30 M                                                            --- 2 row(s) selected.






1 0
原创粉丝点击
热门问题 老师的惩罚 人脸识别 我在镇武司摸鱼那些年 重生之率土为王 我在大康的咸鱼生活 盘龙之生命进化 天生仙种 凡人之先天五行 春回大明朝 姑娘不必设防,我是瞎子 毕业证丢了怎么办居住证 贴吧回帖被秒删怎么办 群拉人失败了怎么办 安装解释包错误怎么办 皮皮麻将进不去怎么办 郑州集体户口怎么办准生证 广州集体户口准生证怎么办 学校寄档案丢失怎么办 深圳通儿童卡怎么办 集体户没有户口卡怎么办 北京滴滴跑长途怎么办 应届毕业生落户成都档案怎么办 执业医师缺考怎么办 国家级考试缺考怎么办 异地就业后档案怎么办 没有工作报到证怎么办 网络系统管理员封锁网络怎么办 公房没买断产权怎么办 去异地工作社保怎么办 深圳辞职了 社保怎么办 小产权孩子上学怎么办 多余的粽叶怎么办 农村两处宅基地怎么办 农村一户多宅怎么办 无锡未满五年安置房怎么办 人才公寓退休后怎么办 套餐到期不用了怎么办 日本手机自动续约怎么办 如果购房后退房怎么办 土地权70年到期怎么办 在亚庇丢了护照怎么办 回国丢了护照怎么办 大学挂科拿不到毕业证怎么办 孩子脸上长黑痣怎么办 婴儿tsh10.6偏高怎么办 苹果x显示edge怎么办 煮的鸡肉腥怎么办 卫生间蹲坑太高怎么办 鸡肉为什么太腥怎么办 房间有屎臭味怎么办 word转pdf乱码怎么办