Trafodion Bulk Load 初试
来源:互联网 发布:国家数据统计数据库 编辑:程序博客网 时间:2024/05/17 06:34
如在文章《Trafodion 数据加载介绍》中提到,Trafodion的Bulk Load与Tricke Load不同,Bulk Load主要是针对大数据量的且一般是批量装载的方式。Bulk Load通常使用LOAD语句实现。
下面介绍几种Bulk Load方式,
- 从Trafodion表加载数据
- 从HDFS文件加载数据(Hive外部表)
- 从Hive表加载数据
- 从外部数据库加载数据(Mysql to Hive)
从Trafodion表加载数据
(1)创建两个表结构相同的测试表
SQL>showddl test_tbl;CREATE TABLE TRAFODION.SEABASE.TEST_TBL ( ID INT NO DEFAULT NOT NULL NOT DROPPABLE SERIALIZED , NAME VARCHAR(10) CHARACTER SET ISO88591 COLLATE DEFAULT DEFAULT NULL SERIALIZED );--- SQL operation complete.SQL>showddl test_tbl2;CREATE TABLE TRAFODION.SEABASE.TEST_TBL2 ( ID INT NO DEFAULT NOT NULL NOT DROPPABLE SERIALIZED , NAME VARCHAR(10) CHARACTER SET ISO88591 COLLATE DEFAULT DEFAULT NULL SERIALIZED );--- SQL operation complete.
(2)表1有几条数据,表2没有数据
SQL>select * from test_tbl;ID NAME----------- ---------- 1 LIU 2 zhang 3 LIU 4 wu 5 feng--- 5 row(s) selected.SQL>select * from test_tbl2;--- 0 row(s) selected.
(3)从表1加载数据到表2
SQL>load into test_tbl2 select * from test_tbl;UTIL_OUTPUT--------------------------------------------------------------------------------------------------------------------------------Task: LOAD Status: Started Object: TRAFODION.SEABASE.TEST_TBL2Task: CLEANUP Status: Started Object: TRAFODION.SEABASE.TEST_TBL2Task: CLEANUP Status: Ended Object: TRAFODION.SEABASE.TEST_TBL2Task: PREPARATION Status: Started Object: TRAFODION.SEABASE.TEST_TBL2 Rows Processed: 5Task: PREPARATION Status: Ended ET: 00:00:00.263Task: COMPLETION Status: Started Object: TRAFODION.SEABASE.TEST_TBL2Task: COMPLETION Status: Ended ET: 00:00:00.531--- SQL operation complete.SQL>select * from test_tbl2;ID NAME----------- ---------- 1 LIU 2 zhang 3 LIU 4 wu 5 feng--- 5 row(s) selected.
从HDFS文件加载数据(Hive 外部表)
(1)创建测试文件并利用scp或者其他方式上传到Hadoop Cluster上,内容如下
[centos@cent-1 tmp]$ cat test_tableLucy,20,FJason,22,MStephen,21,MLily,19,FVincent,25,M
(2)将测试文件导入HDFS中
[hdfs@cent-1 ~]$ hadoop fs -mkdir -p /hive/test/test_tbl[hdfs@cent-1 ~]$ hadoop fs -copyFromLocal /tmp/test_table /hive/test/test_tbl[hdfs@cent-1 ~]$ hadoop fs -ls /hive/test/test_tblFound 1 items-rw-r--r-- 3 hdfs supergroup 57 2016-09-28 02:31 /hive/test/test_tbl/test_table
(3)在Trafodion中创建测试表
SQL>create table test_table ( name varchar(10), age int, sex varchar(1));--- SQL operation complete.SQL>showddl test_table;CREATE TABLE TRAFODION.SEABASE.TEST_TABLE ( NAME VARCHAR(10) CHARACTER SET ISO88591 COLLATE DEFAULT DEFAULT NULL SERIALIZED , AGE INT DEFAULT NULL SERIALIZED , SEX VARCHAR(1) CHARACTER SET ISO88591 COLLATE DEFAULT DEFAULT NULL SERIALIZED );--- SQL operation complete.
(4)在Hive中创建Hive外部表,指向前面对应的HDFS文件
[hdfs@cent-1 ~]$ hiveLogging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/hive-common-1.1.0-cdh5.4.8.jar!/hive-log4j.propertiesWARNING: Hive CLI is deprecated and migration to Beeline is recommended.hive> create external table test_tbl (name string,age int,sex string) row format delimited fields terminated by ',' location '/hive/test/test_tbl';OKTime taken: 0.085 secondshive> show tables;OKtest_tblTime taken: 0.364 seconds, Fetched: 1 row(s)
(5)加载数据到Trafodion表
SQL>load into test_table+>select * from hive.hive.test_tbl;UTIL_OUTPUT--------------------------------------------------------------------------------------------------------------------------------Task: LOAD Status: Started Object: TRAFODION.SEABASE.TEST_TABLETask: CLEANUP Status: Started Object: TRAFODION.SEABASE.TEST_TABLETask: CLEANUP Status: Ended Object: TRAFODION.SEABASE.TEST_TABLETask: PREPARATION Status: Started Object: TRAFODION.SEABASE.TEST_TABLE Rows Processed: 5Task: PREPARATION Status: Ended ET: 00:00:01.272Task: COMPLETION Status: Started Object: TRAFODION.SEABASE.TEST_TABLETask: COMPLETION Status: Ended ET: 00:00:00.254--- SQL operation complete.SQL>select * from test_table;NAME AGE SEX---------- ----------- ----Lucy 20 FJason 22 MStephen 21 MLily 19 FVincent 25 M--- 5 row(s) selected.
从Hive表加载数据
(1)在Hive中创建测试表并插入几条数据
hive> describe hive_tbl;OKname stringage intsex stringTime taken: 0.107 seconds, Fetched: 3 row(s)hive> select * from hive_tbl;OKLucy 20 FJason 22 MStephen 21 MTime taken: 0.093 seconds, Fetched: 3 row(s)
(2)Trafodion中的表继续沿用上述测试表,先清空数据
SQL>purgedata test_table;--- SQL operation complete.SQL>showddl test_table;CREATE TABLE TRAFODION.SEABASE.TEST_TABLE ( NAME VARCHAR(10) CHARACTER SET ISO88591 COLLATE DEFAULT DEFAULT NULL SERIALIZED , AGE INT DEFAULT NULL SERIALIZED , SEX VARCHAR(1) CHARACTER SET ISO88591 COLLATE DEFAULT DEFAULT NULL SERIALIZED );--- SQL operation complete.
(3)从Hive加载数据到Trafodion
(注:在做从hive到trafodion数据加载的时候有几个参数可以配置,用来提升加载性能,如HIVE_MAX_STRING_LENGTH、ALLOW_INCOMPATIBLE_ASSIGNMENT
典型的用法如下:
CQD HIVE_MAX_STRING_LENGTH '1000'; --用于当列的最大长度为1KB时
CQD ALLOW_INCOMPATIBLE_ASSIGNMENT 'ON'; --用于当有时间相关的列的时候)
SQL>load into test_table select * from hive.hive.hive_tbl;UTIL_OUTPUT--------------------------------------------------------------------------------------------------------------------------------Task: LOAD Status: Started Object: TRAFODION.SEABASE.TEST_TABLETask: CLEANUP Status: Started Object: TRAFODION.SEABASE.TEST_TABLETask: CLEANUP Status: Ended Object: TRAFODION.SEABASE.TEST_TABLETask: PREPARATION Status: Started Object: TRAFODION.SEABASE.TEST_TABLE Rows Processed: 3Task: PREPARATION Status: Ended ET: 00:00:00.390Task: COMPLETION Status: Started Object: TRAFODION.SEABASE.TEST_TABLETask: COMPLETION Status: Ended ET: 00:00:00.447--- SQL operation complete.SQL>select * from test_table;NAME AGE SEX---------- ----------- ----Lucy 20 FJason 22 MStephen 21 M--- 3 row(s) selected.
从外部数据库加载数据(Mysql to Hive)
(1)在集群上安装并启动Sqoop,Sqoop安装步骤请参照其官方文档 点击打开链接
(2)下载mysql connector Jar包,并放到/var/lib/sqoop目录下
(3)在某台机器上安装Mysql,安装Mysql步骤此处不作详细说明
(4)在mysql数据中创建测试表,并插入几条数据
mysql> use test;Database changedmysql> create table test_tbl(name varchar(10), age int, sex char(1));Query OK, 0 rows affected (0.00 sec)mysql> select * from test;ERROR 1146 (42S02): Table 'test.test' doesn't existmysql> select * from test_tbl;Empty set (0.00 sec)mysql> insert into test_tbl values('Wang wei', 30, 'M');Query OK, 1 row affected (0.00 sec)mysql> insert into test_tbl values('Li ping', 29, 'F');Query OK, 1 row affected (0.00 sec)mysql>mysql>mysql> select * from test_tbl;+----------+------+------+| name | age | sex |+----------+------+------+| Wang wei | 30 | M || Li ping | 29 | F |+----------+------+------+2 rows in set (0.00 sec)
(5)使用sqoop命令从mysql往hive导入数据
[hdfs@cent-2 ~]$ sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username centos --table test_tbl --split-by name --hive-import --create-hive-table --hive-table test_tabl_hiveWarning: /opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail....Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/hive-common-1.1.0-cdh5.4.8.jar!/hive-log4j.propertiesOKTime taken: 3.052 secondsLoading data to table default.test_tabl_hiveTable default.test_tabl_hive stats: [numFiles=4, totalSize=27]OKTime taken: 0.748 seconds
(6)从Trafodion或Hive Shell中查看数据是否导入成功
SQL>select * from hive.hive.test_tabl_hive;NAME AGE SEX -------------------------------------------------------------------------------------------------------------------------------- ----------- --------------------------------------------------------------------------------------------------------------------------------Li ping 29 F Wang wei 30 M --- 2 row(s) selected.
- Trafodion Bulk Load 初试
- Trafodion Bulk Load 对比 Native HBase Bulk Load
- es的java bulk load
- Trafodion Trickle Load 之 odb Load/Extract/Copy命令用法
- bulk load关于分隔符的问题
- hbase的bulk load一个小改造
- bulk-load装载hdfs数据到hbase
- bulk-load 装载HDFS数据到HBase
- Hadoop 从零开始学习系列-bulk load研究
- 通过Bulk Load导入HBase海量数据
- bulk load关于分隔符的问题
- Kerberos HBase集群Bulk Load权限问题
- HBase Bulk Load的基本使用
- HBase数据导入----improttsv&Bulk Load
- Trafodion ETL工具odb进行数据load方法及性能
- bulk
- hbase的bulk load一个小改造(续)
- [HBase] bulk-load装载hdfs数据到hbase小结
- 解决 Code::Blocks之It seems that this file has not been built yet.
- vuejs使用指南(一)
- 自动远程拷贝expect脚本
- 约瑟夫环问题
- tomcat服务器启动不起来
- Trafodion Bulk Load 初试
- Cocoa pods的安装使用以及过程中可能会遇到问题的汇总
- 详细的K-means聚类
- Java二分法的使用案例
- 201610月阅读书藉
- 去除MyEclipse频繁弹出的Update Progress窗口
- DevExpress破解版下载
- Opencv之存储并标记视频每一帧
- python数据库操作 - PyMySQL,MySQLdb