使用Embulk从Oracle抽取数据到Trafodion
来源:互联网 发布:淘宝会追缴极速退款么 编辑:程序博客网 时间:2024/06/06 03:31
在前面的文章,我们介绍过使用开源工具Kettle可以实现关系型数据库如Oracle到Trafodion的数据抽取工作,本文介绍另外一种ETL工具Embulk。Embulk是一款开源的批量加载工具,可以在多种数据库、存储引擎、文本格式及云服务之间进行数据转换,Embulk官网中的下面这张图比较简单明了的描述出了其特点,
Embulk提供很多plugins,包括Input plugins、Output plugins、File parser plugins等,详细可以参考http://www.embulk.org/plugins/。用户也可以开发自己的plugins并上传,如EsgynDB就基于trafodion自己开发一款基于trafodion的upsert using load实现的批量加载插件,如下图,
本文在此利用上述的trafodion output插件及oracle input插件通过实例描述如何使用embulk实现oracle到trafodion的数据加载。
1 下载安装Embulk
wget https://dl.embulk.org/embulk-latest.jarmkdir ~/.embulk/binmv embulk-latest.jar ~/.embulk/bin/embulkchmod +x ~/.embulk/bin/embulkecho 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrcsource ~/.bashrc
2 验证Embulk下载安装成功
[root@n12 ~]# embulk gem list2017-09-05 20:19:51.184 +0800: Embulk v0.8.31*** LOCAL GEMS ***did_you_mean (default: 1.0.1)jar-dependencies (default: 0.3.5)jruby-openssl (0.9.17 java)json (1.8.3 java)minitest (default: 5.4.1)net-telnet (default: 0.1.1)power_assert (default: 0.2.3)psych (2.0.17 java)racc (1.4.14 java)rake (default: 10.4.2)rdoc (default: 4.2.0)test-unit (default: 3.1.1)
3 安装oracle input插件及trafodion output插件
embulk gem install embulk-input-oracleembulk gem install embulk-output-trafodion
4 验证以上插件安装成功
root@n12 ~]# embulk gem list2017-09-05 20:22:57.146 +0800: Embulk v0.8.31*** LOCAL GEMS ***did_you_mean (default: 1.0.1)embulk-input-oracle (0.8.5)embulk-output-trafodion (0.1.1)jar-dependencies (default: 0.3.5)jruby-openssl (0.9.17 java)json (1.8.3 java)minitest (default: 5.4.1)net-telnet (default: 0.1.1)power_assert (default: 0.2.3)psych (2.0.17 java)racc (1.4.14 java)rake (default: 10.4.2)rdoc (default: 4.2.0)test-unit (default: 3.1.1)
5 准备oracle jar包,用于读取oracle数据库
[root@n12 ~]# ll /opt/drivers/total 2676-rw-rw-r-- 1 trafodion trafodion 2739670 Sep 5 20:38 ojdbc6.jar
6 编辑YAML文件oracle_to_trafodion.yml
注:Embulk使用YAML文件来定义数据批量加载的方式,YAML文件格式定义可参考 http://www.embulk.org/docs/built-in.html#embulk-configuration-file-format
exec: max_threads: 8 min_output_tasks: 4in: type: oracle driver_path: /opt/drivers/ojdbc6.jar url: "jdbc:oracle:thin:@10.10.11.16:1521:esgyn" user: system password: system12345 query: | select * from test_tblout: type: trafodion url: "jdbc:t4jdbc://192.168.1.93:23400/:schema=seabase" user: trafodion password: traf123 default_timezone: 'Asia/Shanghai' table: TEST_TBL mode: insert_direct
7 创建Oracle测试表并插入测试数据
SQL> create table test_tbl(a char(10),b char(10));SQL> insert into test_tbl values('a','b');SQL> insert into test_tbl values('c','d');SQL> insert into test_tbl values('e','f');SQL> select * from test_tbl;A B---------- ----------a bc de f
8 创建Trafodion目标表
>>create table test_tbl(a char(10),b char(10));--- SQL operation complete.>>select * from test_tbl;--- 0 row(s) selected.
9 运行Embulk任务
--预览embulk preview ~/oracle_to_trafodion.yml[root@n12 ~]# embulk preview ~/oracle_to_trafodion.yml2017-09-05 20:41:24.093 +0800: Embulk v0.8.312017-09-05 20:41:27.303 +0800 [INFO] (0001:preview): Loaded plugin embulk-input-oracle (0.8.5)2017-09-05 20:41:27.662 +0800 [INFO] (0001:preview): Using JDBC Driver 11.2.0.4.02017-09-05 20:41:27.977 +0800 [INFO] (0001:preview): SQL: select * from test_tbl2017-09-05 20:41:27.999 +0800 [INFO] (0001:preview): > 0.02 seconds+------------+------------+| A:string | B:string |+------------+------------+| a | b || c | d || e | f |+------------+------------+--运行embulk run ~/oracle_to_trafodion.yml[root@n12 ~]# embulk run ~/oracle_to_trafodion.yml2017-09-05 21:01:26.419 +0800: Embulk v0.8.312017-09-05 21:01:30.260 +0800 [INFO] (0001:transaction): Loaded plugin embulk-input-oracle (0.8.5)2017-09-05 21:01:30.298 +0800 [INFO] (0001:transaction): Loaded plugin embulk-output-trafodion (0.1.1)2017-09-05 21:01:30.807 +0800 [INFO] (0001:transaction): Using JDBC Driver 11.2.0.4.02017-09-05 21:01:30.995 +0800 [INFO] (0001:transaction): Using local thread executor with max_threads=8 / output tasks 4 = input tasks 1 * 4-----------------------------------jdbc:t4jdbc://192.168.1.93:23400/:schema=seabase2017-09-05 21:01:31.019 +0800 [INFO] (0001:transaction): Connecting to jdbc:t4jdbc://192.168.1.93:23400/:schema=seabase options {user=trafodion, tcpKeepAlive=true, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}connection------------------------------------:org.trafodion.jdbc.t4.TrafT4Connection@1cc41b772017-09-05 21:01:36.271 +0800 [INFO] (0001:transaction): Using insert_direct mode[WARN] Plugin uses deprecated constructor of org.embulk.spi.time.TimestampFormatter.[WARN] Report plugins in your config at: https://github.com/embulk/embulk/issues/7452017-09-05 21:01:40.345 +0800 [INFO] (0001:transaction): {done: 0 / 1, running: 0}-----------------------------------jdbc:t4jdbc://192.168.1.93:23400/:schema=seabase2017-09-05 21:01:40.389 +0800 [INFO] (0026:task-0000): Connecting to jdbc:t4jdbc://192.168.1.93:23400/:schema=seabase options {user=trafodion, tcpKeepAlive=true, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}connection------------------------------------:org.trafodion.jdbc.t4.TrafT4Connection@4ee9df382017-09-05 21:01:40.970 +0800 [INFO] (0026:task-0000): Prepared SQL: UPSERT USING LOAD INTO "TEST_TBL" ("A", "B") VALUES (?, ?)-----------------------------------jdbc:t4jdbc://192.168.1.93:23400/:schema=seabase2017-09-05 21:01:47.001 +0800 [INFO] (0026:task-0000): Connecting to jdbc:t4jdbc://192.168.1.93:23400/:schema=seabase options {user=trafodion, tcpKeepAlive=true, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}connection------------------------------------:org.trafodion.jdbc.t4.TrafT4Connection@785aae0e2017-09-05 21:01:52.935 +0800 [INFO] (0026:task-0000): Prepared SQL: UPSERT USING LOAD INTO "TEST_TBL" ("A", "B") VALUES (?, ?)-----------------------------------jdbc:t4jdbc://192.168.1.93:23400/:schema=seabase2017-09-05 21:02:01.036 +0800 [INFO] (0026:task-0000): Connecting to jdbc:t4jdbc://192.168.1.93:23400/:schema=seabase options {user=trafodion, tcpKeepAlive=true, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}connection------------------------------------:org.trafodion.jdbc.t4.TrafT4Connection@38f4c2382017-09-05 21:02:03.872 +0800 [INFO] (0026:task-0000): Prepared SQL: UPSERT USING LOAD INTO "TEST_TBL" ("A", "B") VALUES (?, ?)-----------------------------------jdbc:t4jdbc://192.168.1.93:23400/:schema=seabase2017-09-05 21:02:10.364 +0800 [INFO] (0026:task-0000): Connecting to jdbc:t4jdbc://192.168.1.93:23400/:schema=seabase options {user=trafodion, tcpKeepAlive=true, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}connection------------------------------------:org.trafodion.jdbc.t4.TrafT4Connection@6c47ecb42017-09-05 21:02:17.565 +0800 [INFO] (0026:task-0000): Prepared SQL: UPSERT USING LOAD INTO "TEST_TBL" ("A", "B") VALUES (?, ?)2017-09-05 21:02:36.756 +0800 [INFO] (0026:task-0000): SQL: select * from test_tbl2017-09-05 21:02:36.777 +0800 [INFO] (0026:task-0000): > 0.02 seconds2017-09-05 21:02:36.783 +0800 [INFO] (0026:task-0000): Loading 3 rows2017-09-05 21:02:36.821 +0800 [INFO] (0026:task-0000): > 0.04 seconds (loaded 3 rows in total)2017-09-05 21:02:37.653 +0800 [INFO] (0001:transaction): {done: 1 / 1, running: 0}2017-09-05 21:02:37.665 +0800 [INFO] (main): Committed.2017-09-05 21:02:37.665 +0800 [INFO] (main): Next config diff: {"in":{},"out":{}}
10 检查Embulk任务执行成功
SQL>select * from test_tbl;A B---------- ----------a bc de f--- 3 row(s) selected.
至此,使用Embulk从Oracle抽取数据到Trafodion演示完毕!
- 使用Embulk从Oracle抽取数据到Trafodion
- Kettle实现Oracle到Trafodion数据定时抽取
- 使用Kettle抽取MongoDB数据到Oracle
- 使用sqoop从Oracle或mysql抽取数据到HDFS遇到的报错及解决
- 用Kettle从MySQL迁移数据到Trafodion牛刀小试
- Trafodion 使用odb导出表数据到文件
- Goldengate抽取ORACLE 数据 到 Hbase
- 抽取ORACLE表数据到HIVE
- Oracle GoldenGate 支持 从SAP HANA database抽取或者复制数据到SAP HANA database 吗?
- 【JEECG示例文档】使用Kettle从mysql向oracle中抽取数据
- 【JEECG示例文档】使用Kettle从mysql向oracle中抽取数据
- 从mysql导数据到trafodion
- 用Java实现从数据库抽取数据到Xml
- 利用talend将数据从orcale抽取,存到hive
- 从源数据库抽取数据到中间库
- Sqoop从mysql抽取数据到hive中
- oracle 从数据库中随机抽取数据 随机数
- JDBC实现从Hive抽取数据导入Oracle
- 前端笔试题(2017年9月)
- 常量是什么?
- 字符串库函数实现
- MyISAM和InnoDB的索引在实现上的不同
- js for in循环获取对象属性名,属性值
- 使用Embulk从Oracle抽取数据到Trafodion
- poj 3690(二维哈希)
- java中的jre和jdk的区别
- 【03】vue.js — http请求交互
- centos7 安装psutil
- 【leetcode】88. Merge Sorted Array(Python & C++)
- 编译Android平台使用的FFmpeg库
- 直接插入排序(java版)
- Vuejs之路之--父子组件(未完待续)