Getting Started with DataX3 for Offline Data Migration
Source: Internet. Editor: 程序博客网. Time: 2024/05/19 10:35
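DataX3 is quite convenient to use. Here are the official resources:
DataX3 on GitHub: https://github.com/alibaba/DataX (includes an introduction to DataX3 and the download link).
How to use DataX3: https://github.com/alibaba/DataX/wiki/Quick-Start
Configuration parameters for every reader and writer: https://github.com/alibaba/DataX/wiki/DataX-all-data-channels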
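Basic usage
Check that Python is available on the system. Linux distributions generally ship with Python, and the official recommendation is to use Python 2.
Download DataX directly: http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
After downloading, extract the archive to a local directory, set its permissions to 755, and change into the bin directory. You can then run the sample sync job: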
$ tar zxvf datax.tar.gz
$ sudo chmod -R 755 datax
$ cd datax/bin
$ python datax.py ../job/job.json
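The sample job runs, so DataX is working properly.
Next, create your own configuration file.
For the data sources DataX currently supports, see the DataX all data channels wiki page.
Taking mysqlreader and mysqlwriter as an example, print the configuration template: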
$ cd datax/bin
$ python datax.py -r mysqlreader -w mysqlwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the mysqlreader document:
    https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
Please refer to the mysqlwriter document:
    https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md

Please save the following configuration as a json file and use
    python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": [],
                                "table": []
                            }
                        ],
                        "password": "",
                        "username": "",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": "",
                                "table": []
                            }
                        ],
                        "password": "",
                        "preSql": [],
                        "session": [],
                        "username": "",
                        "writeMode": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
Use the template above to create your own JSON configuration file.
Name it mysql2mysql.json and put it in the job directory.
$ vim datax/job/mysql2mysql.json

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["id","name","location","age"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://192.168.1.130:3306/people?useUnicode=true&characterEncoding=UTF-8"],
                                "table": ["test"]
                            }
                        ],
                        "password": "123456",
                        "username": "root",
                        "where": "test.id <= 50000"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["id","name","location","age"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.1.131:3306/people?useUnicode=true&characterEncoding=UTF-8",
                                "table": ["test"]
                            }
                        ],
                        "password": "123456",
                        "preSql": [],
                        "session": [],
                        "username": "root",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "100"
            }
        }
    }
}
A detailed explanation of each parameter is available on the DataX all data channels page. Other data sources are configured in much the same way.
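If you need several jobs of this kind, generating the JSON from a small script can save some hand editing. The following is only a sketch: `build_mysql_job` is a hypothetical helper name, not part of DataX, and it assumes the same table name, columns, and credentials on both sides, as in the example above. It is meant to be run with Python 3, separately from the interpreter used for datax.py.

```python
import json


def build_mysql_job(columns, reader_url, writer_url, table,
                    username, password, where="", channel=1):
    """Return a dict shaped like the mysqlreader/mysqlwriter template.

    Hypothetical helper for illustration; the key names are copied from
    the template that `datax.py -r mysqlreader -w mysqlwriter` prints.
    """
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": list(columns),
                        # In the template, the reader's jdbcUrl is a list.
                        "connection": [{"jdbcUrl": [reader_url],
                                        "table": [table]}],
                        "password": password,
                        "username": username,
                        "where": where,
                    },
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": list(columns),
                        # In the template, the writer's jdbcUrl is a string.
                        "connection": [{"jdbcUrl": writer_url,
                                        "table": [table]}],
                        "password": password,
                        "preSql": [],
                        "session": [],
                        "username": username,
                        "writeMode": "insert",
                    },
                },
            }],
            # The article's config stores the channel count as a string.
            "setting": {"speed": {"channel": str(channel)}},
        }
    }


if __name__ == "__main__":
    job = build_mysql_job(
        columns=["id", "name", "location", "age"],
        reader_url="jdbc:mysql://192.168.1.130:3306/people?useUnicode=true&characterEncoding=UTF-8",
        writer_url="jdbc:mysql://192.168.1.131:3306/people?useUnicode=true&characterEncoding=UTF-8",
        table="test",
        username="root",
        password="123456",
        where="test.id <= 50000",
        channel=100,
    )
    print(json.dumps(job, indent=4))
```

Redirecting the printed output to job/mysql2mysql.json should give a file equivalent to the hand-written one. Note the one asymmetry the template shows: the reader's jdbcUrl is a list, while the writer's is a single string.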
Then create a script in the datax directory to start DataX:
#!/bin/bash
python ./bin/datax.py ./job/mysql2mysql.json > ./log/mysql2mysql.log &
You can follow the job's progress in datax/log/mysql2mysql.log. The output is not pasted here.
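The same kind of background launch can be sketched in Python, for example if the job is started from a larger script. This is an illustration rather than anything DataX provides: `launch_in_background` and `datax_command` are hypothetical names, and unlike the shell redirect above, which captures stdout only, this version also sends stderr to the log file so that error messages end up in the same place.

```python
import subprocess


def datax_command(job_file, python="python", datax_py="./bin/datax.py"):
    """Build the argument list used in the article (paths relative to datax/)."""
    return [python, datax_py, job_file]


def launch_in_background(cmd, log_path):
    """Start cmd without waiting for it; write stdout and stderr to log_path."""
    with open(log_path, "w") as log_file:
        proc = subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)
    # The child keeps its own handle on the log file, so closing ours is safe.
    return proc


# Usage, run from the datax directory (assumes ./log already exists):
#   proc = launch_in_background(datax_command("./job/mysql2mysql.json"),
#                               "./log/mysql2mysql.log")
#   proc.wait()  # optional: block until the job finishes
```

Keeping the returned process object around lets the caller check `proc.poll()` or `proc.returncode` later, which the plain `&` in the shell script does not offer.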
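DataX3 is quite convenient to use, and the speed is acceptable: within the same network, with 100 concurrent channels, migrating 50 million records (about 15 GB) took roughly 80 minutes.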
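To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes that the 50 million figure counts records, that 1 GB is 1024 MB, and that the 80 minutes is wall-clock time for the whole job. The figures are approximate to begin with, so treat the result as a rough estimate.

```python
def throughput(rows, size_gb, minutes):
    """Return (rows per second, MB per second) for a finished transfer."""
    seconds = minutes * 60
    rows_per_sec = rows / seconds
    mb_per_sec = size_gb * 1024 / seconds  # assumes 1 GB = 1024 MB
    return rows_per_sec, mb_per_sec


rows_per_sec, mb_per_sec = throughput(rows=50_000_000, size_gb=15, minutes=80)
per_channel = rows_per_sec / 100  # the job was configured with 100 channels

print(f"{rows_per_sec:,.0f} rows/s, {mb_per_sec:.1f} MB/s, "
      f"{per_channel:.0f} rows/s per channel")
# prints: 10,417 rows/s, 3.2 MB/s, 104 rows/s per channel
```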