Quick Start
liupengjava edited this page Mar 22, 2016 · 29 revisions
- Linux or Windows
- JDK (1.6 or later; 1.6 recommended)
- Python (Python 2.6.x recommended)
- Apache Maven 3.x (to compile DataX)
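As a quick sanity check of the prerequisites above, a small script can report which tools are on the PATH. This is a minimal sketch, assuming Python 3.3+ for `shutil.which` (newer than the Python 2.6.x recommended for running DataX itself) and assuming the executables are named `java`, `python`, and `mvn`. `find_tools` is a hypothetical helper, not part of DataX, and it does not check versions.

```python
import shutil

# Executable names are assumptions; adjust them for your system.
RUNTIME_TOOLS = ["java", "python"]  # used to run DataX
BUILD_TOOLS = ["mvn"]               # used to compile DataX from source


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return dict((name, shutil.which(name)) for name in names)


if __name__ == "__main__":
    found = find_tools(RUNTIME_TOOLS + BUILD_TOOLS)
    for name in sorted(found):
        print("%-8s %s" % (name, found[name] or "NOT FOUND"))
```

A `None` entry only means the executable was not found on the PATH; the tool may still be installed elsewhere.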
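- Method 1: Download the DataX package directly (recommended if you only want to use DataX): DataX download link
After downloading, extract the archive to a local directory, change into the bin directory, and run the sample synchronization job:
```shell
$ tar zxvf datax.tar.gz
$ cd {YOUR_DATAX_HOME}/bin
$ python datax.py ../job/job.json
```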
- Method 2: Download the DataX source code and compile it yourself: DataX source build instructions
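Whichever installation method is used, the sample job can also be launched from a script instead of an interactive shell. The following is a minimal sketch, assuming the interpreter is available as `python` on the PATH. `build_run_command` and `run_job` are hypothetical helpers, not part of DataX.

```python
import os
import subprocess


def build_run_command(datax_home, job_file, python_exe="python"):
    """Return the argv list for: python {datax_home}/bin/datax.py job_file."""
    launcher = os.path.join(datax_home, "bin", "datax.py")
    return [python_exe, launcher, job_file]


def run_job(datax_home, job_file):
    """Run a DataX job, wait for it to finish, and return the exit code."""
    return subprocess.call(build_run_command(datax_home, job_file))
```

For example, `run_job("/opt/datax", "/opt/datax/job/job.json")` would run the bundled sample job, where `/opt/datax` is only a placeholder for `{YOUR_DATAX_HOME}`.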
- Step 1: Create the job configuration file (JSON format)
You can print a configuration template with the command `python datax.py -r {YOUR_READER} -w {YOUR_WRITER}`:
```
$ cd {YOUR_DATAX_HOME}/bin
$ python datax.py -r mysqlreader -w odpswriter

DataX (DATAX-OPENSOURCE-1.0), From Alibaba !
Copyright (C) 2010-2015, Alibaba Group. All Rights Reserved.

Please refer to the mysqlreader document:
    https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md

Please refer to the odpswriter document:
    https://github.com/alibaba/DataX/blob/master/odpswriter/doc/odpswriter.md

Please save the following configuration as a json file and use
    python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": [],
                                "table": []
                            }
                        ],
                        "password": "",
                        "username": "",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "odpswriter",
                    "parameter": {
                        "accessId": "",
                        "accessKey": "",
                        "column": [],
                        "odpsServer": "",
                        "partition": "",
                        "project": "",
                        "table": "",
                        "truncate": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
```
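Because the template is printed after several banner lines, it can be convenient to extract just the JSON part before saving it. The sketch below assumes the captured output looks like the sample above, with a JSON object containing a `"job"` key following the banner text. `extract_template` is a hypothetical helper, not part of DataX. It skips brace-delimited placeholders such as `{DATAX_HOME}` because they do not parse as JSON.

```python
import json


def extract_template(output):
    """Return the first JSON object with a "job" key found in the output text."""
    decoder = json.JSONDecoder()
    start = output.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning exactly at `start`.
            obj, _end = decoder.raw_decode(output, start)
        except ValueError:
            obj = None  # not JSON at this brace, e.g. "{DATAX_HOME}"
        if isinstance(obj, dict) and "job" in obj:
            return obj
        start = output.find("{", start + 1)
    raise ValueError("no job template found in output")
```

The returned dictionary can then be written to a file with `json.dump(template, f, indent=4)` and edited by hand.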
- Step 2: Fill in the options in the configuration template
The command output includes the documentation URLs for the chosen reader and writer, along with a sample JSON configuration. Fill in the blanks in that sample to complete the configuration. A configuration file (mysql2odps.json) based on the template looks like this:
```json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "****",
                        "password": "****",
                        "column": ["id", "age", "name"],
                        "connection": [
                            {
                                "table": [
                                    "test_table"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://127.0.0.1:3306/test"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "odpswriter",
                    "parameter": {
                        "accessId": "****",
                        "accessKey": "****",
                        "column": ["id", "age", "name"],
                        "odpsServer": "http://service.odps.aliyun.com/api",
                        "partition": "pt='datax_test'",
                        "project": "datax_opensource",
                        "table": "datax_opensource_test",
                        "truncate": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
```
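Before running a job, it can help to confirm that no template blanks were overlooked. The following is a minimal sketch that lists every value still set to `""` or `[]` in a loaded configuration. `find_blank_fields` is a hypothetical helper, not part of DataX, and a reported blank is only a hint: the example above simply omits `where`, so some fields may legitimately stay unset.

```python
def find_blank_fields(node, path=""):
    """Return the paths of all values that are still "" or [] in a config."""
    blanks = []
    if isinstance(node, dict):
        for key in sorted(node):
            child = "%s.%s" % (path, key) if path else key
            blanks.extend(find_blank_fields(node[key], child))
    elif isinstance(node, list):
        if not node:
            blanks.append(path)  # empty list, e.g. "column": []
        for index, item in enumerate(node):
            blanks.extend(find_blank_fields(item, "%s[%d]" % (path, index)))
    elif node == "":
        blanks.append(path)      # empty string, e.g. "username": ""
    return blanks
```

Applied to the result of `json.load` on mysql2odps.json, an empty list means every field present in the file has a value.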
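- Step 3: Start DataX
```shell
$ cd {YOUR_DATAX_DIR_BIN}
$ python datax.py ./mysql2odps.json
```
When synchronization finishes, the log ends with a summary like the following:
```
...
2015-12-17 11:20:25.263 [job-0] INFO  JobContainer -
任务启动时刻 : 2015-12-17 11:20:15
任务结束时刻 : 2015-12-17 11:20:25
任务总计耗时 : 10s
任务平均流量 : 205B/s
记录写入速度 : 5rec/s
读出记录总数 : 50
读写失败总数 : 0
```
The summary fields are, in order: job start time, job end time, total elapsed time, average throughput, record write speed, total records read, and total read/write failures.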
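When jobs run unattended, the closing summary can be parsed to decide whether a run needs attention. The sketch below is written for Python 3 and assumes the field labels and units match the sample log exactly; other DataX versions may print them differently. `parse_summary` is a hypothetical helper, not part of DataX.

```python
import re

# Labels are copied from the sample log above; treat them as assumptions.
SUMMARY_PATTERNS = {
    "elapsed_seconds": r"任务总计耗时\s*:\s*(\d+)s",  # total elapsed time
    "records_read": r"读出记录总数\s*:\s*(\d+)",      # total records read
    "failures": r"读写失败总数\s*:\s*(\d+)",          # total read/write failures
}


def parse_summary(log_text):
    """Extract the integer summary fields that appear in a DataX log."""
    summary = {}
    for field, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, log_text)
        if match:
            summary[field] = int(match.group(1))
    return summary
```

A caller might, for instance, flag a run for review when `failures` is missing or nonzero.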
- For configuration guides covering all data sources, see: DataX Data Source Guide