liupengjava edited this page Mar 22, 2016 · 29 revisions

Quick Start

System Requirements

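1. Download and Deployment

  • Method 1: Download the DataX package directly (recommended if you only want to use DataX): DataX download link

    After downloading, extract the archive to a local directory, change into the bin directory, and run the sample synchronization job: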

    $ tar zxvf datax.tar.gz
    $ cd {YOUR_DATAX_HOME}/bin
    $ python datax.py ../job/job.json
  • Method 2: Download the DataX source code and build it yourself: DataX source build instructions

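2. Configuration Example: Synchronizing Data from MySQL to ODPS

  • Step 1: Create the job configuration file (JSON format)

    You can print a configuration template with the command: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}

    $ cd {YOUR_DATAX_HOME}/bin
    $ python datax.py -r mysqlreader -w odpswriter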
    DataX (DATAX-OPENSOURCE-1.0), From Alibaba !
    Copyright (C) 2010-2015, Alibaba Group. All Rights Reserved.
    
    
    Please refer to the mysqlreader document:
         https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
    
    Please refer to the odpswriter document:
         https://github.com/alibaba/DataX/blob/master/odpswriter/doc/odpswriter.md
    
    Please save the following configuration as a json file and  use
         python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
    to run the job.
    
    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "mysqlreader",
                        "parameter": {
                            "column": [],
                            "connection": [
                                {
                                    "jdbcUrl": [],
                                    "table": []
                                }
                            ],
                            "password": "",
                            "username": "",
                            "where": ""
                        }
                    },
                    "writer": {
                        "name": "odpswriter",
                        "parameter": {
                            "accessId": "",
                            "accessKey": "",
                            "column": [],
                            "odpsServer": "",
                            "partition": "",
                            "project": "",
                            "table": "",
                            "truncate": true
                        }
                    }
                }
            ],
            "setting": {
                "speed": {
                    "channel": ""
                }
            }
        }
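    }
  • Step 2: Fill in the options based on the configuration template

    The command output contains the documentation URLs for the selected reader and writer, plus a sample JSON configuration. Fill in the blanks in that sample to complete the configuration. A JSON file (mysql2odps.json) configured from the template looks like this: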

    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "mysqlreader",
                        "parameter": {
                            "username": "****",
                            "password": "****",
                            "column": ["id","age","name"],
                            "connection": [
                                {
                                    "table": [
                                        "test_table"
                                    ],
                                    "jdbcUrl": [
                                        "jdbc:mysql://127.0.0.1:3306/test"
                                    ]
                                }
                            ]
                        }
                    },
                    "writer": {
                        "name": "odpswriter",
                        "parameter": {
                            "accessId": "****",
                            "accessKey": "****",
                            "column": ["id","age","name"],
                            "odpsServer": "http://service.odps.aliyun.com/api",
                            "partition": "pt='datax_test'",
                            "project": "datax_opensource",
                            "table": "datax_opensource_test",
                            "truncate": true
                        }
                    }
                }
            ],
            "setting": {
                "speed": {
                    "channel": 1
                }
            }
        }
    }
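    • Step 3: Start DataX

      $ cd {YOUR_DATAX_HOME}/bin
      $ python datax.py ./mysql2odps.json

      When the synchronization finishes, DataX prints a summary like the one below (the labels appear in Chinese in the actual output; from top to bottom: job start time, job end time, total elapsed time, average throughput, record write speed, total records read, total read/write failures):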

      ...
      2015-12-17 11:20:25.263 [job-0] INFO  JobContainer - 
      任务启动时刻                    : 2015-12-17 11:20:15
      任务结束时刻                    : 2015-12-17 11:20:25
      任务总计耗时                    :                 10s
      任务平均流量                    :              205B/s
      记录写入速度                    :              5rec/s
      读出记录总数                    :                  50
      读写失败总数                    :                   0
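
The figures in the summary can be cross-checked with a few lines of Python. The following is a minimal sketch, not part of DataX: `parse_summary` is a hypothetical helper, the `label : value` layout and the `10s` elapsed-time format are assumed from the sample log above, and the script assumes Python 3.

```python
import re

# Summary lines copied from the sample log; labels are matched verbatim.
SUMMARY = """\
任务启动时刻                    : 2015-12-17 11:20:15
任务结束时刻                    : 2015-12-17 11:20:25
任务总计耗时                    :                 10s
任务平均流量                    :              205B/s
记录写入速度                    :              5rec/s
读出记录总数                    :                  50
读写失败总数                    :                   0
"""


def parse_summary(text):
    """Return {label: raw value string} for every 'label : value' line."""
    fields = {}
    for line in text.splitlines():
        # Split on a colon surrounded by whitespace, so the colons inside
        # timestamps such as 11:20:15 are left alone.
        parts = re.split(r"\s+:\s+", line.strip(), maxsplit=1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1]
    return fields


fields = parse_summary(SUMMARY)
elapsed_s = int(fields["任务总计耗时"].rstrip("s"))  # "10s" -> 10 (seconds-only format assumed)
records = int(fields["读出记录总数"])                 # "50" -> 50
print("records per second:", records // elapsed_s)
```

For the sample log this gives 50 records over 10 seconds, which is consistent with the reported write speed of 5rec/s.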

Next, configure the plugins you need and run your own synchronization.

For configuration guides covering all data sources, see: DataX data source guide
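
Before launching a job, it can help to confirm that no blank from the template was left unfilled. The following is a minimal sketch under stated assumptions, not a DataX feature: `check_job_config` is a hypothetical helper, the required keys are taken from the template shown in Step 1, and the rule that `channel` should be a positive integer is inferred from the filled-in example (which uses 1) rather than from DataX documentation.

```python
import json


def check_job_config(config):
    """Return a list of problems found in a job dict (empty if none)."""
    problems = []
    job = config.get("job", {})
    content = job.get("content")
    if not isinstance(content, list) or not content:
        problems.append("job.content must be a non-empty list")
        content = []
    for index, item in enumerate(content):
        for side in ("reader", "writer"):
            plugin = item.get(side, {})
            if not plugin.get("name"):
                problems.append("job.content[%d].%s.name is empty" % (index, side))
            if not isinstance(plugin.get("parameter"), dict):
                problems.append(
                    "job.content[%d].%s.parameter must be an object" % (index, side)
                )
    channel = job.get("setting", {}).get("speed", {}).get("channel")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(channel, bool) or not isinstance(channel, int) or channel < 1:
        problems.append("job.setting.speed.channel should be a positive integer")
    return problems


# A trimmed copy of the unfilled template: "channel" is still an empty string.
UNFILLED = json.loads("""
{"job": {"content": [{"reader": {"name": "mysqlreader", "parameter": {}},
                      "writer": {"name": "odpswriter", "parameter": {}}}],
         "setting": {"speed": {"channel": ""}}}}
""")

for problem in check_job_config(UNFILLED):
    print(problem)
```

To check a real file, load it with `json.load(open("mysql2odps.json"))` and pass the result to `check_job_config`. The helper checks only the overall structure; plugin-specific parameters such as `jdbcUrl` or `accessId` are described in each plugin's own documentation.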