liupengjava edited this page Mar 22, 2016 · 29 revisions

Quick Start

System Requirements

1. Download and Deploy the Tool

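  • Method 1: Download the DataX package directly (recommended if you only want to use the tool): DataX download link

    After downloading, extract the archive to a local directory, then go to its bin directory to run the sample synchronization job:

    $ tar zxvf datax.tar.gz
    $ cd {YOUR_DATAX_HOME}/bin
    $ python datax.py ../job/job.json
  • Method 2: Download the DataX source code and build it yourself: DataX source build instructions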
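
If the sample job needs to be started from a script rather than an interactive shell, the command from method 1 can be assembled as below. This is a minimal sketch and not part of DataX: `build_datax_command` and `run_job` are hypothetical helper names, `/opt/datax` is a placeholder for {YOUR_DATAX_HOME}, and the interpreter defaults to the same `python` command used above.

```python
import subprocess
from pathlib import Path


def build_datax_command(datax_home, job_file, interpreter="python"):
    """Return the argument list equivalent to `python datax.py <job_file>`.

    datax_home stands for {YOUR_DATAX_HOME}; the launcher is expected at
    <datax_home>/bin/datax.py, as in the shell example above.
    """
    launcher = Path(datax_home) / "bin" / "datax.py"
    return [interpreter, str(launcher), str(job_file)]


def run_job(datax_home, job_file, interpreter="python"):
    """Run the job in a child process and return that process's exit code."""
    command = build_datax_command(datax_home, job_file, interpreter)
    return subprocess.run(command).returncode


if __name__ == "__main__":
    # Placeholder paths; replace them with your own installation directory.
    print(build_datax_command("/opt/datax", "/opt/datax/job/job.json"))
```

Passing the interpreter as a parameter keeps the sketch usable on machines where the Python that runs datax.py is not the one named `python`.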

2. Configuration Example: Read Data from a Stream and Print It to the Console

  • Step 1: Create the job configuration file (JSON format)

    You can print a configuration template with: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}

    $ cd {YOUR_DATAX_HOME}/bin
    $ python datax.py -r streamreader -w streamwriter
    DataX (DATAX-OPENSOURCE-1.0), From Alibaba !
    Copyright (C) 2010-2015, Alibaba Group. All Rights Reserved.
    
    
    Please refer to the mysqlreader document:
         https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
    
    Please refer to the odpswriter document:
         https://github.com/alibaba/DataX/blob/master/odpswriter/doc/odpswriter.md
    
    Please save the following configuration as a json file and  use
         python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
    to run the job.
    
    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "mysqlreader",
                        "parameter": {
                            "column": [],
                            "connection": [
                                {
                                    "jdbcUrl": [],
                                    "table": []
                                }
                            ],
                            "password": "",
                            "username": "",
                            "where": ""
                        }
                    },
                    "writer": {
                        "name": "odpswriter",
                        "parameter": {
                            "accessId": "",
                            "accessKey": "",
                            "column": [],
                            "odpsServer": "",
                            "partition": "",
                            "project": "",
                            "table": "",
                            "truncate": true,
                            "tunnelServer": ""
                        }
                    }
                }
            ],
            "setting": {
                "speed": {
                    "channel": ""
                }
            }
        }
    }   

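    The output includes links to the documentation for the selected reader and writer, plus a sample JSON configuration; fill in the blanks in that sample to complete your configuration. Note that the sample output above shows the mysqlreader/odpswriter template. Every reader/writer pair uses the same job.content and job.setting skeleton, and only the name and parameter fields differ. Filling in the template for streamreader and streamwriter gives the following configuration, saved here as stream2stream.json in the bin directory (the file name used in step 2):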

    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "streamreader",
              "parameter": {
                "sliceRecordCount": 10,
                "column": [
                  {
                    "type": "long",
                    "value": "10"
                  },
                  {
                    "type": "string",
                    "value": "hello,你好,世界-DataX"
                  }
                ]
              }
            },
            "writer": {
              "name": "streamwriter",
              "parameter": {
                "encoding": "UTF-8",
                "print": true
              }
            }
          }
        ],
        "setting": {
          "speed": {
            "channel": 5
          }
        }
      }
    }
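  • Step 2: Start DataX

    $ cd {YOUR_DATAX_HOME}/bin
    $ python datax.py ./stream2stream.json

    When the synchronization finishes, the log ends with a summary like the one below. DataX prints the labels in Chinese; in order, they are: job start time, job end time, total elapsed time, average throughput, record write speed, total records read, and total read/write failures.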

    ...
    2015-12-17 11:20:25.263 [job-0] INFO  JobContainer - 
    任务启动时刻                    : 2015-12-17 11:20:15
    任务结束时刻                    : 2015-12-17 11:20:25
    任务总计耗时                    :                 10s
    任务平均流量                    :              205B/s
    记录写入速度                    :              5rec/s
    读出记录总数                    :                  50
    读写失败总数                    :                   0
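
To check a run from a script, the summary block can be turned into a dictionary. The following is a minimal sketch that assumes the summary keeps the exact `label : value` layout shown above; `parse_summary` and the English key names are hypothetical and not part of DataX.

```python
# Summary text copied from the sample log above.
SAMPLE_SUMMARY = """\
任务启动时刻                    : 2015-12-17 11:20:15
任务结束时刻                    : 2015-12-17 11:20:25
任务总计耗时                    :                 10s
任务平均流量                    :              205B/s
记录写入速度                    :              5rec/s
读出记录总数                    :                  50
读写失败总数                    :                   0
"""

# Chinese labels printed by DataX mapped to illustrative English keys.
LABELS = {
    "任务启动时刻": "start_time",
    "任务结束时刻": "end_time",
    "任务总计耗时": "elapsed",
    "任务平均流量": "avg_throughput",
    "记录写入速度": "write_speed",
    "读出记录总数": "total_records",
    "读写失败总数": "failed_records",
}


def parse_summary(text):
    """Extract the known summary fields from log text as a dict of strings."""
    summary = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their own colons.
        label, separator, value = line.partition(":")
        key = LABELS.get(label.strip())
        if separator and key:
            summary[key] = value.strip()
    return summary


if __name__ == "__main__":
    summary = parse_summary(SAMPLE_SUMMARY)
    records = int(summary["total_records"])
    seconds = int(summary["elapsed"].rstrip("s"))
    # Records divided by elapsed seconds should match the reported rec/s.
    print(summary["failed_records"], records // seconds)
```

In this sample, the 50 records read equal sliceRecordCount (10) multiplied by channel (5), which suggests that each channel emits its own slice of records. Treat that as an observation from this one log rather than documented behavior.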

Next, complete the configuration for the plugins you need and run your own synchronization.
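
Since every job file follows the same skeleton as the template in step 1 (one reader and one writer under job.content, plus job.setting.speed.channel), a small helper can fill it in for any plugin pair. This is a sketch under that assumption: `build_job` and `write_job` are hypothetical names, and the parameter dictionaries must still come from each plugin's own documentation.

```python
import json


def build_job(reader_name, reader_params, writer_name, writer_params, channel=1):
    """Return a job configuration dict with the skeleton shown in the template."""
    return {
        "job": {
            "content": [
                {
                    "reader": {"name": reader_name, "parameter": reader_params},
                    "writer": {"name": writer_name, "parameter": writer_params},
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(job, path):
    """Save the configuration as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(job, handle, ensure_ascii=False, indent=2)


# Rebuild the stream-to-stream example from step 1.
stream_job = build_job(
    "streamreader",
    {
        "sliceRecordCount": 10,
        "column": [
            {"type": "long", "value": "10"},
            {"type": "string", "value": "hello,你好,世界-DataX"},
        ],
    },
    "streamwriter",
    {"encoding": "UTF-8", "print": True},
    channel=5,
)

if __name__ == "__main__":
    print(json.dumps(stream_job, ensure_ascii=False, indent=2))
```

Calling `write_job(stream_job, "stream2stream.json")` would produce a file equivalent to the hand-written configuration in step 1.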

For configuration guides covering all data sources, see: DataX data source guide
