ETL Pipeline

Adaptor jobs have two fundamental modes of operation:

  1. Bounded mode - The pipeline runs once based on the specified schedule
  2. Unbounded mode - The pipeline is always running

Pipeline Specification File

A pipeline may be specified in JSON format and submitted to the framework server, which auto-generates the JAR files and runs them. A configuration file should follow the spec outline below.

Spec Outline

{
    "name": "<unique name for this adaptor>",
    "schedulePattern": "<cron-like schedule pattern>",
    "adaptorType": "ETL",

    "failureRecoverySpec": {
    },
    
    "inputSpec": {
    },
    
    "parseSpec": {
    },
    
    "deduplicationSpec": {
    },
    
    "transformSpec": {
    },
    
    "publishSpec": {
    }
}

Detailed explanations of the individual specs are given below.

The spec can then be submitted to the adaptor server, which validates it and generates a JAR for the entire pipeline.
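
Before submitting, it can help to build the outline programmatically and check that no top-level key is missing. The following is only a minimal sketch, not part of the framework: the helper names `build_spec_skeleton` and `missing_keys` are hypothetical, the adaptor name and the five-field cron string are placeholder values, and treating every key in the outline as required is an assumption. The server's own validation remains authoritative.

```python
import json

# Top-level keys copied from the spec outline above.
# Assumption: all of them are expected to be present in a spec file.
TOP_LEVEL_KEYS = [
    "name",
    "schedulePattern",
    "adaptorType",
    "failureRecoverySpec",
    "inputSpec",
    "parseSpec",
    "deduplicationSpec",
    "transformSpec",
    "publishSpec",
]


def build_spec_skeleton(name, schedule_pattern):
    """Return the spec outline as a dict with empty sub-specs."""
    spec = {
        "name": name,
        "schedulePattern": schedule_pattern,
        "adaptorType": "ETL",
    }
    # The remaining keys are the individual sub-specs, left empty here.
    for key in TOP_LEVEL_KEYS[3:]:
        spec[key] = {}
    return spec


def missing_keys(spec_text):
    """Parse JSON text and list the outline keys that are absent."""
    spec = json.loads(spec_text)
    return [key for key in TOP_LEVEL_KEYS if key not in spec]


if __name__ == "__main__":
    # Placeholder values; the exact schedule pattern syntax is not
    # defined in this section.
    spec = build_spec_skeleton("example-adaptor", "0 * * * *")
    text = json.dumps(spec, indent=4)
    print(text)
    print("missing keys:", missing_keys(text))
```

A check like this only catches structural omissions early; it says nothing about whether the contents of each sub-spec are valid.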