ETL Pipeline

Adaptor jobs have two fundamental modes of operation:

Bounded mode - The pipeline runs once each time the specified schedule fires.
Unbounded mode - The pipeline runs continuously.

Pipeline Specification file

A pipeline may be specified in JSON format and submitted to the framework server, which auto-generates JAR files and runs them. Follow the spec outline below when writing a configuration file.

Spec Outline

{
    "name": "<unique name for this adaptor>",
    "schedulePattern": "<cron-like schedule pattern>",
    "adaptorType": "ETL",

    "failureRecoverySpec": {
    },
    
    "inputSpec": {
    },
    
    "parseSpec": {
    },
    
    "deduplicationSpec": {
    },
    
    "transformSpec": {
    },
    
    "publishSpec": {
    }
}
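
As a rough illustration of the outline above, the following Python sketch builds a spec and reports which top-level keys are missing before it is submitted. The helper `missing_keys`, the example name, and the example schedule string are hypothetical. Treating every section as required is also an assumption, since the outline does not say which sections are optional.

```python
import json

# Top-level keys taken from the spec outline above.
REQUIRED_FIELDS = ("name", "schedulePattern", "adaptorType")
REQUIRED_SECTIONS = (
    "failureRecoverySpec",
    "inputSpec",
    "parseSpec",
    "deduplicationSpec",
    "transformSpec",
    "publishSpec",
)


def missing_keys(spec: dict) -> list:
    """Return the outline keys that are absent from spec, in outline order.

    Assumption: every key in the outline is required.
    """
    return [key for key in REQUIRED_FIELDS + REQUIRED_SECTIONS if key not in spec]


# Hypothetical example values; section bodies are left empty, as in the outline.
spec = {
    "name": "example-etl-adaptor",
    "schedulePattern": "0 * * * *",  # assumed cron-like syntax
    "adaptorType": "ETL",
    **{section: {} for section in REQUIRED_SECTIONS},
}

print(missing_keys(spec))
print(json.dumps(spec, indent=4))
```

A local check like this only catches structural omissions; the server's own validation remains the authority on what a valid spec is.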

Detailed explanations of the individual specs are given below.

Meta Spec
Failure Recovery Spec
Input Spec
Parse Spec
Deduplication Spec
Transformation Spec
Publish Spec

The spec can then be submitted to the adaptor server, which will validate it and generate a JAR for the entire pipeline.
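
This document does not describe the submission mechanism, so the following is only a sketch under the assumption that the server accepts the spec as a JSON body over HTTP POST. The URL, the path, and the helper name `build_submit_request` are hypothetical placeholders, not the server's documented API.

```python
import json
import urllib.request

# Hypothetical endpoint: the real server address and path are not given here.
SERVER_URL = "http://localhost:8080/adaptors"


def build_submit_request(spec: dict, url: str = SERVER_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the spec as JSON."""
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_submit_request({"name": "example-etl-adaptor", "adaptorType": "ETL"})
# To actually send it, once the real endpoint is known:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it possible to inspect the payload before anything reaches the server.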
