Concepts

Nitin Motgi edited this page Jul 30, 2017 · 1 revision

DataPrep Concepts

This implementation of Data Prep uses the concepts of Recipe, Directive, Row, Column, and Pipeline.

Recipe

A Recipe is a collection of one or more Directives.

Directive

A Directive is a single data manipulation instruction, specified to either transform, filter, or pivot a single record into zero or more records. A directive can generate one or more steps to be executed by a pipeline.
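As a rough illustration of the three kinds of directive named above, the sketch below models a directive as a Python function that takes one record and returns a list of zero or more records. The function names and the dictionary representation of a record are assumptions made for this example; they are not part of the Data Prep API.

```python
from typing import Any, Dict, List

# Assumption for this sketch: a record is a plain dict of column name -> value.
Record = Dict[str, Any]


def uppercase_fname(record: Record) -> List[Record]:
    """Transform: one record in, one modified record out."""
    out = dict(record)  # copy so the input record is left untouched
    out["fname"] = str(out["fname"]).upper()
    return [out]


def keep_if_gender_m(record: Record) -> List[Record]:
    """Filter: one record in, zero or one record out."""
    return [record] if record.get("gender") == "M" else []


def one_record_per_address_field(record: Record) -> List[Record]:
    """Pivot: one record in, one record out per entry in 'address'."""
    return [
        {"id": record["id"], "field": key, "value": value}
        for key, value in record.get("address", {}).items()
    ]


sample = {
    "id": 1,
    "fname": "root",
    "gender": "M",
    "address": {"city": "Marcity", "state": "Maregon"},
}
```

Each function returns a list, so the "zero or more records" contract is the same whether the directive transforms, filters, or pivots.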

Row

A Row is a collection of field names and field values.

Column

A Column is a data value of any of the supported Java types, one for each record.

Pipeline

A Pipeline is a collection of steps to be applied to a record. The record(s) output by a step are passed to the next step in the pipeline.
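The hand-off between steps can be sketched as a simple loop. This is a minimal model, assuming each step is a function from one record to a list of records; `run_pipeline` and the two sample steps are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List

# Assumptions for this sketch: a record is a dict, and a step maps
# one record to a list of zero or more records.
Record = Dict[str, Any]
Step = Callable[[Record], List[Record]]


def run_pipeline(steps: Iterable[Step], records: Iterable[Record]) -> List[Record]:
    """Apply each step in order; the output of one step feeds the next."""
    current = list(records)
    for step in steps:
        produced: List[Record] = []
        for record in current:
            produced.extend(step(record))  # a step may emit 0..n records
        current = produced
    return current


def drop_missing_id(record: Record) -> List[Record]:
    """Hypothetical step: discard records that have no 'id' column."""
    return [record] if "id" in record else []


def duplicate(record: Record) -> List[Record]:
    """Hypothetical step: emit two copies of every record."""
    return [dict(record), dict(record)]


result = run_pipeline([drop_missing_id, duplicate], [{"id": 1}, {"name": "x"}])
```

Here the record without an `id` is dropped by the first step, so only the surviving record reaches the second step and is duplicated.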

Notations

Directives

A directive can be represented in text in this format:

<command> <argument-1> <argument-2> ... <argument-n>
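To make the format concrete, the sketch below splits a directive string into its command and argument list. It assumes whitespace-separated tokens with shell-style quoting, which is a simplification; the actual directive grammar may differ. `parse_directive` is a hypothetical helper, and the directive text used with it is illustrative only.

```python
import shlex
from typing import List, Tuple


def parse_directive(text: str) -> Tuple[str, List[str]]:
    """Split '<command> <argument-1> ... <argument-n>' into its parts.

    Assumption: tokens are separated by whitespace, and quoted strings
    count as a single argument (shell-style, via shlex).
    """
    tokens = shlex.split(text)
    if not tokens:
        raise ValueError("a directive needs at least a command")
    return tokens[0], tokens[1:]


# Illustrative directive text; the command and column names are examples.
command, arguments = parse_directive("rename fname first_name")
```

With the input above, `command` is `"rename"` and `arguments` is `["fname", "first_name"]`.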

Row

In this documentation, a row is shown as a JSON object: each key is a column name, and each value is the plain representation of the data, with no type information.

For example:

{
  "id": 1,
  "fname": "root",
  "lname": "joltie",
  "address": {
    "housenumber": "678",
    "street": "Mars Street",
    "city": "Marcity",
    "state": "Maregon",
    "country": "Mari"
  },
  "gender": "M"
}
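As a small sketch of how this notation can be read, the example row can be loaded with Python's standard `json` module, after which the top-level keys give the column names. Treating the nested `address` object as a single column value is an assumption of this sketch.

```python
import json

# The example row from this page, as JSON text.
row_text = """
{
  "id": 1,
  "fname": "root",
  "lname": "joltie",
  "address": {
    "housenumber": "678",
    "street": "Mars Street",
    "city": "Marcity",
    "state": "Maregon",
    "country": "Mari"
  },
  "gender": "M"
}
"""

row = json.loads(row_text)

# Top-level keys are the column names; dicts keep insertion order in Python 3.7+.
column_names = list(row.keys())

# Nested objects are reached by ordinary key lookup.
city = row["address"]["city"]
```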