-
Notifications
You must be signed in to change notification settings - Fork 56
Concepts
This implementation of Data Prep uses the concepts of Record, Column, Directive, Step, and Pipeline.
A Recipe is a collection of Directive. It consists of one or more Directive.
A Directive is a single data manipulation instruction, specified to either transform, filter, or pivot a single record into zero or more records. A directive can generate one or more steps to be executed by a pipeline.
A Row is a collection of field names and field values.
A Column is a data value of any of the supported Java types, one for each record.
A Pipeline is a collection of steps to be applied on a record. The record(s) outputed from a step are passed to the next step in the pipeline.
A directive can be represented in text in this format:
<command> <argument-1> <argument-2> ... <argument-n>
A row in this documentation will be shown as a JSON object with an object key representing the column names and a value shown by the plain representation of the the data, without any mention of types.
For example:
{
"id": 1,
"fname": "root",
"lname": "joltie",
"address": {
"housenumber": "678",
"street": "Mars Street",
"city": "Marcity",
"state": "Maregon",
"country": "Mari"
},
"gender": "M"
}
- Introduction
- Get Started
- Concepts
- System Directives
- Parsers
- Output Formatters
- Transformations
- Encoders and Decoders
- Unique ID Generation
- Date Transformations
- Lookups
- Hashing and Masking
- Row Operations
- Column Operations
- NLP
- Transient In-Memory Counters
- Transformation Functions
- Custom Directives (UDD)
- What is UDD ?
- Building UDD
- UDD Lifecycle
- Field Level Lineage
- Aliasing and Restricting
- Embedding Directives
- Schema Registry
- Pipeline Transform
- Cheatsheet
- Roadmap
- Technical Documents
- Custom Directive Internals