data-caterer-example

Data Catering

Data Caterer is a metadata-driven test data management tool that helps create production-like data across batch and event data systems. Run data validations to ensure your systems have ingested the data as expected. Use the Java or Scala API, the UI, or YAML files for setup and customisation, all of which run via Docker.

This repo contains example Java and Scala API usage for Data Caterer.

Basic data flow of Data Caterer

How

Check out any of the Scala examples or Java examples. For more information, follow the detailed documentation found here.

Not comfortable with Java or Scala? No worries. You can use the UI via these steps.

Want some YAML instead? Also, no worries. Check the example plan and task YAML files here.

Java

  1. Create a new Java class similar to DocumentationJavaPlanRun.java
    1. It needs to extend io.github.datacatering.datacaterer.javaapi.api.PlanRun

Scala

  1. Create a new Scala class similar to DocumentationPlanRun.scala
    1. It needs to extend io.github.datacatering.datacaterer.api.PlanRun

YAML

  1. Copy an existing plan file (such as foreign-key.yaml) in the directory docker/data/custom/plan
  2. Copy an existing task file (such as json-account-task.yaml) in the directory docker/data/custom/task
    1. If you want to run data validations, copy the file simple-validation.yaml and add the validation to the plan via:
    validations:
      - "<name of validation (e.g. account_checks)>"
  3. Use the JSON schema to help create metadata for plans, tasks, or validations. You can import this schema into your IDE to validate your YAML files.
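
The copy steps above can be sketched as a small shell helper. This is a minimal sketch and not part of the repo: the function name copy_template and the target file names my-plan.yaml and my-task.yaml are hypothetical, and it assumes the example files sit directly inside the directories named above.

```shell
# Hypothetical helper: copy an existing YAML file to a new name in the same directory.
# Usage: copy_template <directory> <existing file> <new file>
copy_template() {
  dir="$1"
  existing="$2"
  new="$3"
  # Fail early if the file to copy from is missing.
  if [ ! -f "$dir/$existing" ]; then
    echo "Template not found: $dir/$existing" >&2
    return 1
  fi
  # Never overwrite a file that is already there.
  if [ -e "$dir/$new" ]; then
    echo "Refusing to overwrite: $dir/$new" >&2
    return 1
  fi
  cp "$dir/$existing" "$dir/$new"
}

# Example usage (not executed here; target names are placeholders):
#   copy_template docker/data/custom/plan foreign-key.yaml my-plan.yaml
#   copy_template docker/data/custom/task json-account-task.yaml my-task.yaml
```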

Run

Requires:

  • Docker

./run.sh
# check results in docker/sample/report/index.html
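
Once the run finishes, the presence of the report can be checked from a script. This is a minimal sketch that assumes the report path mentioned in the run instructions above; the function name report_exists is hypothetical.

```shell
# Report path taken from the run instructions above.
REPORT_PATH="docker/sample/report/index.html"

# Hypothetical helper: succeed only if the report file exists and is non-empty.
# Takes an optional path argument; defaults to REPORT_PATH.
report_exists() {
  report="${1:-$REPORT_PATH}"
  [ -s "$report" ]
}

# Example usage (not executed here):
#   ./run.sh && report_exists && echo "Report generated at $REPORT_PATH"
```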

Docker

Create your own Docker image via:

./gradlew clean build
docker build -t <my_image_name>:<my_image_tag> .
docker run -e PLAN_CLASS=io.github.datacatering.plan.DocumentationPlanRun -v ${PWD}/docs/run:/opt/app/data <my_image_name>:<my_image_tag>
# check results under the docs/run folder

Docker Compose

Run with your own class from either the Java or Scala API:

./gradlew clean build
cd docker
PLAN_CLASS=io.github.datacatering.plan.DocumentationPlanRun DATA_SOURCE=postgres docker-compose up -d datacaterer

See the docs for details.
A Docker Compose sample can be found under the docker folder.

cd docker
docker-compose up -d datacaterer

Check the results here.

Change to another data source via:

  • postgres
  • mysql
  • cassandra
  • solace
  • kafka
  • http

DATA_SOURCE=cassandra docker-compose up -d datacaterer
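
To catch a mistyped data source name before Docker Compose starts, the command above can be wrapped in a small guard. This is a minimal sketch: the list of names comes from the bullets above, while the function names is_supported_source and run_with_source are hypothetical and not part of the repo.

```shell
# Data sources listed in this README.
SUPPORTED_SOURCES="postgres mysql cassandra solace kafka http"

# Return 0 if the argument is one of the listed data sources, 1 otherwise.
is_supported_source() {
  candidate="$1"
  for name in $SUPPORTED_SOURCES; do
    if [ "$name" = "$candidate" ]; then
      return 0
    fi
  done
  return 1
}

# Hypothetical wrapper: validate the name, then start the datacaterer service.
# Must be run from the docker folder, like the command above.
run_with_source() {
  if ! is_supported_source "$1"; then
    echo "Unsupported data source: $1" >&2
    return 1
  fi
  DATA_SOURCE="$1" docker-compose up -d datacaterer
}

# Example usage (not executed here):
#   cd docker && run_with_source cassandra
```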

Run with YAML files

Example YAML files can be found here:

  • Plan: Define tasks, data sources, foreign keys, etc. to run
  • Task: Define data generation details such as schema and number of records
  • Validation: Define data validation details to run on data sources

If you want to use a different YAML plan for the data source, you can run:

PLAN=plan/postgres-multiple-tables DATA_SOURCE=postgres docker-compose up -d datacaterer

Helm

helm install data-caterer ./data-caterer-example/helm/data-caterer

Benchmarks

Base benchmark tests can be run via:

bash benchmark/run_benchmark.sh

Results can be found under benchmark/results.