Data Caterer is a metadata-driven test data management tool that helps create production-like data across batch and event data systems. Run data validations to ensure your systems have ingested the data as expected. Set up and customise it using the Java API, Scala API, UI, or YAML files, all of which run via Docker.
This repo contains example Java and Scala API usage for Data Caterer.
Check out any of the Scala examples or Java examples. For more information, see the detailed documentation found here.
Not comfortable with Java or Scala? No worries. You can use the UI via these steps.
Want some YAML instead? Also, no worries. Check the example plan and task YAML files here.
- Create new Java class similar to DocumentationJavaPlanRun.java
  - Needs to extend io.github.datacatering.datacaterer.javaapi.api.PlanRun
- Create new Scala class similar to DocumentationPlanRun.scala
  - Needs to extend io.github.datacatering.datacaterer.api.PlanRun
- Copy existing plan file (such as foreign-key.yaml) in directory docker/data/custom/plan
- Copy existing task file (such as json-account-task.yaml) in directory docker/data/custom/task
- If you want to run data validations, copy the file simple-validation.yaml and add validation to plan via:
  validations:
    - "<name of validation (i.e. account_checks)>"
- Use the JSON schema to help create metadata for plans, tasks, or validations. You can import this schema into your IDE to validate your YAML files. Links below show how you can import the schema:
Requires:
- Docker
./run.sh
# check results in docker/sample/report/index.html
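If you script the run, a small check like the one below can confirm that the report file mentioned above was actually written before you open it. This is only a sketch: `wait_for_report` and the `REPORT_FILE` variable are hypothetical names, not part of Data Caterer, and the 60 second default timeout is an arbitrary assumption.

```shell
# Report path taken from the comment above; override REPORT_FILE if yours differs.
REPORT_FILE="${REPORT_FILE:-docker/sample/report/index.html}"

# Hypothetical helper: polls once per second until the report file exists and
# is non-empty, or gives up after the given timeout in seconds (default 60).
wait_for_report() {
  report="$1"
  timeout="${2:-60}"
  waited=0
  while [ ! -s "$report" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "report not found after ${timeout}s: $report" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "report ready: $report"
}
```

For example, `./run.sh && wait_for_report "$REPORT_FILE"` returns a non-zero status if the report never appears.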
Create your own Docker image via:
./gradlew clean build
docker build -t <my_image_name>:<my_image_tag> .
docker run -e PLAN_CLASS=io.github.datacatering.plan.DocumentationPlanRun -v ${PWD}/docs/run:/opt/app/data <my_image_name>:<my_image_tag>
#check results under docs/run folder
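The three commands above can be wrapped in a small script so that the image name, tag, and plan class are set in one place. The sketch below is illustrative only: the default image name and tag, the `run_step` and `build_and_run` functions, and the `DRY_RUN` switch are assumptions introduced here, not something the repo provides. By default it only prints the commands.

```shell
# Hypothetical image name and tag; substitute your own values.
IMAGE_NAME="${IMAGE_NAME:-my-data-caterer}"
IMAGE_TAG="${IMAGE_TAG:-0.1.0}"
# Plan class from the example command above.
PLAN_CLASS="${PLAN_CLASS:-io.github.datacatering.plan.DocumentationPlanRun}"

# Prints a command, and executes it only when DRY_RUN is set to 0.
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Builds the project, builds the image, then runs it with the chosen plan class.
# Stops at the first step that fails.
build_and_run() {
  run_step ./gradlew clean build &&
  run_step docker build -t "$IMAGE_NAME:$IMAGE_TAG" . &&
  run_step docker run -e "PLAN_CLASS=$PLAN_CLASS" \
    -v "$PWD/docs/run:/opt/app/data" "$IMAGE_NAME:$IMAGE_TAG"
}
```

Run `build_and_run` from the repository root to preview the commands, or `DRY_RUN=0 build_and_run` to execute them.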
Run with own class from either Java or Scala API:
./gradlew clean build
cd docker
PLAN_CLASS=io.github.datacatering.plan.DocumentationPlanRun DATA_SOURCE=postgres docker-compose up -d datacaterer
Details from docs.
Docker compose sample found under docker folder.
cd docker
docker-compose up -d datacaterer
Check the results here.
Change to another data source via:
- postgres
- mysql
- cassandra
- solace
- kafka
- http
DATA_SOURCE=cassandra docker-compose up -d datacaterer
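To guard against typos in the data source name, the list above can be checked before calling `docker-compose`. The following is a minimal sketch: `is_supported_source` and `compose_command` are hypothetical helper names, and the list simply mirrors the data sources named in this README.

```shell
# Data sources listed in this README.
SUPPORTED_SOURCES="postgres mysql cassandra solace kafka http"

# Returns 0 when the given name is one of the data sources listed above.
is_supported_source() {
  candidate="$1"
  for name in $SUPPORTED_SOURCES; do
    if [ "$name" = "$candidate" ]; then
      return 0
    fi
  done
  return 1
}

# Prints the docker-compose command for a data source without running it,
# or fails with a message on stderr when the name is not in the list.
compose_command() {
  if ! is_supported_source "$1"; then
    echo "unsupported data source: $1" >&2
    return 1
  fi
  echo "DATA_SOURCE=$1 docker-compose up -d datacaterer"
}
```

From the docker folder, `compose_command kafka` prints the command to run, so a mistyped name fails before any container is started.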
Example YAML files can be found here:
- Plan: Define tasks, data sources, foreign keys, etc. to run
- Task: Define data generation details such as schema and number of records
- Validation: Define data validation details to run on data sources
If you want to use a different YAML plan for the data source, you can run:
PLAN=plan/postgres-multiple-tables DATA_SOURCE=postgres docker-compose up -d datacaterer
helm install data-caterer ./data-caterer-example/helm/data-caterer
Base benchmark tests can be run via:
bash benchmark/run_benchmark.sh
Results can be found under benchmark/results.