Dataset Governance Policy (DGP)

To ensure traceability, reproducibility and standardization for all ML datasets and models generated and consumed within Toyota Research Institute (TRI), we developed the Dataset Governance Policy (DGP), which codifies the schema and maintenance of all of TRI's Autonomous Vehicle (AV) datasets.

Components

Schema: Protobuf-based schemas for raw data, annotations and dataset management.
DataLoaders: Universal PyTorch Dataset class for loading all DGP-compliant datasets.
CLI: Main CLI for handling DGP datasets and the entrypoint for visualization tools.

Getting Started

Please see Getting Started for environment setup.

Getting started is as simple as initializing a dataset class with the relevant dataset JSON, raw data sensor names, annotation types, and split information. Below, we show an example of initializing a PyTorch dataset for multi-modal learning from 2D and 3D bounding boxes.

from dgp.datasets import SynchronizedSceneDataset

# Load synchronized pairs of camera and lidar frames, with 2d and 3d
# bounding box annotations.
dataset = SynchronizedSceneDataset('<dataset_name>_v0.0.json',
    datum_names=('camera_01', 'lidar'),
    requested_annotations=('bounding_box_2d', 'bounding_box_3d'),
    split='train')
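
A common next step is to build one dataset object per split with otherwise identical arguments. The sketch below is a hypothetical helper (`load_splits` is not part of DGP) that takes the dataset class as a parameter, so it relies only on the constructor arguments shown in the call above. The split names `'train'` and `'val'` are assumptions about what the dataset JSON defines.

```python
from typing import Any, Callable, Dict, Sequence


def load_splits(
    dataset_cls: Callable[..., Any],
    dataset_json: str,
    splits: Sequence[str] = ("train", "val"),
    **dataset_kwargs: Any,
) -> Dict[str, Any]:
    """Build one dataset object per split with otherwise identical arguments.

    Hypothetical helper, not part of DGP. `dataset_cls` is assumed to accept
    the dataset JSON path as its first positional argument plus a `split`
    keyword, as in the SynchronizedSceneDataset call above.
    """
    return {
        split: dataset_cls(dataset_json, split=split, **dataset_kwargs)
        for split in splits
    }


# Usage (requires DGP to be installed and a DGP-compliant dataset JSON):
# from dgp.datasets import SynchronizedSceneDataset
# datasets = load_splits(
#     SynchronizedSceneDataset,
#     '<dataset_name>_v0.0.json',
#     datum_names=('camera_01', 'lidar'),
#     requested_annotations=('bounding_box_2d', 'bounding_box_3d'),
# )
# train_dataset = datasets['train']
```

Passing the class in, rather than importing it inside the helper, keeps the sketch usable with any dataset class that follows the same constructor convention.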

Examples

A list of starter scripts is provided in the examples directory.

examples/load_dataset.py: Simple example script to load a multi-modal dataset based on the Getting Started section above.

Build and run tests

You can build the base Docker image and run the tests within a Docker container via:

make docker-build
make docker-run-tests

Contributing

We appreciate all contributions to DGP! To learn more about making a contribution to DGP, please see Contribution Guidelines.

CI Ecosystem

Job	CI	Notes
docker-build		Docker build and push to container registry
pre-merge		Pre-merge testing
doc-gen		GitHub Pages doc generation
coverage		Code coverage metrics and badge generation

💬 Where to file bug reports

Type	Platforms
🚨 Bug Reports	GitHub Issue Tracker
🎁 Feature Requests	GitHub Issue Tracker

👩‍💻 The Team 👨‍💻

DGP is developed and currently maintained by Quincy Chen, Arjun Bhargava, Chao Fang, Chris Ochoa and Kuan-Hui Lee from the ML-Engineering team at Toyota Research Institute (TRI), with contributions from the ML-Research team at TRI, Woven Planet and Parallel Domain.
