feat: replace markdownlint with prettier (TRI-ML#123)
This also disables natural language validation since there is no
corresponding pre-commit hook. We deem this check non-critical and, in
its place, continue to enforce clear natural language in code reviews.
tk-woven authored Aug 22, 2022
1 parent 707f2a2 commit a2c6c12
Showing 10 changed files with 266 additions and 152 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/pre-merge.yml
@@ -49,6 +49,8 @@ jobs:
VALIDATE_HTML: false # Handled by pre-commit.
VALIDATE_JSCPD: false
VALIDATE_JSON: false # Handled by pre-commit.
VALIDATE_MARKDOWN: false # Handled by pre-commit.
VALIDATE_NATURAL_LANGUAGE: false # No corresponding pre-commit hook but not critical.
VALIDATE_PROTOBUF: false
VALIDATE_PYTHON_BLACK: false
VALIDATE_PYTHON_FLAKE8: false # Formatted by YAPF.
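
For context, the `VALIDATE_*` entries above are environment variables read by GitHub's super-linter action; the step that consumes them lies outside this hunk. The sketch below is only illustrative, and the step name, action version, and remaining settings are assumptions rather than the repository's actual configuration:

```yaml
# Hypothetical super-linter step; the real pre-merge.yml step, action version,
# and remaining VALIDATE_* flags may differ.
- name: Lint
  uses: github/super-linter@v4
  env:
    DEFAULT_BRANCH: master
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    VALIDATE_MARKDOWN: false # Handled by pre-commit (prettier).
    VALIDATE_NATURAL_LANGUAGE: false # No corresponding pre-commit hook.
```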
6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
@@ -33,6 +33,12 @@ repos:
hooks:
- id: isort
exclude: _pb2\.py$
- repo: https://github.com/pre-commit/mirrors-prettier
rev: v2.7.1
hooks:
- id: prettier
types: [markdown]
args: [--embedded-language-formatting=off, --prose-wrap=always]
- repo: https://github.com/PyCQA/pylint
rev: v2.14.5
hooks:
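
With the prettier hook registered, Markdown formatting can be checked locally before pushing. A minimal sketch, assuming `pre-commit` is installed and run from the repository root (the `npx` variant additionally assumes a Node.js toolchain):

```sh
# Install the git hook once, then run only the prettier hook on every file.
pre-commit install
pre-commit run prettier --all-files

# Roughly equivalent direct invocation using the same options as the hook.
npx prettier --prose-wrap=always --embedded-language-formatting=off --check "**/*.md"
```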
82 changes: 49 additions & 33 deletions README.md
@@ -1,38 +1,43 @@
<!-- markdownlint-disable-next-line -->
[<img src="docs/tri-logo.png" width="40%">](https://www.tri.global/)

[<img src="docs/tri-logo.png" width="40%">](https://www.tri.global/)

# Dataset Governance Policy (DGP)

[![build-docker](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml)
[![license](https://img.shields.io/github/license/TRI-ML/dgp.svg)](https://github.com/TRI-ML/dgp/blob/master/LICENSE)
[![open-issues](https://img.shields.io/github/issues/TRI-ML/dgp.svg)](https://github.com/TRI-ML/dgp/issues)
![coverage badge](./docs/coverage.svg)
[![docs](https://img.shields.io/badge/documentation-beta-red)](https://tri-ml.github.io/dgp/)

To ensure the traceability, reproducibility and standardization for
all ML datasets and models generated and consumed within Toyota Research Institute (TRI), we developed the
Dataset-Governance-Policy (DGP) that codifies the schema and
maintenance of all TRI's Autonomous Vehicle (AV) datasets.
To ensure the traceability, reproducibility and standardization for all ML
datasets and models generated and consumed within Toyota Research Institute
(TRI), we developed the Dataset-Governance-Policy (DGP) that codifies the schema
and maintenance of all TRI's Autonomous Vehicle (AV) datasets.

<p align="center">
<img src="docs/3d-viz-proj.gif" alt="3d-viz-proj"/>
</p>

## Components
- [Schema](dgp/proto/README.md): [Protobuf](https://developers.google.com/protocol-buffers)-based schemas for raw data, annotations
and dataset management.
- [DataLoaders](dgp/datasets): Universal PyTorch DatasetClass to load all DGP-compliant datasets.
- [CLI](dgp/README.md): Main CLI for handling DGP datasets and the entrypoint of visualization tools.

- [Schema](dgp/proto/README.md):
[Protobuf](https://developers.google.com/protocol-buffers)-based schemas for
raw data, annotations and dataset management.
- [DataLoaders](dgp/datasets): Universal PyTorch DatasetClass to load all
DGP-compliant datasets.
- [CLI](dgp/README.md): Main CLI for handling DGP datasets and the entrypoint of
visualization tools.

## Getting Started

Please see [Getting Started](docs/GETTING_STARTED.md) for environment setup.

Getting started is as simple as initializing a dataset-class with the
relevant dataset JSON, raw data sensor names, annotation types, and
split information. Below, we show a few examples of initializing a
PyTorch dataset for multi-modal learning from 2D and 3D bounding boxes.
Getting started is as simple as initializing a dataset-class with the relevant
dataset JSON, raw data sensor names, annotation types, and split information.
Below, we show a few examples of initializing a PyTorch dataset for multi-modal
learning from 2D and 3D bounding boxes.

```python
from dgp.datasets import SynchronizedSceneDataset

@@ -45,38 +50,49 @@ dataset = SynchronizedSceneDataset('<dataset_name>_v0.0.json',
```
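
The collapsed hunk above elides the remaining constructor arguments. A fuller sketch is shown below; the keyword and datum names are assumptions based on the surrounding documentation, not the exact API:

```python
from dgp.datasets import SynchronizedSceneDataset

# Illustrative only: argument and datum names are assumptions; consult
# dgp/datasets for the exact constructor signature.
dataset = SynchronizedSceneDataset(
    '<dataset_name>_v0.0.json',
    split='train',
    datum_names=('camera_01', 'lidar'),
    requested_annotations=('bounding_box_2d', 'bounding_box_3d'),
)

# The result behaves like a map-style PyTorch dataset.
print(len(dataset))
sample = dataset[0]
```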

## Examples
A list of starter scripts is provided in the [examples](examples/)
directory.
- [examples/load_dataset.py](examples/load_dataset.py): Simple example
script to load a multi-modal dataset based on the **Getting
Started** section above.

A list of starter scripts is provided in the [examples](examples/) directory.

- [examples/load_dataset.py](examples/load_dataset.py): Simple example script to
load a multi-modal dataset based on the **Getting Started** section above.

## Build and run tests
You can build the base docker image and run the tests within a [docker container](docs/GETTING_STARTED.md#markdown-header-develop-within-docker)

You can build the base docker image and run the tests within a
[docker container](docs/GETTING_STARTED.md#markdown-header-develop-within-docker)
via:

```sh
make docker-build
make docker-run-tests
```

## Contributing
We appreciate all contributions to DGP! To learn more about making a contribution to DGP, please see [Contribution Guidelines](docs/CONTRIBUTING.md).

We appreciate all contributions to DGP! To learn more about making a
contribution to DGP, please see [Contribution Guidelines](docs/CONTRIBUTING.md).

## CI Ecosystem
| Job | CI | Notes |
| --- | --- | --- |
| docker-build | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml) | Docker build and push to [container registry](https://github.com/TRI-ML/dgp/pkgs/container/dgp)|
| pre-merge | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/pre-merge.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/pre-merge.yml) | Pre-merge testing|
| doc-gen | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/doc-gen.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/doc-gen.yml) | [GitHub Pages](https://tri-ml.github.io/dgp/) doc generation|
| coverage | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/coverage.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/coverage.yml) | Code coverage metrics and badge generation |

| Job | CI | Notes |
| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| docker-build | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml) | Docker build and push to [container registry](https://github.com/TRI-ML/dgp/pkgs/container/dgp) |
| pre-merge | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/pre-merge.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/pre-merge.yml) | Pre-merge testing |
| doc-gen | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/doc-gen.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/doc-gen.yml) | [GitHub Pages](https://tri-ml.github.io/dgp/) doc generation |
| coverage | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/coverage.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/coverage.yml) | Code coverage metrics and badge generation |

## 💬 Where to file bug reports

| Type | Platforms |
| - | - |
| 🚨 **Bug Reports** | [GitHub Issue Tracker](https://github.com/TRI-ML/dgp/issues) |
| 🎁 **Feature Requests** | [GitHub Issue Tracker](https://github.com/TRI-ML/dgp/issues) |
| Type | Platforms |
| ----------------------- | ------------------------------------------------------------ |
| 🚨 **Bug Reports** | [GitHub Issue Tracker](https://github.com/TRI-ML/dgp/issues) |
| 🎁 **Feature Requests** | [GitHub Issue Tracker](https://github.com/TRI-ML/dgp/issues) |

## 👩‍💻 The Team 👨‍💻

DGP is developed and currently maintained by *Quincy Chen, Arjun Bhargava, Chao Fang, Chris Ochoa and Kuan-Hui Lee* from the ML-Engineering team at [Toyota Research Institute (TRI)](https://www.tri.global/), with contributions coming from the ML-Research team at TRI, [Woven Planet](https://www.woven-planet.global/en) and [Parallel Domain](https://paralleldomain.com/).
DGP is developed and currently maintained by _Quincy Chen, Arjun Bhargava, Chao
Fang, Chris Ochoa and Kuan-Hui Lee_ from the ML-Engineering team at
[Toyota Research Institute (TRI)](https://www.tri.global/), with contributions
coming from the ML-Research team at TRI,
[Woven Planet](https://www.woven-planet.global/en) and
[Parallel Domain](https://paralleldomain.com/).
62 changes: 37 additions & 25 deletions dgp/README.md
@@ -2,70 +2,82 @@

[dgp/cli.py](cli.py) is the main CLI entrypoint for handling DGP datasets.


## Visualize DGP-compliant Scene and SceneDataset

DGP CLI subcommands `visualize-scenes` and `visualize-scene` can be used to visualize DGP-compliant data.

DGP CLI subcommands `visualize-scenes` and `visualize-scene` can be used to
visualize DGP-compliant data.

* To visualize a split of a **[DGP SceneDataset](proto/dataset.proto#L127)**, run `python dgp/cli.py visualize-scenes`:
- To visualize a split of a **[DGP SceneDataset](proto/dataset.proto#L127)**,
run `python dgp/cli.py visualize-scenes`:

Show the help message via:

```sh
dgp$ python dgp/cli.py visualize-scenes --help
```

Example command to visualize the images from `CAMERA_01, CAMERA_05, CAMERA_06` and point cloud from `LIDAR` along with ground_truth annotations `bounding_box_2d, bounding_box_3d` from `train` split of the toy dataset `tests/data/dgp/test_scene/scene_dataset_v1.0.json`, and store the resulting videos in `--dst-dir vis`.
One can find the resulting 3D visualization videos in `vis/3d` and 2D visualization videos in `vis/2d`.
Example command to visualize the images from `CAMERA_01, CAMERA_05, CAMERA_06`
and point cloud from `LIDAR` along with ground_truth annotations
`bounding_box_2d, bounding_box_3d` from `train` split of the toy dataset
`tests/data/dgp/test_scene/scene_dataset_v1.0.json`, and store the resulting
videos in `--dst-dir vis`. One can find the resulting 3D visualization videos
in `vis/3d` and 2D visualization videos in `vis/2d`.

```sh
dgp$ python dgp/cli.py visualize-scenes --scene-dataset-json tests/data/dgp/test_scene/scene_dataset_v1.0.json --split train --dst-dir vis -l LIDAR -c CAMERA_01 -c CAMERA_05 -c CAMERA_06 -a bounding_box_2d -a bounding_box_3d
```
<p align="center">
<img src="https://raw.githubusercontent.com/TRI-ML/dgp/master/docs/3d-viz.gif" alt="3d-viz"/>
</p>

<p align="center">
<img src="https://raw.githubusercontent.com/TRI-ML/dgp/master/docs/3d-viz.gif" alt="3d-viz"/>
</p>

Add the `render-pointcloud` flag to render the projected pointcloud onto the images:
```sh
dgp$ python dgp/cli.py visualize-scenes --scene-dataset-json tests/data/dgp/test_scene/scene_dataset_v1.0.json --split train --dst-dir vis -l LIDAR -c CAMERA_01 -c CAMERA_05 -c CAMERA_06 -a bounding_box_2d -a bounding_box_3d --render-pointcloud
```

```sh
dgp$ python dgp/cli.py visualize-scenes --scene-dataset-json tests/data/dgp/test_scene/scene_dataset_v1.0.json --split train --dst-dir vis -l LIDAR -c CAMERA_01 -c CAMERA_05 -c CAMERA_06 -a bounding_box_2d -a bounding_box_3d --render-pointcloud
```

<p align="center">
<img src="https://raw.githubusercontent.com/TRI-ML/dgp/master/docs/3d-viz-proj.gif" alt="3d-viz-proj"/>
</p>


* To visualize a single **[DGP Scene](proto/scene.proto#L14)**, run `python dgp/cli.py visualize-scene`:
- To visualize a single **[DGP Scene](proto/scene.proto#L14)**, run
`python dgp/cli.py visualize-scene`:

Show the help message via:

```sh
dgp$ python dgp/cli.py visualize-scene --help
```

Example command to visualize the images from `CAMERA_01, CAMERA_05, CAMERA_06` and point cloud from `LIDAR` along with ground_truth annotations `bounding_box_2d, bounding_box_3d` from the toy Scene `tests/data/dgp/test_scene/scene_01/scene_a8dc5ed1da0923563f85ea129f0e0a83e7fe1867.json`, and store the resulting videos in `--dst-dir vis`.
One can find the resulting 3D visualization video in `vis/3d` and 2D visualization video in `vis/2d`.
Example command to visualize the images from `CAMERA_01, CAMERA_05, CAMERA_06`
and point cloud from `LIDAR` along with ground_truth annotations
`bounding_box_2d, bounding_box_3d` from the toy Scene
`tests/data/dgp/test_scene/scene_01/scene_a8dc5ed1da0923563f85ea129f0e0a83e7fe1867.json`,
and store the resulting videos in `--dst-dir vis`. One can find the resulting
3D visualization video in `vis/3d` and 2D visualization video in `vis/2d`.

```sh
dgp$ python dgp/cli.py visualize-scene --scene-json tests/data/dgp/test_scene/scene_01/scene_a8dc5ed1da0923563f85ea129f0e0a83e7fe1867.json --dst-dir vis -l LIDAR -c CAMERA_01 -c CAMERA_05 -c CAMERA_06 -a bounding_box_2d -a bounding_box_3d
```

## Coming soon: Retrieve information about an ML dataset in the DGP

The DGP CLI provides information about a dataset, including
the remote location (S3 URL) of the dataset, its raw dataset URL, the
set of available annotation types contained in the dataset, etc. For
more information, see the relevant metadata stored with a dataset artifact
in [DatasetMetadata](proto/dataset.proto) and [DatasetArtifacts](proto/artifacts.proto).
The DGP CLI provides information about a dataset, including the remote location
(S3 URL) of the dataset, its raw dataset URL, the set of available annotation
types contained in the dataset, etc. For more information, see the relevant
metadata stored with a dataset artifact in [DatasetMetadata](proto/dataset.proto)
and [DatasetArtifacts](proto/artifacts.proto).

```sh
dgp$ python dgp/cli.py info --scene-dataset-json <scene-dataset-json>
```

## Coming soon: Validate a dataset

The DGP CLI provides a simplified mechanism for validating newly
created datasets, ensuring that the dataset schema is maintained and
valid. This is done via:
The DGP CLI provides a simplified mechanism for validating newly created
datasets, ensuring that the dataset schema is maintained and valid. This is
done via:

```sh
dgp$ python dgp/cli.py validate --scene-dataset-json <scene-dataset-json>
```