feat: replace markdownlint with prettier (TRI-ML#123)
This also disables natural language validation since there is no
corresponding pre-commit hook. We deem this check non-critical and, in
its place, continue to enforce clear natural language in code reviews.
tk-woven authored Aug 22, 2022
1 parent 707f2a2 commit a2c6c12
Showing 10 changed files with 266 additions and 152 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/pre-merge.yml
@@ -49,6 +49,8 @@ jobs:
VALIDATE_HTML: false # Handled by pre-commit.
VALIDATE_JSCPD: false
VALIDATE_JSON: false # Handled by pre-commit.
VALIDATE_MARKDOWN: false # Handled by pre-commit.
VALIDATE_NATURAL_LANGUAGE: false # No corresponding pre-commit hook but not critical.
VALIDATE_PROTOBUF: false
VALIDATE_PYTHON_BLACK: false
VALIDATE_PYTHON_FLAKE8: false # Formatted by YAPF.
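
For context, the `VALIDATE_*` entries above are environment variables read by GitHub's super-linter action; the step that consumes them lies outside this hunk. The sketch below is only illustrative, and the step name, action version, and remaining settings are assumptions rather than the repository's actual configuration:

```yaml
# Hypothetical super-linter step; the real pre-merge.yml step, action version,
# and remaining VALIDATE_* flags may differ.
- name: Lint
  uses: github/super-linter@v4
  env:
    DEFAULT_BRANCH: master
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    VALIDATE_MARKDOWN: false # Handled by pre-commit (prettier).
    VALIDATE_NATURAL_LANGUAGE: false # No corresponding pre-commit hook.
```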
6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
@@ -33,6 +33,12 @@ repos:
hooks:
- id: isort
exclude: _pb2\.py$
- repo: https://github.com/pre-commit/mirrors-prettier
rev: v2.7.1
hooks:
- id: prettier
types: [markdown]
args: [--embedded-language-formatting=off, --prose-wrap=always]
- repo: https://github.com/PyCQA/pylint
rev: v2.14.5
hooks:
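
With the prettier hook registered, Markdown formatting can be checked locally before pushing. A minimal sketch, assuming `pre-commit` is installed and run from the repository root (the `npx` variant additionally assumes a Node.js toolchain):

```sh
# Install the git hook once, then run only the prettier hook on every file.
pre-commit install
pre-commit run prettier --all-files

# Roughly equivalent direct invocation using the same options as the hook.
npx prettier --prose-wrap=always --embedded-language-formatting=off --check "**/*.md"
```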
82 changes: 49 additions & 33 deletions README.md
@@ -1,38 +1,43 @@
<!-- markdownlint-disable-next-line -->
[<img src="docs/tri-logo.png" width="40%">](https://www.tri.global/)

[<img src="docs/tri-logo.png" width="40%">](https://www.tri.global/)

# Dataset Governance Policy (DGP)

[![build-docker](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml)
[![license](https://img.shields.io/github/license/TRI-ML/dgp.svg)](https://github.com/TRI-ML/dgp/blob/master/LICENSE)
[![open-issues](https://img.shields.io/github/issues/TRI-ML/dgp.svg)](https://github.com/TRI-ML/dgp/issues)
![coverage badge](./docs/coverage.svg)
[![docs](https://img.shields.io/badge/documentation-beta-red)](https://tri-ml.github.io/dgp/)

To ensure the traceability, reproducibility and standardization for
all ML datasets and models generated and consumed within Toyota Research Institute (TRI), we developed the
Dataset-Governance-Policy (DGP) that codifies the schema and
maintenance of all TRI's Autonomous Vehicle (AV) datasets.
To ensure the traceability, reproducibility and standardization for all ML
datasets and models generated and consumed within Toyota Research Institute
(TRI), we developed the Dataset-Governance-Policy (DGP) that codifies the schema
and maintenance of all TRI's Autonomous Vehicle (AV) datasets.

<p align="center">
<img src="docs/3d-viz-proj.gif" alt="3d-viz-proj"/>
</p>

## Components
- [Schema](dgp/proto/README.md): [Protobuf](https://developers.google.com/protocol-buffers)-based schemas for raw data, annotations
and dataset management.
- [DataLoaders](dgp/datasets): Universal PyTorch DatasetClass to load all DGP-compliant datasets.
- [CLI](dgp/README.md): Main CLI for handling DGP datasets and the entrypoint of visualization tools.

- [Schema](dgp/proto/README.md):
[Protobuf](https://developers.google.com/protocol-buffers)-based schemas for
raw data, annotations and dataset management.
- [DataLoaders](dgp/datasets): Universal PyTorch DatasetClass to load all
DGP-compliant datasets.
- [CLI](dgp/README.md): Main CLI for handling DGP datasets and the entrypoint of
visualization tools.

## Getting Started

Please see [Getting Started](docs/GETTING_STARTED.md) for environment setup.

Getting started is as simple as initializing a dataset-class with the
relevant dataset JSON, raw data sensor names, annotation types, and
split information. Below, we show a few examples of initializing a
PyTorch dataset for multi-modal learning from 2D and 3D bounding boxes.
Getting started is as simple as initializing a dataset-class with the relevant
dataset JSON, raw data sensor names, annotation types, and split information.
Below, we show a few examples of initializing a PyTorch dataset for multi-modal
learning from 2D and 3D bounding boxes.

```python
from dgp.datasets import SynchronizedSceneDataset

@@ -45,38 +50,49 @@ dataset = SynchronizedSceneDataset('<dataset_name>_v0.0.json',
```
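
The collapsed hunk above elides the remaining constructor arguments. A fuller sketch is shown below; the keyword and datum names are assumptions based on the surrounding documentation, not the exact API:

```python
from dgp.datasets import SynchronizedSceneDataset

# Illustrative only: argument and datum names are assumptions; consult
# dgp/datasets for the exact constructor signature.
dataset = SynchronizedSceneDataset(
    '<dataset_name>_v0.0.json',
    split='train',
    datum_names=('camera_01', 'lidar'),
    requested_annotations=('bounding_box_2d', 'bounding_box_3d'),
)

# The result behaves like a map-style PyTorch dataset.
print(len(dataset))
sample = dataset[0]
```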

## Examples
A list of starter scripts is provided in the [examples](examples/)
directory.
- [examples/load_dataset.py](examples/load_dataset.py): Simple example
script to load a multi-modal dataset based on the **Getting
Started** section above.

A list of starter scripts is provided in the [examples](examples/) directory.

- [examples/load_dataset.py](examples/load_dataset.py): Simple example script to
load a multi-modal dataset based on the **Getting Started** section above.

## Build and run tests
You can build the base docker image and run the tests within a [docker container](docs/GETTING_STARTED.md#markdown-header-develop-within-docker)

You can build the base docker image and run the tests within a
[docker container](docs/GETTING_STARTED.md#markdown-header-develop-within-docker)
via:

```sh
make docker-build
make docker-run-tests
```

## Contributing
We appreciate all contributions to DGP! To learn more about making a contribution to DGP, please see [Contribution Guidelines](docs/CONTRIBUTING.md).

We appreciate all contributions to DGP! To learn more about making a
contribution to DGP, please see [Contribution Guidelines](docs/CONTRIBUTING.md).

## CI Ecosystem
| Job | CI | Notes |
| --- | --- | --- |
| docker-build | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml) | Docker build and push to [container registry](https://github.com/TRI-ML/dgp/pkgs/container/dgp)|
| pre-merge | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/pre-merge.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/pre-merge.yml) | Pre-merge testing|
| doc-gen | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/doc-gen.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/doc-gen.yml) | [GitHub Pages](https://tri-ml.github.io/dgp/) doc generation|
| coverage | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/coverage.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/coverage.yml) | Code coverage metrics and badge generation |

| Job | CI | Notes |
| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| docker-build | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml) | Docker build and push to [container registry](https://github.com/TRI-ML/dgp/pkgs/container/dgp) |
| pre-merge | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/pre-merge.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/pre-merge.yml) | Pre-merge testing |
| doc-gen | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/doc-gen.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/doc-gen.yml) | [GitHub Pages](https://tri-ml.github.io/dgp/) doc generation |
| coverage | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/coverage.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/coverage.yml) | Code coverage metrics and badge generation |

## 💬 Where to file bug reports

| Type | Platforms |
| - | - |
| 🚨 **Bug Reports** | [GitHub Issue Tracker](https://github.com/TRI-ML/dgp/issues) |
| 🎁 **Feature Requests** | [GitHub Issue Tracker](https://github.com/TRI-ML/dgp/issues) |
| Type | Platforms |
| ----------------------- | ------------------------------------------------------------ |
| 🚨 **Bug Reports** | [GitHub Issue Tracker](https://github.com/TRI-ML/dgp/issues) |
| 🎁 **Feature Requests** | [GitHub Issue Tracker](https://github.com/TRI-ML/dgp/issues) |

## 👩‍💻 The Team 👨‍💻

DGP is developed and currently maintained by *Quincy Chen, Arjun Bhargava, Chao Fang, Chris Ochoa and Kuan-Hui Lee* from the ML-Engineering team at [Toyota Research Institute (TRI)](https://www.tri.global/), with contributions coming from the ML-Research team at TRI, [Woven Planet](https://www.woven-planet.global/en) and [Parallel Domain](https://paralleldomain.com/).
DGP is developed and currently maintained by _Quincy Chen, Arjun Bhargava, Chao
Fang, Chris Ochoa and Kuan-Hui Lee_ from the ML-Engineering team at
[Toyota Research Institute (TRI)](https://www.tri.global/), with contributions
coming from the ML-Research team at TRI,
[Woven Planet](https://www.woven-planet.global/en) and
[Parallel Domain](https://paralleldomain.com/).
62 changes: 37 additions & 25 deletions dgp/README.md
@@ -2,70 +2,82 @@

[dgp/cli.py](cli.py) is the main CLI entrypoint for handling DGP datasets.


## Visualize DGP-compliant Scene and SceneDataset

DGP CLI subcommands `visualize-scenes` and `visualize-scene` can be used to visualize DGP-compliant data.

DGP CLI subcommands `visualize-scenes` and `visualize-scene` can be used to
visualize DGP-compliant data.

* To visualize a split of a **[DGP SceneDataset](proto/dataset.proto#L127)**, run `python dgp/cli.py visualize-scenes`:
- To visualize a split of a **[DGP SceneDataset](proto/dataset.proto#L127)**,
run `python dgp/cli.py visualize-scenes`:

Show the help message via:

```sh
dgp$ python dgp/cli.py visualize-scenes --help
```

Example command to visualize the images from `CAMERA_01, CAMERA_05, CAMERA_06` and point cloud from `LIDAR` along with ground_truth annotations `bounding_box_2d, bounding_box_3d` from `train` split of the toy dataset `tests/data/dgp/test_scene/scene_dataset_v1.0.json`, and store the resulting videos in `--dst-dir vis`.
One can find the resulting 3D visualization videos in `vis/3d` and 2D visualization videos in `vis/2d`.
Example command to visualize the images from `CAMERA_01, CAMERA_05, CAMERA_06`
and point cloud from `LIDAR` along with ground_truth annotations
`bounding_box_2d, bounding_box_3d` from `train` split of the toy dataset
`tests/data/dgp/test_scene/scene_dataset_v1.0.json`, and store the resulting
videos in `--dst-dir vis`. One can find the resulting 3D visualization videos
in `vis/3d` and 2D visualization videos in `vis/2d`.

```sh
dgp$ python dgp/cli.py visualize-scenes --scene-dataset-json tests/data/dgp/test_scene/scene_dataset_v1.0.json --split train --dst-dir vis -l LIDAR -c CAMERA_01 -c CAMERA_05 -c CAMERA_06 -a bounding_box_2d -a bounding_box_3d
```
<p align="center">
<img src="https://raw.githubusercontent.com/TRI-ML/dgp/master/docs/3d-viz.gif" alt="3d-viz"/>
</p>

<p align="center">
<img src="https://raw.githubusercontent.com/TRI-ML/dgp/master/docs/3d-viz.gif" alt="3d-viz"/>
</p>

Add the `render-pointcloud` flag to render the projected pointcloud onto the images:
```sh
dgp$ python dgp/cli.py visualize-scenes --scene-dataset-json tests/data/dgp/test_scene/scene_dataset_v1.0.json --split train --dst-dir vis -l LIDAR -c CAMERA_01 -c CAMERA_05 -c CAMERA_06 -a bounding_box_2d -a bounding_box_3d --render-pointcloud
```

```sh
dgp$ python dgp/cli.py visualize-scenes --scene-dataset-json tests/data/dgp/test_scene/scene_dataset_v1.0.json --split train --dst-dir vis -l LIDAR -c CAMERA_01 -c CAMERA_05 -c CAMERA_06 -a bounding_box_2d -a bounding_box_3d --render-pointcloud
```

<p align="center">
<img src="https://raw.githubusercontent.com/TRI-ML/dgp/master/docs/3d-viz-proj.gif" alt="3d-viz-proj"/>
</p>


* To visualize a single **[DGP Scene](proto/scene.proto#L14)**, run `python dgp/cli.py visualize-scene`:
- To visualize a single **[DGP Scene](proto/scene.proto#L14)**, run
`python dgp/cli.py visualize-scene`:

Show the help message via:

```sh
dgp$ python dgp/cli.py visualize-scene --help
```

Example command to visualize the images from `CAMERA_01, CAMERA_05, CAMERA_06` and point cloud from `LIDAR` along with ground_truth annotations `bounding_box_2d, bounding_box_3d` from the toy Scene `tests/data/dgp/test_scene/scene_01/scene_a8dc5ed1da0923563f85ea129f0e0a83e7fe1867.json`, and store the resulting videos in `--dst-dir vis`.
One can find the resulting 3D visualization video in `vis/3d` and 2D visualization video in `vis/2d`.
Example command to visualize the images from `CAMERA_01, CAMERA_05, CAMERA_06`
and point cloud from `LIDAR` along with ground_truth annotations
`bounding_box_2d, bounding_box_3d` from the toy Scene
`tests/data/dgp/test_scene/scene_01/scene_a8dc5ed1da0923563f85ea129f0e0a83e7fe1867.json`,
and store the resulting videos in `--dst-dir vis`. One can find the resulting
3D visualization video in `vis/3d` and 2D visualization video in `vis/2d`.

```sh
dgp$ python dgp/cli.py visualize-scene --scene-json tests/data/dgp/test_scene/scene_01/scene_a8dc5ed1da0923563f85ea129f0e0a83e7fe1867.json --dst-dir vis -l LIDAR -c CAMERA_01 -c CAMERA_05 -c CAMERA_06 -a bounding_box_2d -a bounding_box_3d
```

## Coming soon: Retrieve information about an ML dataset in the DGP

The DGP CLI provides information about a dataset, including
the remote location (S3 URL) of the dataset, its raw dataset URL, the
set of available annotation types contained in the dataset, etc. For
more information, see the relevant metadata stored with a dataset artifact
in [DatasetMetadata](proto/dataset.proto) and [DatasetArtifacts](proto/artifacts.proto).
The DGP CLI provides information about a dataset, including the remote location
(S3 URL) of the dataset, its raw dataset URL, the set of available annotation
types contained in the dataset, etc. For more information, see the relevant
metadata stored with a dataset artifact in [DatasetMetadata](proto/dataset.proto)
and [DatasetArtifacts](proto/artifacts.proto).

```sh
dgp$ python dgp/cli.py info --scene-dataset-json <scene-dataset-json>
```

## Coming soon: Validate a dataset

The DGP CLI provides a simplified mechanism for validating newly
created datasets, ensuring that the dataset schema is maintained and
valid. This is done via:
The DGP CLI provides a simplified mechanism for validating newly created
datasets, ensuring that the dataset schema is maintained and valid. This is
done via:

```sh
dgp$ python dgp/cli.py validate --scene-dataset-json <scene-dataset-json>
```