
docs: update docs
dgcnz committed Oct 31, 2024
1 parent 052d4e1 commit f033741
Showing 3 changed files with 70 additions and 40 deletions.
1 change: 1 addition & 0 deletions docs/src/part1/getting_started.md
@@ -33,6 +33,7 @@ The project is structured as follows:

The main folders to focus on are `src` and `scripts`, as they contain most of the source code.

(part1:installation)=
## Installation

First make sure the (bold) pre-requirements are fulfilled:
36 changes: 33 additions & 3 deletions docs/src/part3/compilation.ipynb
@@ -718,6 +718,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"(part2:compilingmodel)=\n",
"## Compiling the model"
]
},
@@ -729,7 +730,7 @@
"\n",
"The main script to compile our model with the TensorRT backend is `scripts.export_tensorrt`.\n",
"\n",
"The easiest way to specify a compilation target, is by adding a config file at `scripts/config/export_tensorrt`. For example, if we want to compile our model's, we can use the config file located at `scripts/config/export_tensorrt/dinov2.yaml` as follows:\n",
"The easiest way to specify a compilation target is by adding a config file at `scripts/config/export_tensorrt`. For example, to compile our model, we can use the config file located at `scripts/config/export_tensorrt/dinov2.yaml` as follows:\n",
"\n",
"```sh\n",
"python -m scripts.export_tensorrt --config-name dinov2\n",
@@ -740,7 +741,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This config file specifies information such like:\n",
"Recommendations for the following parameters in the config file are given in *italics*:\n",
"- `image`: The sample image's file path, height and width. \n",
" - *Set to target camera dimensions*.\n",
"- `amp_dtype`: `fp16` or `bf16` for `torch.amp.autocast` usage, `fp32` to disable. \n",
@@ -798,6 +799,35 @@
"%pycat scripts/config/export_tensorrt/dinov2.yaml"
]
},
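The notebook displays the real config file with `%pycat`. As a rough sketch only, a config covering the parameters described above might look like the following; every field name and value here is an assumption inferred from that parameter list, not the actual contents of `dinov2.yaml`:

```yaml
# Hypothetical sketch of scripts/config/export_tensorrt/dinov2.yaml.
# Field names and values are assumptions; check the real file with %pycat.
image:
  path: artifacts/sample.jpg   # sample image used for tracing (path assumed)
  height: 512                  # set to target camera dimensions
  width: 512
amp_dtype: fp32                # fp16 | bf16 for torch.amp.autocast, fp32 to disable
trt:
  enabled_precisions: [fp32, bf16, fp16]
```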
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also override any of these parameters using the command line. For example, to compile the model with `torch.amp`'s `fp16` precision and TensorRT's `fp32`, `fp16` and `bf16` precisions, you can run:\n",
"\n",
"```bash\n",
"python -m scripts.export_tensorrt --config-name dinov2 amp_dtype=fp16 trt.enabled_precisions=\"[fp32, bf16, fp16]\" \n",
"```\n",
"\n",
"At the end of the compilation process, you should see a message indicating the output directory:\n",
"\n",
"```txt\n",
"OUTPUT DIR: outputs/2024-10-31/10-43-31\n",
"```\n",
"\n",
"This output directory will contain the following files:\n",
"\n",
"```txt\n",
"├── export_tensorrt.log # log file (useful for debugging process)\n",
"├── .hydra\n",
"│   ├── config.yaml # config file (useful for remembering the parameters used)\n",
"│   ├── hydra.yaml\n",
"│   └── overrides.yaml\n",
"├── model.ts # compiled torchscript model\n",
"└── predictions.png # sample predictions for the model (visual check that the model is working)\n",
"```"
]
},
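Overrides such as `trt.enabled_precisions="[fp32, bf16, fp16]"` follow Hydra's dotted `key=value` grammar: the dotted path selects a nested config entry, and a bracketed value is parsed as a list. As a simplified illustration of that mapping (a sketch, not Hydra's actual parser):

```python
def apply_override(config: dict, override: str) -> dict:
    """Apply one Hydra-style 'a.b.c=value' override to a nested dict (simplified)."""
    dotted_key, _, raw_value = override.partition("=")
    # A bracketed value like "[fp32, bf16, fp16]" becomes a list of strings.
    value = (
        [item.strip() for item in raw_value.strip("[]").split(",")]
        if raw_value.startswith("[")
        else raw_value
    )
    node = config
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})  # walk/create intermediate levels
    node[leaf] = value
    return config

config = {"amp_dtype": "fp32", "trt": {"enabled_precisions": ["fp32"]}}
apply_override(config, "amp_dtype=fp16")
apply_override(config, "trt.enabled_precisions=[fp32, bf16, fp16]")
print(config)
# {'amp_dtype': 'fp16', 'trt': {'enabled_precisions': ['fp32', 'bf16', 'fp16']}}
```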
{
"cell_type": "markdown",
"metadata": {},
@@ -808,7 +838,7 @@
"1. DinoV2 + ViTDet + DINO: Successful compilation, minimal final rewrites.\n",
"2. ViT + ViTDet + Cascade Mask RCNN: Almost successful, many final rewrites.\n",
"\n",
"To follow the thought process in a single notebook, I've added flags throughout the model's code to activate or deactivate the most important fixes. To see *all* the changes, you can check all the differences between my forks of `detectron2`, `detrex` and the original repositories."
"To follow the thought process in a single notebook, I've added flags throughout the model's source code to activate or deactivate the most important fixes. To see *all* the changes, you can check all the differences between my forks of `detectron2`, `detrex` and the original repositories."
]
},
{
73 changes: 36 additions & 37 deletions docs/src/part3/results.md
@@ -5,61 +5,53 @@

## Running the benchmarks

Download the model:
```bash
!wget https://huggingface.co/dgcnz/dinov2_vitdet_DINO_12ep/resolve/main/model_final.pth -O artifacts/model_final.pth
```
Before running the benchmarks, make sure you have downloaded the trained model (see {ref}`part1:downloadmodel`) and compiled it (see {ref}`part2:compilingmodel`).

Before running the benchmarks make sure you have compiled your desired model.
```bash
python -m scripts.export_tensorrt --config-name dinov2 amp_dtype=fp32 trt.enabled_precisions="[fp32, bf16, fp16]"
# ...
# OUTPUT DIR: outputs/2024-10-31/10-43-31
```
We'll assume that the output directory of the `export_tensorrt` compilation was `outputs/2024-10-31/10-43-31`.

The outputs of this script will be found in the directory specified by `OUTPUT DIR`. The directory will contain the following files:
There are three possible runtimes to benchmark; examples are shown below:

```
├── export_tensorrt.log # log file
├── .hydra
│ ├── config.yaml # config file
│ ├── hydra.yaml
│ └── overrides.yaml
├── model.ts # compiled torchscript model
└── predictions.png # sample predictions for the model
```
**Python Runtime, no TensorRT**

There are three possible runtimes to benchmark, examples of how to run the benchmarks are shown below:
This mode takes the uncompiled model and runs it with mixed precision (fp16 or bf16) or full precision (fp32).

**Python Runtime, no TensorRT**
```bash
python -m scripts.benchmark_gpu compile_run_path=outputs/2024-10-31/10-43-31 n_iter=100 load_ts=False amp_dtype=fp16
```

**Python Runtime with TensorRT**

```bash
python -m scripts.benchmark_gpu compile_run_path=outputs/2024-10-31/10-43-31 n_iter=100 load_ts=True
```

**C++ Runtime with TensorRT**

Make sure you have built the C++ runtime (see {ref}`part1:installation`).
```bash
./build/benchmark --model outputs/2024-10-31/10-43-31/model.ts --n_iter=100
```
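The latency figures reported in the tables below have the form mean ± standard deviation over the benchmark iterations (`n_iter=100`). A minimal sketch of that aggregation, assuming the sample standard deviation is used (the benchmark scripts' actual implementation may differ):

```python
import statistics

def summarize_latencies(samples_ms: list[float]) -> str:
    """Format per-iteration latencies (ms) as 'mean ± sample std'."""
    mean = statistics.mean(samples_ms)
    std = statistics.stdev(samples_ms)  # sample standard deviation (n - 1 divisor)
    return f"{mean:.3f} ± {std:.3f}"

# Hypothetical timings for three iterations; real runs use n_iter=100.
print(summarize_latencies([66.1, 66.5, 66.3]))  # → 66.300 ± 0.200
```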

## Results

Benchmarking was done on a NVIDIA RTX 4060 Ti GPU with 16GB of VRAM. Results are shown below.

**Python Runtime, no TensorRT**
```{table} **Python Runtime, no TensorRT**
:name: py_notrt
| model's precision | amp_dtype | latency (ms) |
| ----------------- | ---------------------- | -------------- |
| fp32 | fp32+fp16 | 66.322 ± 0.927 |
| fp32 | fp32+bf16 | 66.497 ± 1.052 |
| fp32 | fp32 | 76.275 ± 0.587 |
```

Max memory usage for all configurations is ~1GB.

**Python Runtime, with TensorRT**

```{table} **Python Runtime, with TensorRT**
:name: py_trt
| model's precision | trt.enabled_precisions | latency (ms) |
| ----------------- | ---------------------- | -------------- |
@@ -68,33 +60,40 @@ Max memory usage for all configurations is ~1GB.
| fp32 | fp32+bf16 | 25.148 ± 0.030 |
| fp32 | fp32 | 38.381 ± 0.022 |
```
Max memory usage for all configurations is ~500MB except for fp32+fp32 which is ~770MB.

**C++ Runtime, no TensorRT**

```{table} **C++ Runtime, with TensorRT**
:name: cpp_trt
| model's precision | trt.enabled_precisions | latency (ms) |
| ----------------- | ---------------------- | -------------- |
| fp32+fp16 | fp32+bf16+fp16 | 15.433 ± 0.029 |
| fp32 | fp32+bf16+fp16 | 23.263 ± 0.027 |
| fp32 | fp32+bf16 | 25.255 ± 0.014 |
| fp32 | fp32 | 38.465 ± 0.029 |
```

Max memory usage for all configurations is ~500MB except for fp32+fp32 which is ~770MB.
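Reading the tables together, the end-to-end speedup from compilation can be computed directly, e.g. the fastest C++ + TensorRT configuration (15.433 ms) against the uncompiled fp32 Python baseline (76.275 ms). A quick check of that arithmetic:

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Ratio of baseline latency to optimized latency (higher is better)."""
    return baseline_ms / optimized_ms

# Latencies taken from the tables above:
# Python runtime fp32 (76.275 ms) vs C++ runtime, TensorRT fp16 (15.433 ms).
print(round(speedup(76.275, 15.433), 2))  # → 4.94
```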

---
:::{note}

Note: For some reason in the latest version of torch_tensorrt, `bfloat16` precision is not working well and it's not achieving the previously measured performance of (13-14ms) and/or failing compilation.
For reasons not yet understood, `bfloat16` precision is not working well in the latest version of `torch_tensorrt`: it does not achieve the previously measured performance (13-14 ms) and/or fails compilation.

We include the previous results for completeness, in case the issue is resolved in the future.

| Runtime | model's precision | trt.enabled_precisions | latency | memory (mb) |
| ------- | ----------------- | ---------------------- | ------- | ----------- |
| cpp+trt | fp32 | fp32+fp16 | 13.984 | 500 |
| cpp+trt | fp32 | fp32+bf16+fp16 | 13.898 | 500 |
| cpp+trt | fp32 | fp32+bf16 | 17.261 | 500 |
| cpp+trt | bf16 | fp32+bf16 | 22.913 | 500 |
| cpp+trt | bf16 | bf16 | 22.938 | 500 |
| cpp+trt | fp32 | fp32 | 37.639 | 770 |

```{table} **C++ Runtime, with TensorRT (previous results)**
:name: cpp_trt_old
| model's precision | trt.enabled_precisions | latency |
| ----------------- | ---------------------- | ------- |
| fp32 | fp32+fp16 | 13.984 |
| fp32 | fp32+bf16+fp16 | 13.898 |
| fp32 | fp32+bf16 | 17.261 |
| bf16 | fp32+bf16 | 22.913 |
| bf16 | bf16 | 22.938 |
| fp32 | fp32 | 37.639 |
```

:::
