
docs: update docs
dgcnz committed Oct 31, 2024
1 parent 052d4e1 commit f033741
Showing 3 changed files with 70 additions and 40 deletions.
1 change: 1 addition & 0 deletions docs/src/part1/getting_started.md
@@ -33,6 +33,7 @@ The project is structured as follows:

The main folders to focus on are `src` and `scripts`, as they contain most of the source code.

(part1:installation)=
## Installation

First make sure the (bold) pre-requirements are fulfilled:
36 changes: 33 additions & 3 deletions docs/src/part3/compilation.ipynb
@@ -718,6 +718,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"(part2:compilingmodel)=\n",
"## Compiling the model"
]
},
@@ -729,7 +730,7 @@
"\n",
"The main script to compile our model with the TensorRT backend is `scripts.export_tensorrt`.\n",
"\n",
"The easiest way to specify a compilation target, is by adding a config file at `scripts/config/export_tensorrt`. For example, if we want to compile our model's, we can use the config file located at `scripts/config/export_tensorrt/dinov2.yaml` as follows:\n",
"The easiest way to specify a compilation target is by adding a config file at `scripts/config/export_tensorrt`. For example, to compile our model, we can use the config file located at `scripts/config/export_tensorrt/dinov2.yaml` as follows:\n",
"\n",
"```sh\n",
"python -m scripts.export_tensorrt --config-name dinov2\n",
@@ -740,7 +741,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This config file specifies information such like:\n",
"Recommendations for the following parameters in the config file are given in *italics*:\n",
"- `image`: The sample image's file path, height and width. \n",
" - *Set to target camera dimensions*.\n",
"- `amp_dtype`: `fp16` or `bf16` for `torch.amp.autocast` usage, `fp32` to disable. \n",
@@ -798,6 +799,35 @@
"%pycat scripts/config/export_tensorrt/dinov2.yaml"
]
},
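The notebook displays the real config file with `%pycat`. As a rough sketch only, a config covering the parameters described above might look like the following; every field name and value here is an assumption inferred from that parameter list, not the actual contents of `dinov2.yaml`:

```yaml
# Hypothetical sketch of scripts/config/export_tensorrt/dinov2.yaml.
# Field names and values are assumptions; check the real file with %pycat.
image:
  path: artifacts/sample.jpg   # sample image used for tracing (path assumed)
  height: 512                  # set to target camera dimensions
  width: 512
amp_dtype: fp32                # fp16 | bf16 for torch.amp.autocast, fp32 to disable
trt:
  enabled_precisions: [fp32, bf16, fp16]
```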
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also override any of these parameters using the command line. For example, to compile the model with `torch.amp`'s `fp16` precision and TensorRT's `fp32`, `fp16` and `bf16` precisions, you can run:\n",
"\n",
"```bash\n",
"python -m scripts.export_tensorrt --config-name dinov2 amp_dtype=fp16 trt.enabled_precisions=\"[fp32, bf16, fp16]\" \n",
"```\n",
"\n",
"At the end of the compilation process, you should see a message indicating the output directory:\n",
"\n",
"```txt\n",
"OUTPUT DIR: outputs/2024-10-31/10-43-31\n",
"```\n",
"\n",
"This output directory will contain the following files:\n",
"\n",
"```txt\n",
"├── export_tensorrt.log # log file (useful for debugging process)\n",
"├── .hydra\n",
"│   ├── config.yaml # config file (useful for remembering the parameters used)\n",
"│   ├── hydra.yaml\n",
"│   └── overrides.yaml\n",
"├── model.ts # compiled torchscript model\n",
"└── predictions.png # sample predictions for the model (visual check that the model is working)\n",
"```"
]
},
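Overrides such as `trt.enabled_precisions="[fp32, bf16, fp16]"` follow Hydra's dotted `key=value` grammar: the dotted path selects a nested config entry, and a bracketed value is parsed as a list. As a simplified illustration of that mapping (a sketch, not Hydra's actual parser):

```python
def apply_override(config: dict, override: str) -> dict:
    """Apply one Hydra-style 'a.b.c=value' override to a nested dict (simplified)."""
    dotted_key, _, raw_value = override.partition("=")
    # A bracketed value like "[fp32, bf16, fp16]" becomes a list of strings.
    value = (
        [item.strip() for item in raw_value.strip("[]").split(",")]
        if raw_value.startswith("[")
        else raw_value
    )
    node = config
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})  # walk/create intermediate levels
    node[leaf] = value
    return config

config = {"amp_dtype": "fp32", "trt": {"enabled_precisions": ["fp32"]}}
apply_override(config, "amp_dtype=fp16")
apply_override(config, "trt.enabled_precisions=[fp32, bf16, fp16]")
print(config)
# {'amp_dtype': 'fp16', 'trt': {'enabled_precisions': ['fp32', 'bf16', 'fp16']}}
```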
{
"cell_type": "markdown",
"metadata": {},
@@ -808,7 +838,7 @@
"1. DinoV2 + ViTDet + DINO: Successful compilation, minimal final rewrites.\n",
"2. ViT + ViTDet + Cascade Mask RCNN: Almost successful, many final rewrites.\n",
"\n",
"To follow the thought process in a single notebook, I've added flags throughout the model's code to activate or deactivate the most important fixes. To see *all* the changes, you can check all the differences between my forks of `detectron2`, `detrex` and the original repositories."
"To follow the thought process in a single notebook, I've added flags throughout the model's source code to activate or deactivate the most important fixes. To see *all* the changes, you can check all the differences between my forks of `detectron2`, `detrex` and the original repositories."
]
},
{
73 changes: 36 additions & 37 deletions docs/src/part3/results.md
@@ -5,61 +5,53 @@

## Running the benchmarks

Download the model:
```bash
!wget https://huggingface.co/dgcnz/dinov2_vitdet_DINO_12ep/resolve/main/model_final.pth -O artifacts/model_final.pth
```
Before running the benchmarks, make sure you have downloaded the trained model (see {ref}`part1:downloadmodel`) and compiled it (see {ref}`part2:compilingmodel`).

Before running the benchmarks make sure you have compiled your desired model.
```bash
python -m scripts.export_tensorrt --config-name dinov2 amp_dtype=fp32 trt.enabled_precisions="[fp32, bf16, fp16]"
# ...
# OUTPUT DIR: outputs/2024-10-31/10-43-31
```
We'll assume that the output directory of the `export_tensorrt` compilation was `outputs/2024-10-31/10-43-31`.

The outputs of this script will be found in the directory specified by `OUTPUT DIR`. The directory will contain the following files:
There are three possible runtimes to benchmark; examples are shown below:

```
├── export_tensorrt.log # log file
├── .hydra
│ ├── config.yaml # config file
│ ├── hydra.yaml
│ └── overrides.yaml
├── model.ts # compiled torchscript model
└── predictions.png # sample predictions for the model
```
**Python Runtime, no TensorRT**

There are three possible runtimes to benchmark, examples of how to run the benchmarks are shown below:
This mode takes the uncompiled model and runs it with mixed precision (fp16 or bf16) or full precision (fp32).

**Python Runtime, no TensorRT**
```bash
python -m scripts.benchmark_gpu compile_run_path=outputs/2024-10-31/10-43-31 n_iter=100 load_ts=False amp_dtype=fp16
```

**Python Runtime with TensorRT**

```bash
python -m scripts.benchmark_gpu compile_run_path=outputs/2024-10-31/10-43-31 n_iter=100 load_ts=True
```

**C++ Runtime with TensorRT**

Make sure you have built the C++ runtime (see {ref}`part1:installation`).
```bash
./build/benchmark --model outputs/2024-10-31/10-43-31/model.ts --n_iter=100
```
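The latency figures reported in the tables below have the form mean ± standard deviation over the benchmark iterations (`n_iter=100`). A minimal sketch of that aggregation, assuming the sample standard deviation is used (the benchmark scripts' actual implementation may differ):

```python
import statistics

def summarize_latencies(samples_ms: list[float]) -> str:
    """Format per-iteration latencies (ms) as 'mean ± sample std'."""
    mean = statistics.mean(samples_ms)
    std = statistics.stdev(samples_ms)  # sample standard deviation (n - 1 divisor)
    return f"{mean:.3f} ± {std:.3f}"

# Hypothetical timings for three iterations; real runs use n_iter=100.
print(summarize_latencies([66.1, 66.5, 66.3]))  # → 66.300 ± 0.200
```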

## Results

Benchmarking was done on a NVIDIA RTX 4060 Ti GPU with 16GB of VRAM. Results are shown below.

**Python Runtime, no TensorRT**
```{table} **Python Runtime, no TensorRT**
:name: py_notrt
| model's precision | amp_dtype | latency (ms) |
| ----------------- | ---------------------- | -------------- |
| fp32 | fp32+fp16 | 66.322 ± 0.927 |
| fp32 | fp32+bf16 | 66.497 ± 1.052 |
| fp32 | fp32 | 76.275 ± 0.587 |
```

Max memory usage for all configurations is ~1GB.

**Python Runtime, with TensorRT**

```{table} **Python Runtime, with TensorRT**
:name: py_trt
| model's precision | trt.enabled_precisions | latency (ms) |
| ----------------- | ---------------------- | -------------- |
@@ -68,33 +60,40 @@ Max memory usage for all configurations is ~1GB.
| fp32 | fp32+bf16 | 25.148 ± 0.030 |
| fp32 | fp32 | 38.381 ± 0.022 |
```
Max memory usage for all configurations is ~500MB except for fp32+fp32 which is ~770MB.

**C++ Runtime, no TensorRT**

```{table} **C++ Runtime, with TensorRT**
:name: cpp_trt
| model's precision | trt.enabled_precisions | latency (ms) |
| ----------------- | ---------------------- | -------------- |
| fp32+fp16 | fp32+bf16+fp16 | 15.433 ± 0.029 |
| fp32 | fp32+bf16+fp16 | 23.263 ± 0.027 |
| fp32 | fp32+bf16 | 25.255 ± 0.014 |
| fp32 | fp32 | 38.465 ± 0.029 |
```

Max memory usage for all configurations is ~500MB except for fp32+fp32 which is ~770MB.
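Reading the tables together, the end-to-end speedup from compilation can be computed directly, e.g. the fastest C++ + TensorRT configuration (15.433 ms) against the uncompiled fp32 Python baseline (76.275 ms). A quick check of that arithmetic:

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Ratio of baseline latency to optimized latency (higher is better)."""
    return baseline_ms / optimized_ms

# Latencies taken from the tables above:
# Python runtime fp32 (76.275 ms) vs C++ runtime, TensorRT fp16 (15.433 ms).
print(round(speedup(76.275, 15.433), 2))  # → 4.94
```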

---
:::{note}

Note: For some reason in the latest version of torch_tensorrt, `bfloat16` precision is not working well and it's not achieving the previously measured performance of (13-14ms) and/or failing compilation.
For reasons not yet understood, `bfloat16` precision is not working well in the latest version of `torch_tensorrt`: it does not achieve the previously measured performance (13-14 ms) and/or fails compilation.

We include the previous results for completeness, in case the issue is resolved in the future.

| Runtime | model's precision | trt.enabled_precisions | latency | memory (mb) |
| ------- | ----------------- | ---------------------- | ------- | ----------- |
| cpp+trt | fp32 | fp32+fp16 | 13.984 | 500 |
| cpp+trt | fp32 | fp32+bf16+fp16 | 13.898 | 500 |
| cpp+trt | fp32 | fp32+bf16 | 17.261 | 500 |
| cpp+trt | bf16 | fp32+bf16 | 22.913 | 500 |
| cpp+trt | bf16 | bf16 | 22.938 | 500 |
| cpp+trt | fp32 | fp32 | 37.639 | 770 |

```{table} **C++ Runtime, with TensorRT (previous results)**
:name: cpp_trt_old
| model's precision | trt.enabled_precisions | latency |
| ----------------- | ---------------------- | ------- |
| fp32 | fp32+fp16 | 13.984 |
| fp32 | fp32+bf16+fp16 | 13.898 |
| fp32 | fp32+bf16 | 17.261 |
| bf16 | fp32+bf16 | 22.913 |
| bf16 | bf16 | 22.938 |
| fp32 | fp32 | 37.639 |
```

:::
