
Commit aeceb46

move autodeploy doc into torch, update links

Signed-off-by: h-guo18 <[email protected]>
Signed-off-by: Frida Hou <[email protected]>

1 parent: 0d54263

File tree

10 files changed: +25, -54 lines


docs/source/auto_deploy/advanced/mixed_precision_quantization.md

Lines changed: 0 additions & 17 deletions
This file was deleted.

docs/source/auto_deploy/advanced/model_eval.md

Lines changed: 0 additions & 11 deletions
This file was deleted.

docs/source/index.rst

Lines changed: 0 additions & 1 deletion
@@ -15,7 +15,6 @@ Welcome to TensorRT-LLM's Documentation!
    quick-start-guide.md
    key-features.md
    torch.md
-   auto-deploy.md
    release-notes.md
 
 

docs/source/torch.md

Lines changed: 4 additions & 0 deletions
@@ -40,3 +40,7 @@ Here is a simple example to show how to use `tensorrt_llm.LLM` API with Llama mo
 ## Known Issues
 
 - The PyTorch backend on SBSA is incompatible with bare metal environments like Ubuntu 24.04. Please use the [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) for optimal support on SBSA platforms.
+
+## Experimental Feature
+
+- [AutoDeploy: Seamless Model Deployment from PyTorch to TRT-LLM](./torch/auto_deploy/auto-deploy.md)

docs/source/auto_deploy/advanced/example_run.md renamed to docs/source/torch/auto_deploy/advanced/example_run.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
-# Example Run Script ([`build_and_run_ad.py`](./build_and_run_ad.py))
+# Example Run Script ([`build_and_run_ad.py`](../../../../../examples/auto_deploy/build_and_run_ad.py))
 
-To build and run AutoDeploy example, use the [`build_and_run_ad.py`](./build_and_run_ad.py) script:
+To build and run AutoDeploy example, use the [`build_and_run_ad.py`](../../../../../examples/auto_deploy/build_and_run_ad.py) script:
 
 ```bash
 cd examples/auto_deploy
@@ -33,7 +33,7 @@ Below is a non-exhaustive list of common config options:
 | `--prompt.batch-size` | Number of queries to generate |
 | `--benchmark.enabled` | Whether to run the built-in benchmark (true/false) |
 
-For default values and additional configuration options, refer to the [`ExperimentConfig`](./build_and_run_ad.py) class in [build_and_run_ad.py](./build_and_run_ad.py) file.
+For default values and additional configuration options, refer to the `ExperimentConfig` class in [build_and_run_ad.py](../../../../../examples/auto_deploy/build_and_run_ad.py) file.
 
 Here is a more complete example of using the script:
 
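The dotted option names in the table above (e.g. `--prompt.batch-size`, `--benchmark.enabled`) follow a generic nested-config pattern. A minimal sketch, assuming nothing about the script's real parser, of how dotted keys can map onto nested config dicts:

```python
def set_dotted(config: dict, dotted_key: str, value) -> None:
    """Walk/create nested dicts so 'a.b.c' sets config['a']['b']['c']."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

cfg = {}
set_dotted(cfg, "prompt.batch_size", 4)
set_dotted(cfg, "benchmark.enabled", True)
# cfg == {"prompt": {"batch_size": 4}, "benchmark": {"enabled": True}}
```

The helper name `set_dotted` is illustrative only; the actual script delegates this to its Pydantic-Settings/OmegaConf based parser.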

docs/source/auto_deploy/advanced/expert_configurations.md renamed to docs/source/torch/auto_deploy/advanced/expert_configurations.md

Lines changed: 10 additions & 10 deletions
@@ -1,29 +1,29 @@
 # Expert Configuration of LLM API
 
-For expert TensorRT-LLM users, we also expose the full set of [`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
+For expert TensorRT-LLM users, we also expose the full set of [`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
 *at your own risk* (the argument list diverges from TRT-LLM's argument list):
 
 - All config fields that are used by the AutoDeploy core pipeline (i.e. the `InferenceOptimizer`) are
-  _exclusively_ exposed in the [`AutoDeployConfig` class](../../tensorrt_llm/_torch/auto_deploy/llm_args.py).
+  _exclusively_ exposed in the [`AutoDeployConfig` class](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py).
   Please make sure to refer to those first.
-- For expert users we expose the full set of [`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
-  that can be used to configure the [AutoDeploy `LLM` API](../../tensorrt_llm/_torch/auto_deploy/llm.py) including runtime options.
-- Note that some fields in the full [`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
+- For expert users we expose the full set of [`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
+  that can be used to configure the [AutoDeploy `LLM` API](../../../../../tensorrt_llm/_torch/auto_deploy/llm.py) including runtime options.
+- Note that some fields in the full [`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
   object are overlapping, duplicated, and/or _ignored_ in AutoDeploy, particularly arguments
   pertaining to configuring the model itself since AutoDeploy's model ingestion+optimize pipeline
   significantly differs from the default manual workflow in TensorRT-LLM.
-- However, with the proper care the full [`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
+- However, with the proper care the full [`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
   objects can be used to configure advanced runtime options in TensorRT-LLM.
 - Note that any valid field can be simply provided as keyword argument ("`**kwargs`") to the
-  [AutoDeploy `LLM` API](../../tensorrt_llm/_torch/auto_deploy/llm.py).
+  [AutoDeploy `LLM` API](../../../../../tensorrt_llm/_torch/auto_deploy/llm.py).
 
 # Expert Configuration of `build_and_run_ad.py`
 
 For expert users, `build_and_run_ad.py` provides advanced configuration capabilities through a flexible argument parser powered by PyDantic Settings and OmegaConf. You can use dot notation for CLI arguments, provide multiple YAML configuration files, and leverage sophisticated configuration precedence rules to create complex deployment configurations.
 
 ## CLI Arguments with Dot Notation
 
-The script supports flexible CLI argument parsing using dot notation to modify nested configurations dynamically. You can target any field in both the [`ExperimentConfig`](./build_and_run_ad.py) and nested [`AutoDeployConfig`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)/[`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.) objects:
+The script supports flexible CLI argument parsing using dot notation to modify nested configurations dynamically. You can target any field in both the [`ExperimentConfig`](../../../../../examples/auto_deploy/build_and_run_ad.py) and nested [`AutoDeployConfig`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)/[`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.) objects:
 
 ```bash
 # Configure model parameters
@@ -56,7 +56,7 @@ python build_and_run_ad.py \
 
 ## YAML Configuration Files
 
-Both [`ExperimentConfig`](./build_and_run_ad.py) and [`AutoDeployConfig`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)/[`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py) inherit from [`DynamicYamlMixInForSettings`](../../tensorrt_llm/_torch/auto_deploy/utils/_config.py), enabling you to provide multiple YAML configuration files that are automatically deep-merged at runtime.
+Both [`ExperimentConfig`](../../../../../examples/auto_deploy/build_and_run_ad.py) and [`AutoDeployConfig`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)/[`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py) inherit from [`DynamicYamlMixInForSettings`](../../../../../tensorrt_llm/_torch/auto_deploy/utils/_config.py), enabling you to provide multiple YAML configuration files that are automatically deep-merged at runtime.
 
 Create a YAML configuration file (e.g., `my_config.yaml`):
 
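The "automatically deep-merged at runtime" behavior this hunk refers to can be illustrated with a short sketch. This is a generic recursive merge where later sources override earlier ones, not the actual `DynamicYamlMixInForSettings` implementation:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: nested dicts merged recursively, scalars overridden."""
    merged = dict(base)
    for key, value in override.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Illustrative transform names, not AutoDeploy's real config keys.
defaults = {"transforms": {"sharding": {"enabled": True}, "fusion": {"enabled": True}}}
user_cfg = {"transforms": {"fusion": {"enabled": False}}}
final = deep_merge(defaults, user_cfg)
# final["transforms"]["sharding"]["enabled"] is still True;
# final["transforms"]["fusion"]["enabled"] is now False.
```

The key property is that a user file only needs to state the fields it changes; sibling defaults survive the merge untouched.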
@@ -167,7 +167,7 @@ python build_and_run_ad.py \
 
 ## Built-in Default Configuration
 
-Both [`AutoDeployConfig`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py) and [`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py) classes automatically load a built-in [`default.yaml`](../../tensorrt_llm/_torch/auto_deploy/config/default.yaml) configuration file that provides sensible defaults for the AutoDeploy inference optimizer pipeline. This file is specified in the [`_get_config_dict()`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py) function and defines default transform configurations for graph optimization stages.
+Both [`AutoDeployConfig`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py) and [`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py) classes automatically load a built-in [`default.yaml`](../../../../../tensorrt_llm/_torch/auto_deploy/config/default.yaml) configuration file that provides sensible defaults for the AutoDeploy inference optimizer pipeline. This file is specified in the [`_get_config_dict()`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py) function and defines default transform configurations for graph optimization stages.
 
 The built-in defaults are automatically merged with your configurations at the lowest priority level, ensuring that your custom settings always override the defaults. You can inspect the current default configuration to understand the baseline transform pipeline:
 
File renamed without changes.

docs/source/auto_deploy/advanced/workflow.md renamed to docs/source/torch/auto_deploy/advanced/workflow.md

Lines changed: 2 additions & 2 deletions
@@ -27,6 +27,6 @@ llm = LLM(
 
 ```
 
-Please consult the [AutoDeploy `LLM` API](../../tensorrt_llm/_torch/auto_deploy/llm.py) and the
-[`AutoDeployConfig` class](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
+Please consult the [AutoDeploy `LLM` API](../../../../../tensorrt_llm/_torch/auto_deploy/llm.py) and the
+[`AutoDeployConfig` class](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
 for more detail on how AutoDeploy is configured via the `**kwargs` of the `LLM` API.
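The `**kwargs` pattern this doc points at — any valid config field passed straight through the `LLM` constructor — can be sketched with hypothetical stand-ins. The class and field names below are illustrative, not TRT-LLM's actual API:

```python
from dataclasses import dataclass, fields

@dataclass
class DemoDeployConfig:  # hypothetical stand-in for AutoDeployConfig
    model: str = ""
    world_size: int = 1
    compile_backend: str = "torch-opt"

def make_llm_config(**kwargs) -> DemoDeployConfig:
    """Reject unknown fields, then forward kwargs into the config object."""
    valid = {f.name for f in fields(DemoDeployConfig)}
    unknown = set(kwargs) - valid
    if unknown:
        raise TypeError(f"unknown config fields: {sorted(unknown)}")
    return DemoDeployConfig(**kwargs)

cfg = make_llm_config(model="demo-model", world_size=2)
# cfg.compile_backend keeps its default "torch-opt"
```

In the real API the validation is done by Pydantic on the `AutoDeployConfig`/`LlmArgs` classes linked above; the point of the sketch is only that every keyword argument maps one-to-one onto a config field.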

docs/source/auto-deploy.md renamed to docs/source/torch/auto_deploy/auto-deploy.md

Lines changed: 6 additions & 10 deletions
@@ -9,7 +9,6 @@ This project is in active development and is currently in an early (beta) stage.
 
 AutoDeploy is an experimental feature in beta stage designed to simplify and accelerate the deployment of PyTorch models, including off-the-shelf models like those from Hugging Face, to TensorRT-LLM. It automates graph transformations to integrate inference optimizations such as tensor parallelism, KV-caching and quantization. AutoDeploy supports optimized in-framework deployment, minimizing the amount of manual modification needed.
 
-
 ## Motivation & Approach
 
 Deploying large language models (LLMs) can be challenging, especially when balancing ease of use with high performance. Teams need simple, intuitive deployment solutions that reduce engineering effort, speed up the integration of new models, and support rapid experimentation without compromising performance.
@@ -34,7 +33,7 @@ AutoDeploy is accessible through TRT-LLM installation.
 sudo apt-get -y install libopenmpi-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
 ```
 
-You can refer to [TRT-LLM installation guide](./installation/linux.md) for more information.
+You can refer to [TRT-LLM installation guide](../../installation/linux.md) for more information.
 
 2. **Run Llama Example:**
 
@@ -53,17 +52,14 @@ AutoDeploy streamlines the model deployment process through an automated workflo
 
 The exported graph then undergoes a series of automated transformations, including graph sharding, KV-cache insertion, and GEMM fusion, to optimize model performance. After these transformations, the graph is compiled using one of the supported compile backends (like `torch-opt`), followed by deploying it via the TRT-LLM runtime.
 
-- [Supported Matrix](./auto_deploy/support_matrix.md)
-
+- [Supported Matrix](support_matrix.md)
 
 ## Advanced Usage
 
-- [Example Run Script](./auto_deploy/advanced/example_run.md)
-- [Logging Level](./auto_deploy/advanced/logging.md)
-- [Model Evaluation with LM Evaluation Harness](./auto_deploy/advanced/model_eval.md)
-- [Mixed-precision Quantization using TensorRT Model Optimizer](./auto_deploy/advanced/mixed_precision_quantization.md)
-- [Incorporating auto_deploy into your own workflow](./auto_deploy/advanced/workflow.md)
-- [Expert Configurations](./auto_deploy/advanced/expert_configurations.md)
+- [Example Run Script](./advanced/example_run.md)
+- [Logging Level](./advanced/logging.md)
+- [Incorporating AutoDeploy into Your Own Workflow](./advanced/workflow.md)
+- [Expert Configurations](./advanced/expert_configurations.md)
 
 ## Roadmap
 
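The staged workflow this file's diff describes (export the graph, then sharding, KV-cache insertion, and GEMM fusion, then compile) amounts to applying transforms in sequence. A toy sketch with made-up stage names, not AutoDeploy internals:

```python
def run_pipeline(graph, transforms):
    """Apply each graph transform in order, feeding the result forward."""
    for transform in transforms:
        graph = transform(graph)
    return graph

# Illustrative stages standing in for sharding, KV-cache insertion, GEMM fusion.
def shard(g): return g + ["sharded"]
def insert_kv_cache(g): return g + ["kv_cache"]
def fuse_gemm(g): return g + ["gemm_fused"]

stages = run_pipeline([], [shard, insert_kv_cache, fuse_gemm])
# stages == ["sharded", "kv_cache", "gemm_fused"]
```

In AutoDeploy the equivalent role is played by the `InferenceOptimizer` transform pipeline mentioned in the expert-configuration doc; this sketch only shows the sequencing idea.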
File renamed without changes.
