
Commit aeceb46

move autodeploy doc into torch, update links

Signed-off-by: h-guo18 <[email protected]>
Signed-off-by: Frida Hou <[email protected]>

1 parent: 0d54263

File tree

10 files changed: +25, -54 lines


docs/source/auto_deploy/advanced/mixed_precision_quantization.md

Lines changed: 0 additions & 17 deletions
This file was deleted.

docs/source/auto_deploy/advanced/model_eval.md

Lines changed: 0 additions & 11 deletions
This file was deleted.

docs/source/index.rst

Lines changed: 0 additions & 1 deletion
@@ -15,7 +15,6 @@ Welcome to TensorRT-LLM's Documentation!
    quick-start-guide.md
    key-features.md
    torch.md
-   auto-deploy.md
    release-notes.md
 
 

docs/source/torch.md

Lines changed: 4 additions & 0 deletions
@@ -40,3 +40,7 @@ Here is a simple example to show how to use `tensorrt_llm.LLM` API with Llama mo
 ## Known Issues
 
 - The PyTorch backend on SBSA is incompatible with bare metal environments like Ubuntu 24.04. Please use the [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) for optimal support on SBSA platforms.
+
+## Experimental Feature
+
+- [AutoDeploy: Seamless Model Deployment from PyTorch to TRT-LLM](./torch/auto_deploy/auto-deploy.md)

docs/source/auto_deploy/advanced/example_run.md renamed to docs/source/torch/auto_deploy/advanced/example_run.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
-# Example Run Script ([`build_and_run_ad.py`](./build_and_run_ad.py))
+# Example Run Script ([`build_and_run_ad.py`](../../../../../examples/auto_deploy/build_and_run_ad.py))
 
-To build and run AutoDeploy example, use the [`build_and_run_ad.py`](./build_and_run_ad.py) script:
+To build and run AutoDeploy example, use the [`build_and_run_ad.py`](../../../../../examples/auto_deploy/build_and_run_ad.py) script:
 
 ```bash
 cd examples/auto_deploy
@@ -33,7 +33,7 @@ Below is a non-exhaustive list of common config options:
 | `--prompt.batch-size` | Number of queries to generate |
 | `--benchmark.enabled` | Whether to run the built-in benchmark (true/false) |
 
-For default values and additional configuration options, refer to the [`ExperimentConfig`](./build_and_run_ad.py) class in [build_and_run_ad.py](./build_and_run_ad.py) file.
+For default values and additional configuration options, refer to the `ExperimentConfig` class in [build_and_run_ad.py](../../../../../examples/auto_deploy/build_and_run_ad.py) file.
 
 Here is a more complete example of using the script:
 
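The dotted option names in the table above (e.g. `--prompt.batch-size`, `--benchmark.enabled`) follow a generic nested-config pattern. A minimal sketch, assuming nothing about the script's real parser, of how dotted keys can map onto nested config dicts:

```python
def set_dotted(config: dict, dotted_key: str, value) -> None:
    """Walk/create nested dicts so 'a.b.c' sets config['a']['b']['c']."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

cfg = {}
set_dotted(cfg, "prompt.batch_size", 4)
set_dotted(cfg, "benchmark.enabled", True)
# cfg == {"prompt": {"batch_size": 4}, "benchmark": {"enabled": True}}
```

The helper name `set_dotted` is illustrative only; the actual script delegates this to its Pydantic-Settings/OmegaConf based parser.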

docs/source/auto_deploy/advanced/expert_configurations.md renamed to docs/source/torch/auto_deploy/advanced/expert_configurations.md

Lines changed: 10 additions & 10 deletions
@@ -1,29 +1,29 @@
 # Expert Configuration of LLM API
 
-For expert TensorRT-LLM users, we also expose the full set of [`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
+For expert TensorRT-LLM users, we also expose the full set of [`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
 *at your own risk* (the argument list diverges from TRT-LLM's argument list):
 
 - All config fields that are used by the AutoDeploy core pipeline (i.e. the `InferenceOptimizer`) are
-  _exclusively_ exposed in the [`AutoDeployConfig` class](../../tensorrt_llm/_torch/auto_deploy/llm_args.py).
+  _exclusively_ exposed in the [`AutoDeployConfig` class](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py).
   Please make sure to refer to those first.
-- For expert users we expose the full set of [`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
-  that can be used to configure the [AutoDeploy `LLM` API](../../tensorrt_llm/_torch/auto_deploy/llm.py) including runtime options.
-- Note that some fields in the full [`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
+- For expert users we expose the full set of [`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
+  that can be used to configure the [AutoDeploy `LLM` API](../../../../../tensorrt_llm/_torch/auto_deploy/llm.py) including runtime options.
+- Note that some fields in the full [`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
   object are overlapping, duplicated, and/or _ignored_ in AutoDeploy, particularly arguments
   pertaining to configuring the model itself since AutoDeploy's model ingestion+optimize pipeline
   significantly differs from the default manual workflow in TensorRT-LLM.
-- However, with the proper care the full [`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
+- However, with the proper care the full [`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
   objects can be used to configure advanced runtime options in TensorRT-LLM.
 - Note that any valid field can be simply provided as keyword argument ("`**kwargs`") to the
-  [AutoDeploy `LLM` API](../../tensorrt_llm/_torch/auto_deploy/llm.py).
+  [AutoDeploy `LLM` API](../../../../../tensorrt_llm/_torch/auto_deploy/llm.py).
 
 # Expert Configuration of `build_and_run_ad.py`
 
 For expert users, `build_and_run_ad.py` provides advanced configuration capabilities through a flexible argument parser powered by PyDantic Settings and OmegaConf. You can use dot notation for CLI arguments, provide multiple YAML configuration files, and leverage sophisticated configuration precedence rules to create complex deployment configurations.
 
 ## CLI Arguments with Dot Notation
 
-The script supports flexible CLI argument parsing using dot notation to modify nested configurations dynamically. You can target any field in both the [`ExperimentConfig`](./build_and_run_ad.py) and nested [`AutoDeployConfig`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)/[`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.) objects:
+The script supports flexible CLI argument parsing using dot notation to modify nested configurations dynamically. You can target any field in both the [`ExperimentConfig`](../../../../../examples/auto_deploy/build_and_run_ad.py) and nested [`AutoDeployConfig`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)/[`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.) objects:
 
 ```bash
 # Configure model parameters
@@ -56,7 +56,7 @@ python build_and_run_ad.py \
 
 ## YAML Configuration Files
 
-Both [`ExperimentConfig`](./build_and_run_ad.py) and [`AutoDeployConfig`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)/[`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py) inherit from [`DynamicYamlMixInForSettings`](../../tensorrt_llm/_torch/auto_deploy/utils/_config.py), enabling you to provide multiple YAML configuration files that are automatically deep-merged at runtime.
+Both [`ExperimentConfig`](../../../../../examples/auto_deploy/build_and_run_ad.py) and [`AutoDeployConfig`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)/[`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py) inherit from [`DynamicYamlMixInForSettings`](../../../../../tensorrt_llm/_torch/auto_deploy/utils/_config.py), enabling you to provide multiple YAML configuration files that are automatically deep-merged at runtime.
 
 Create a YAML configuration file (e.g., `my_config.yaml`):
 
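The "automatically deep-merged at runtime" behavior this hunk refers to can be illustrated with a short sketch. This is a generic recursive merge where later sources override earlier ones, not the actual `DynamicYamlMixInForSettings` implementation:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: nested dicts merged recursively, scalars overridden."""
    merged = dict(base)
    for key, value in override.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Illustrative transform names, not AutoDeploy's real config keys.
defaults = {"transforms": {"sharding": {"enabled": True}, "fusion": {"enabled": True}}}
user_cfg = {"transforms": {"fusion": {"enabled": False}}}
final = deep_merge(defaults, user_cfg)
# final["transforms"]["sharding"]["enabled"] is still True;
# final["transforms"]["fusion"]["enabled"] is now False.
```

The key property is that a user file only needs to state the fields it changes; sibling defaults survive the merge untouched.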
@@ -167,7 +167,7 @@ python build_and_run_ad.py \
 
 ## Built-in Default Configuration
 
-Both [`AutoDeployConfig`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py) and [`LlmArgs`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py) classes automatically load a built-in [`default.yaml`](../../tensorrt_llm/_torch/auto_deploy/config/default.yaml) configuration file that provides sensible defaults for the AutoDeploy inference optimizer pipeline. This file is specified in the [`_get_config_dict()`](../../tensorrt_llm/_torch/auto_deploy/llm_args.py) function and defines default transform configurations for graph optimization stages.
+Both [`AutoDeployConfig`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py) and [`LlmArgs`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py) classes automatically load a built-in [`default.yaml`](../../../../../tensorrt_llm/_torch/auto_deploy/config/default.yaml) configuration file that provides sensible defaults for the AutoDeploy inference optimizer pipeline. This file is specified in the [`_get_config_dict()`](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py) function and defines default transform configurations for graph optimization stages.
 
 The built-in defaults are automatically merged with your configurations at the lowest priority level, ensuring that your custom settings always override the defaults. You can inspect the current default configuration to understand the baseline transform pipeline:
 
File renamed without changes.

docs/source/auto_deploy/advanced/workflow.md renamed to docs/source/torch/auto_deploy/advanced/workflow.md

Lines changed: 2 additions & 2 deletions
@@ -27,6 +27,6 @@ llm = LLM(
 
 ```
 
-Please consult the [AutoDeploy `LLM` API](../../tensorrt_llm/_torch/auto_deploy/llm.py) and the
-[`AutoDeployConfig` class](../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
+Please consult the [AutoDeploy `LLM` API](../../../../../tensorrt_llm/_torch/auto_deploy/llm.py) and the
+[`AutoDeployConfig` class](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py)
 for more detail on how AutoDeploy is configured via the `**kwargs` of the `LLM` API.
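The `**kwargs` pattern this doc points at — any valid config field passed straight through the `LLM` constructor — can be sketched with hypothetical stand-ins. The class and field names below are illustrative, not TRT-LLM's actual API:

```python
from dataclasses import dataclass, fields

@dataclass
class DemoDeployConfig:  # hypothetical stand-in for AutoDeployConfig
    model: str = ""
    world_size: int = 1
    compile_backend: str = "torch-opt"

def make_llm_config(**kwargs) -> DemoDeployConfig:
    """Reject unknown fields, then forward kwargs into the config object."""
    valid = {f.name for f in fields(DemoDeployConfig)}
    unknown = set(kwargs) - valid
    if unknown:
        raise TypeError(f"unknown config fields: {sorted(unknown)}")
    return DemoDeployConfig(**kwargs)

cfg = make_llm_config(model="demo-model", world_size=2)
# cfg.compile_backend keeps its default "torch-opt"
```

In the real API the validation is done by Pydantic on the `AutoDeployConfig`/`LlmArgs` classes linked above; the point of the sketch is only that every keyword argument maps one-to-one onto a config field.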

docs/source/auto-deploy.md renamed to docs/source/torch/auto_deploy/auto-deploy.md

Lines changed: 6 additions & 10 deletions
@@ -9,7 +9,6 @@ This project is in active development and is currently in an early (beta) stage.
 
 AutoDeploy is an experimental feature in beta stage designed to simplify and accelerate the deployment of PyTorch models, including off-the-shelf models like those from Hugging Face, to TensorRT-LLM. It automates graph transformations to integrate inference optimizations such as tensor parallelism, KV-caching and quantization. AutoDeploy supports optimized in-framework deployment, minimizing the amount of manual modification needed.
 
-
 ## Motivation & Approach
 
 Deploying large language models (LLMs) can be challenging, especially when balancing ease of use with high performance. Teams need simple, intuitive deployment solutions that reduce engineering effort, speed up the integration of new models, and support rapid experimentation without compromising performance.
@@ -34,7 +33,7 @@ AutoDeploy is accessible through TRT-LLM installation.
 sudo apt-get -y install libopenmpi-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
 ```
 
-You can refer to [TRT-LLM installation guide](./installation/linux.md) for more information.
+You can refer to [TRT-LLM installation guide](../../installation/linux.md) for more information.
 
 2. **Run Llama Example:**
 
@@ -53,17 +52,14 @@ AutoDeploy streamlines the model deployment process through an automated workflo
 
 The exported graph then undergoes a series of automated transformations, including graph sharding, KV-cache insertion, and GEMM fusion, to optimize model performance. After these transformations, the graph is compiled using one of the supported compile backends (like `torch-opt`), followed by deploying it via the TRT-LLM runtime.
 
-- [Supported Matrix](./auto_deploy/support_matrix.md)
-
+- [Supported Matrix](support_matrix.md)
 
 ## Advanced Usage
 
-- [Example Run Script](./auto_deploy/advanced/example_run.md)
-- [Logging Level](./auto_deploy/advanced/logging.md)
-- [Model Evaluation with LM Evaluation Harness](./auto_deploy/advanced/model_eval.md)
-- [Mixed-precision Quantization using TensorRT Model Optimizer](./auto_deploy/advanced/mixed_precision_quantization.md)
-- [Incorporating auto_deploy into your own workflow](./auto_deploy/advanced/workflow.md)
-- [Expert Configurations](./auto_deploy/advanced/expert_configurations.md)
+- [Example Run Script](./advanced/example_run.md)
+- [Logging Level](./advanced/logging.md)
+- [Incorporating AutoDeploy into Your Own Workflow](./advanced/workflow.md)
+- [Expert Configurations](./advanced/expert_configurations.md)
 
 ## Roadmap
 
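The staged workflow this file's diff describes (export the graph, then sharding, KV-cache insertion, and GEMM fusion, then compile) amounts to applying transforms in sequence. A toy sketch with made-up stage names, not AutoDeploy internals:

```python
def run_pipeline(graph, transforms):
    """Apply each graph transform in order, feeding the result forward."""
    for transform in transforms:
        graph = transform(graph)
    return graph

# Illustrative stages standing in for sharding, KV-cache insertion, GEMM fusion.
def shard(g): return g + ["sharded"]
def insert_kv_cache(g): return g + ["kv_cache"]
def fuse_gemm(g): return g + ["gemm_fused"]

stages = run_pipeline([], [shard, insert_kv_cache, fuse_gemm])
# stages == ["sharded", "kv_cache", "gemm_fused"]
```

In AutoDeploy the equivalent role is played by the `InferenceOptimizer` transform pipeline mentioned in the expert-configuration doc; this sketch only shows the sequencing idea.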
File renamed without changes.
