From 879047ab471b1342acac70053b32096cbbbec610 Mon Sep 17 00:00:00 2001
From: omrishiv <327609+omrishiv@users.noreply.github.com>
Date: Wed, 11 Sep 2024 15:55:23 -0700
Subject: [PATCH] update docs

Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
---
 docs/source/getting_started/neuron-installation.rst | 4 ++--
 docs/source/index.rst                               | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/getting_started/neuron-installation.rst b/docs/source/getting_started/neuron-installation.rst
index 0816524468cab..ac90bdc04f69b 100644
--- a/docs/source/getting_started/neuron-installation.rst
+++ b/docs/source/getting_started/neuron-installation.rst
@@ -3,8 +3,8 @@
 Installation with Neuron
 ========================
 
-vLLM 0.3.3 onwards supports model inferencing and serving on AWS Trainium/Inferentia with Neuron SDK.
-At the moment Paged Attention is not supported in Neuron SDK, but naive continuous batching is supported in transformers-neuronx.
+vLLM 0.3.3 onwards supports model inferencing and serving on AWS Trainium/Inferentia with the Neuron SDK, including continuous batching.
+Paged Attention support is currently in development and will be available soon.
 Data types currently supported in Neuron SDK are FP16 and BF16.
 
 Requirements

diff --git a/docs/source/index.rst b/docs/source/index.rst
index 4b817c4ba9498..1a4994121ea81 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -43,7 +43,7 @@ vLLM is flexible and easy to use with:
 * Tensor parallelism and pipeline parallelism support for distributed inference
 * Streaming outputs
 * OpenAI-compatible API server
-* Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPU, and AWS Neuron.
+* Support for NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPUs, and AWS Trainium and Inferentia accelerators.
 * Prefix caching support
 * Multi-lora support
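
For reference, below is a minimal sketch of the workflow the updated docs describe: offline inference with vLLM on AWS Trainium/Inferentia. It is illustrative only and not part of the patch; the model name and parameter values are assumptions, and it presumes a vLLM build with the Neuron backend plus the transformers-neuronx package installed.

from vllm import LLM, SamplingParams

# Illustrative values only; adjust the model and sizes for your instance type.
llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    max_num_seqs=8,           # sequences handled per continuous-batching step
    max_model_len=128,
    block_size=128,           # on the Neuron backend, set block_size to max_model_len
    device="neuron",          # route execution to Neuron devices
    tensor_parallel_size=2,   # shard the model across two NeuronCores
)

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)

The same backend selection applies when serving: vLLM's OpenAI-compatible server runs on top of the same engine, so a Neuron-enabled build serves requests with continuous batching as well.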