Llama 3.1 405B on Gaudi #646
Unanswered · ppatel-eng asked this question in Q&A
We are trying to run Llama 3.1 405B on 8 Gaudi 2 HPUs and are hitting memory constraints when following the calibration guide below. Our end goal is to serve Llama 3.1 405B with vllm-fork, ideally with as little quantization as possible; a rough sketch of the setup we are attempting follows the link below.
https://github.com/HabanaAI/vllm-hpu-extension/blob/main/calibration/README.md
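
For context on the constraint: at BF16, 405B parameters take roughly 810 GB for the weights alone, while 8 Gaudi 2 cards provide about 8 × 96 GB = 768 GB of HBM, so the weights do not fit unquantized even before accounting for the KV cache. The sketch below shows the serving setup we are attempting after calibration, assuming the FP8 (INC) path described in the guide; the model id, quantization config path, and context length are placeholders for our local values:

```python
import os
from vllm import LLM, SamplingParams

# Placeholder: point at the quantization config produced by the
# vllm-hpu-extension calibration flow (path here is hypothetical).
os.environ["QUANT_CONFIG"] = "/path/to/maxabs_quant.json"

llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # placeholder model id
    tensor_parallel_size=8,     # shard weights across all 8 Gaudi 2 HPUs
    quantization="inc",         # FP8 via Intel Neural Compressor (HPU fork)
    kv_cache_dtype="fp8_inc",   # FP8 KV cache to reduce activation memory
    dtype="bfloat16",
    max_model_len=4096,         # conservative context to limit KV-cache use
)

outputs = llm.generate(["Hello, Gaudi!"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```

With FP8 weights the footprint should drop to roughly 405 GB, about 51 GB per card under tensor parallelism, which ought to leave some headroom for the KV cache, but we have not been able to get that far following the guide.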