Was there a specific command used to run the Llama 70B model, for example to do model parallelism? What GPU configuration did the authors use?
Lately we have been using 2 A100 GPUs to run inference with the 70B model. It is also possible to use 6 GPUs with 32 GB of memory each. You might also be able to achieve reasonable performance with lower-precision weights, though that is not something we tested extensively.
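
For reference, here is a minimal sketch of one way to shard the 70B model across multiple GPUs using Hugging Face `transformers` with `accelerate` (`device_map="auto"` splits layers across all visible GPUs). This is not necessarily the exact command the authors used; the model ID and precision settings below are assumptions, so substitute your own checkpoint path:

```python
# Sketch (assumed setup): shard Llama 70B across available GPUs with
# Hugging Face transformers + accelerate (pip install transformers accelerate).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"  # assumption; use your local checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halves memory vs. fp32; fits on 2x A100 80GB
    device_map="auto",          # partitions layers across all visible GPUs
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For the lower-precision option mentioned above, one untested possibility is passing `quantization_config=BitsAndBytesConfig(load_in_8bit=True)` to `from_pretrained` (requires `bitsandbytes`), which roughly halves memory again relative to fp16.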