Llama support table + sharding docs (#915)
This PR adds a support table to our `llama_serving.md` guide, specifying our supported variants of `llama3.1-8b, llama3.1-70b, llama3.1-405b`. I used [vLLM Supported Models](https://docs.vllm.ai/en/v0.4.2/models/supported_models.html#supported-models) as a reference. I changed the structure a bit to make more sense for our server, but it's relatively similar to how they set their table up.

For sharding instructions, I added a section at the end of our doc. I debated creating a separate `md` file for it, but I think it actually makes sense where it is. It gives our doc a flow where we give detailed descriptions of what's going on while the user gets set up with the lowest-barrier model. Following that, you get to the more advanced sharding section. The details there aren't as specific, and it's not an exact copy + paste flow like the section above. The assumption is that after reading the section above, the user shouldn't have to be hand-held as much in this section.

Currently, I know we can shard 405b with an upper bound of `tp8`. I need to run some tests to see what the supported lower bound is.

This also further highlights that we need to streamline our export/compile process. We should allow the user to specify just the Hugging Face repo when starting the server, while we take care of downloading safetensors, exporting, and compiling. We should do this while still allowing specific local files to be specified: #402, #691
Showing 1 changed file with 155 additions and 20 deletions.