From ce99dbf49886fc7d7882ac6b38bc30e8993b7b48 Mon Sep 17 00:00:00 2001
From: Travis Addair
Date: Fri, 15 Dec 2023 10:09:22 -0800
Subject: [PATCH] Added Mixtral and Phi to docs (#134)

---
 docs/models/adapters.md    | 16 ++++++++++++++++
 docs/models/base_models.md |  2 ++
 2 files changed, 18 insertions(+)

diff --git a/docs/models/adapters.md b/docs/models/adapters.md
index 9b452c42a..dcae3c15c 100644
--- a/docs/models/adapters.md
+++ b/docs/models/adapters.md
@@ -28,6 +28,14 @@ Any combination of linear layers can be targeted in the adapters, which correspo
 - `down_proj`
 - `lm_head`
 
+### Mixtral
+
+- `q_proj`
+- `k_proj`
+- `v_proj`
+- `o_proj`
+- `lm_head`
+
 ### Qwen
 
 - `c_attn`
@@ -36,6 +44,14 @@ Any combination of linear layers can be targeted in the adapters, which correspo
 - `w2`
 - `lm_head`
 
+### Phi
+
+- `Wqkv`
+- `out_proj`
+- `fc1`
+- `fc2`
+- `lm_head`
+
 ### GPT2
 
 - `c_attn`
diff --git a/docs/models/base_models.md b/docs/models/base_models.md
index 2ac55eea0..fab4a7101 100644
--- a/docs/models/base_models.md
+++ b/docs/models/base_models.md
@@ -6,7 +6,9 @@
 - [CodeLlama](https://huggingface.co/codellama)
 - 🌬️[Mistral](https://huggingface.co/mistralai)
 - [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
+- 🔄 [Mixtral](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
 - 🔮 [Qwen](https://huggingface.co/Qwen)
+- 🏛️ [Phi](https://huggingface.co/microsoft/phi-2)
 - 🤖 [GPT2](https://huggingface.co/gpt2)
 
 Other architectures are supported on a best effort basis, but do not support dynamic adapter loading.
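The per-architecture lists this patch adds to `docs/models/adapters.md` amount to a lookup table of adapter-targetable linear layers. As a rough illustration (plain Python; the names `SUPPORTED_TARGET_MODULES` and `validate_target_modules` are hypothetical, not part of the project's actual API), a client could validate a requested set of target modules against the layers documented here:

```python
# Hypothetical lookup table mirroring the layer lists added in this patch.
# Only the two new architectures are shown; a real table would cover all of them.
SUPPORTED_TARGET_MODULES = {
    "mixtral": {"q_proj", "k_proj", "v_proj", "o_proj", "lm_head"},
    "phi": {"Wqkv", "out_proj", "fc1", "fc2", "lm_head"},
}


def validate_target_modules(architecture: str, requested: list[str]) -> list[str]:
    """Return `requested` unchanged, raising if any module is not targetable."""
    supported = SUPPORTED_TARGET_MODULES.get(architecture.lower())
    if supported is None:
        raise ValueError(f"unknown architecture: {architecture}")
    unknown = [m for m in requested if m not in supported]
    if unknown:
        raise ValueError(f"unsupported target modules for {architecture}: {unknown}")
    return requested


print(validate_target_modules("phi", ["Wqkv", "fc1"]))  # ['Wqkv', 'fc1']
```

Any combination of the documented layers is valid, so validation only needs set membership, not ordering.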