Feature Request: Support for NVEmbed #7746
Comments

It looks like […] Right now in […]

Thanks, will wait for that to be merged.

This issue was closed because it has been inactive for 14 days since being marked as stale.

Hello! Is this still on track to eventually being supported?

+1 here as well, NV-Embed-2 is top of the MTEB embeddings leaderboard:
Prerequisites
Feature Description
Attempting to run `python3 convert-hf-to-gguf.py` on NVIDIA's latest NVEmbed model yields a `NotImplementedError: Architecture 'NVEmbedModel' not supported!`
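For context, the error comes from the script's architecture registry: if no converter class has been registered for the Hugging Face architecture name, the lookup fails. Paraphrased (details may differ between versions), the relevant method on the `Model` base class looks like this:

```python
# Paraphrased from convert-hf-to-gguf.py; exact code may vary by version.
# The registry maps HF architecture names to their converter classes.
@classmethod
def from_model_architecture(cls, arch: str) -> type[Model]:
    try:
        return cls._model_classes[arch]
    except KeyError:
        raise NotImplementedError(f"Architecture {arch!r} not supported!") from None
```

So the first step would be registering a converter class for that name.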
Add support for the `NVEmbedModel` architecture.

Motivation
NVIDIA recently released NVEmbed, an embedding model built on the Mistral 7B decoder, which ranks #1 on the MTEB leaderboard. It would be nice to see support for it in llama.cpp.
Possible Implementation
I'm not sure how much it would differ from existing embedding architectures. Other decoder-based models such as SFR Embedding Mistral already have working GGUF quants, so I figure NVEmbed is structured similarly. It would then mostly be a matter of writing a new model class for it in `convert-hf-to-gguf.py`.
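As a rough illustration of that last step, here is a minimal sketch following the `@Model.register` decorator pattern that `convert-hf-to-gguf.py` already uses for other architectures. It assumes, without verification, that NVEmbed's decoder weights map onto the existing LLAMA tensor layout and that its checkpoint prefixes tensor names with `embedding_model.`; the model's latent-attention pooling layer has no LLAMA equivalent, so it is simply skipped here.

```python
# Hypothetical sketch for convert-hf-to-gguf.py -- not a tested implementation.
# Assumes NVEmbed's decoder weights follow the LLAMA tensor layout and that the
# checkpoint uses an "embedding_model." name prefix (both unverified).
@Model.register("NVEmbedModel")
class NVEmbedModel(LlamaModel):
    model_arch = gguf.MODEL_ARCH.LLAMA

    def modify_tensors(self, data_torch, name, bid):
        # Strip the assumed wrapper prefix so names match the usual LLAMA mapping.
        name = name.removeprefix("embedding_model.")
        # The latent-attention pooling tensors have no LLAMA counterpart; real
        # support would need new tensor types in gguf-py, so skip them for now.
        if "latent_attn" in name:
            return []
        return super().modify_tensors(data_torch, name, bid)
```

Even with conversion in place, pooling would also need attention on the llama.cpp side, since NVEmbed uses latent attention rather than the pooling modes llama.cpp already implements (mean, CLS, last token).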