Feature Request: Support for NVEmbed #7746

Closed
christianazinn opened this issue Jun 4, 2024 · 6 comments
Labels: enhancement (New feature or request), stale

Comments

@christianazinn
Contributor

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Attempting to run python3 convert-hf-to-gguf.py on NVIDIA's latest NVEmbed model yields NotImplementedError: Architecture 'NVEmbedModel' not supported! Please add support for the NVEmbedModel architecture.

Motivation

NVIDIA recently released their NVEmbed embedding model, based on the Mistral 7B decoder, which ranks #1 on the MTEB leaderboard. It would be nice to see support for it in llama.cpp.

Possible Implementation

I'm not sure how different it would be from existing embedding architectures. I'm aware that other decoder-based models like SFR-Embedding-Mistral have working GGUF quants, so I figure the NVEmbed model is structured similarly. Then it's mostly a matter of adding a new model class for it in convert-hf-to-gguf.py; a rough sketch is below.
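
For anyone looking to pick this up: convert-hf-to-gguf.py registers each supported HF architecture via the Model.register decorator on a model class. Below is a hypothetical, untested sketch of what that could look like, assuming NVEmbed's decoder weights follow the usual Llama/Mistral tensor layout; the latent_attention_model prefix is a guess at how the pooling tensors are named, not a confirmed mapping.

```python
# Hypothetical sketch for convert-hf-to-gguf.py (untested). Assumes the
# decoder tensors map onto the existing Llama/Mistral layout; the
# latent-attention pooling weights would still need a real tensor mapping
# plus runtime support in llama.cpp before the converted model is usable.
@Model.register("NVEmbedModel")
class NVEmbedModel(LlamaModel):
    model_arch = gguf.MODEL_ARCH.LLAMA

    def modify_tensors(self, data_torch, name, bid):
        # The pooling block sits outside the decoder stack; this prefix is
        # an assumption, to be checked against the HF modeling code.
        if name.startswith("latent_attention_model."):
            raise NotImplementedError("latent pooling tensors are not mapped yet")
        return super().modify_tensors(data_torch, name, bid)
```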

christianazinn added the enhancement label on Jun 4, 2024
@iamlemec
Collaborator

iamlemec commented Jun 4, 2024

It looks like NVEmbed is basically Mistral but with non-causal attention and "latent attention" pooling. I hadn't seen latent attention pooling before, but judging from the modeling code on HF, it's just another attention layer on top of the last hidden states.

Right now in llama.cpp, we can tell causal-by-default models like Mistral to use non-causal attention. If we get #7477 merged, that will allow general pooling on these models. The only catch is we don't have latent pooling implemented, but it should be quite straightforward.
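
For concreteness, here is a rough PyTorch sketch of latent-attention pooling as described above: a trainable latent array serves as keys/values, the last hidden states serve as queries, and the attention output goes through an MLP before mean pooling. The latent count, head count, MLP shape, and residual placement are illustrative assumptions, not NVEmbed's actual configuration.

```python
import torch
import torch.nn as nn

class LatentAttentionPooling(nn.Module):
    """Sketch: cross-attend token hidden states against a trainable
    latent array, apply an MLP, then mean-pool over the sequence."""

    def __init__(self, hidden_dim: int, num_latents: int = 512, num_heads: int = 8):
        super().__init__()
        # Trainable latent "dictionary" used as keys/values (size assumed).
        self.latents = nn.Parameter(torch.randn(num_latents, hidden_dim))
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) from the decoder.
        kv = self.latents.unsqueeze(0).expand(hidden_states.size(0), -1, -1)
        # Queries are the token states; keys/values are the latents.
        out, _ = self.attn(hidden_states, kv, kv)
        out = out + self.mlp(out)  # residual MLP (placement assumed)
        return out.mean(dim=1)     # one embedding per sequence
```

The output has shape (batch, hidden_dim), i.e. a single embedding vector per input sequence, which is what the pooling stage needs to produce.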

@christianazinn
Contributor Author

> If we get #7477 merged, that will allow general pooling on these models. The only catch is we don't have latent pooling implemented, but it should be quite straightforward.

Thanks, will wait for that to be merged.

github-actions bot added the stale label on Jul 7, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@camilovelezr

Hello!

Is this still on track to eventually be supported?

@noahhaon

+1 here as well. NV-Embed-v2 is currently at the top of the MTEB embeddings leaderboard:

https://huggingface.co/spaces/mteb/leaderboard

@sakthi-geek

@iamlemec I can see that #7477 has been merged. Will we get NV-Embed v2 support soon?
