E5 dataset #17

wangskyGit · 2024-03-20T06:29:56Z

Hello! This is awesome work, and the idea of using an LLM as the embedding model is amazing. More importantly, you really did it, and the performance is surprisingly good!
I am wondering: do you plan to release the E5 synthetic dataset generated by GPT-4? Alternatively, what would the performance be like if we only leveraged the open dataset?

Muennighoff · 2024-03-20T14:32:12Z

Unfortunately, we are unable to release the E5 dataset. However, we have released the MEDI2 dataset. The table in the screenshot from the paper gives you an idea of the performance difference between the two.
