Memory Usage of xmen index #18

Open
phlobo opened this issue Jul 26, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@phlobo
Member

phlobo commented Jul 26, 2023

Currently, the indexing procedure for SapBERT (embeddings + FAISS) is rather memory-intensive, as the complete FAISS index is constructed in memory. For large indices (e.g., Quaero) this can require > 32 GB of RAM.

A different RAM / accuracy tradeoff might be achieved in FAISS, but the impact on retrieval performance needs to be investigated.
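To get a feel for this tradeoff, here is a rough back-of-the-envelope sketch of raw storage for three FAISS index families. It uses the sizing rules from the FAISS documentation: 4 bytes per float32 component for flat storage, an extra 2 * M int32 links per vector on the HNSW base layer, and M codes of nbits bits each for product quantization. The vector count and the 768-dimensional embedding size are illustrative assumptions, not measured values for Quaero or for the current xMEN settings, and the helper names are hypothetical. Codebooks, upper HNSW layers, and ID maps are ignored.

```python
def flat_index_bytes(n_vectors: int, dim: int) -> int:
    """Raw storage of uncompressed float32 vectors (4 bytes per component)."""
    return n_vectors * dim * 4


def hnsw_flat_index_bytes(n_vectors: int, dim: int, m: int = 32) -> int:
    """Flat vectors plus the HNSW base-layer graph (2 * M int32 links per vector)."""
    return flat_index_bytes(n_vectors, dim) + n_vectors * 2 * m * 4


def pq_index_bytes(n_vectors: int, n_subquantizers: int, nbits: int = 8) -> int:
    """Product-quantized codes only: ceil(M * nbits / 8) bytes per vector."""
    bytes_per_vector = (n_subquantizers * nbits + 7) // 8
    return n_vectors * bytes_per_vector


if __name__ == "__main__":
    n = 5_000_000  # ASSUMPTION: illustrative number of concept names, not Quaero's real size
    d = 768        # ASSUMPTION: embedding size of a BERT-base style encoder
    for name, size in [
        ("Flat", flat_index_bytes(n, d)),
        ("HNSW32,Flat", hnsw_flat_index_bytes(n, d, m=32)),
        ("PQ64", pq_index_bytes(n, 64)),
    ]:
        print(f"{name:12s} {size / 1e9:6.2f} GB")
```

Under these assumptions the uncompressed vectors alone take roughly 15 GB, while 64-byte PQ codes need about 0.3 GB. The open question is how much candidate recall is lost in exchange.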

@phlobo phlobo added the enhancement New feature or request label Nov 28, 2023
@NargesFarrokhshad

Hello @phlobo

Since creating a FAISS index is memory-intensive and time-consuming, why not use a different, faster similarity search library such as ScaNN?

@phlobo
Member Author

phlobo commented Dec 12, 2023

Thank you for your suggestion!

I don't think it's so much a problem of FAISS, but rather the current settings:
https://github.com/hpi-dhc/xmen/blob/main/xmen/linkers/faiss_indexer.py#L127

We will need to run experiments with different settings and measure their effect on candidate retrieval performance.
This could also mean that we replace FAISS (and maybe also NMSLib, which is used by the TF-IDF linker) altogether.
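
One way such an experiment could be scored is sketched below: compare the top-k candidate IDs returned by an exact (brute-force) search against those returned by a compressed or approximate index, and report the average overlap. This is only a sketch; `recall_at_k` is a hypothetical helper and not part of xMEN, and the ID lists are assumed to come from whichever search backend is being evaluated.

```python
from typing import Sequence


def recall_at_k(
    exact: Sequence[Sequence[int]],
    approx: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Average fraction of the exact top-k neighbours also found by the approximate index.

    `exact` and `approx` hold one ranked list of candidate IDs per query.
    """
    if len(exact) != len(approx):
        raise ValueError("exact and approx must contain the same number of queries")
    if not exact:
        return 0.0

    total = 0.0
    for exact_ids, approx_ids in zip(exact, approx):
        truth = set(exact_ids[:k])
        if not truth:
            continue  # a query without ground-truth neighbours contributes 0
        total += len(truth & set(approx_ids[:k])) / len(truth)
    return total / len(exact)


if __name__ == "__main__":
    # Toy data: two queries, top-3 candidates each.
    exact_results = [[1, 2, 3], [4, 5, 6]]
    approx_results = [[1, 2, 9], [4, 5, 6]]
    print(f"recall@3 = {recall_at_k(exact_results, approx_results, k=3):.3f}")
```

Tracking this number alongside peak RAM for each setting would show which configurations keep candidate quality while lowering memory use.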
