
Load docs and create index when the reranker is instantiated instead of on each rerank #19

Open
will-fairsupply opened this issue Jun 28, 2024 · 6 comments

Comments

@will-fairsupply

Mostly thinking about this in the context of making the ColBERT reranker more useful: it would be great to be able to load the docs and build the index when the reranker is instantiated. That would avoid repeating the computation every time we want to produce a result.

I am happy to take this up, but wanted to raise it as an issue and see if it fits within the spirit of the project.

I do think that extending the ColBERT reranker (and others) in the way I describe would allow broader use cases.
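As a rough illustration of the idea (not the library's actual code), here is a minimal sketch of a reranker that encodes the documents once in its constructor and reuses those embeddings for every query. `toy_embed_tokens` is a hypothetical stand-in for a ColBERT encoder (it just builds letter-count vectors per token), and `PrecomputedLateInteractionReranker` is a made-up name. Scoring follows ColBERT-style late interaction (MaxSim): for each query token, take the best similarity to any document token, then sum.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def toy_embed_tokens(text: str) -> List[Vector]:
    """Hypothetical stand-in for a ColBERT encoder: one unit vector per token.

    Each token becomes a normalized 26-dim letter-count vector; a real
    ColBERT model would produce contextual embeddings instead.
    """
    vectors = []
    for token in text.lower().split():
        counts = [0.0] * 26
        for ch in token:
            if "a" <= ch <= "z":
                counts[ord(ch) - ord("a")] += 1.0
        norm = math.sqrt(sum(c * c for c in counts))
        if norm > 0:
            vectors.append([c / norm for c in counts])
    return vectors


def maxsim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late interaction: sum over query tokens of the best dot product with any doc token."""
    if not doc_vecs:
        return 0.0
    return sum(
        max(sum(q_i * d_i for q_i, d_i in zip(q, d)) for d in doc_vecs)
        for q in query_vecs
    )


class PrecomputedLateInteractionReranker:
    """Hypothetical reranker that encodes the corpus once, at construction time."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.encode_calls = 0
        # Document embeddings are computed here, once, instead of on every rerank.
        self.doc_index = {doc_id: self._encode(text) for doc_id, text in docs.items()}

    def _encode(self, text: str) -> List[Vector]:
        self.encode_calls += 1
        return toy_embed_tokens(text)

    def rank(self, query: str, top_k: int = 3) -> List[Tuple[str, float]]:
        # Only the query is encoded per call; document vectors come from the cache.
        query_vecs = self._encode(query)
        scored = [(doc_id, maxsim(query_vecs, vecs)) for doc_id, vecs in self.doc_index.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


reranker = PrecomputedLateInteractionReranker({
    "d1": "cats sleep on warm mats",
    "d2": "stock markets fell sharply",
    "d3": "a cat naps in the sun",
})
reranker.rank("sleepy cats")
reranker.rank("market crash")
print(reranker.encode_calls)  # 5: three documents at init + one per query
```

The design choice is simply to move the per-document encoding out of `rank` and into `__init__`, so repeated queries only pay for encoding the query itself.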

@bclavie
Collaborator

bclavie commented Jul 8, 2024

Hey! Thank you for the feedback & suggestion.

I agree with this; it'd be pretty useful for the ColBERT reranker. So far I've tried to keep the lib very lightweight and agnostic to any specific reranker, mostly due to limited development time on my end, but I have nothing against such extensions. This would be a more than welcome addition.

My only requirements would be that:

  • We don't add any unnecessary dependencies to the existing install options, so any external indexing mechanism should be its own separate optional install
  • The code itself stays very contained, so that someone could still use the "raw" ColBERT reranker without needing to import whatever the index code needs. It could live in a separate ColBERT reranker file, which users could opt into with an extra kwarg (colbert_keep_index=True?)

Depending on the number of documents, we might not need any sort of indexing: keeping them in memory could be fine, though memory use would balloon quite fast.
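To put a rough number on "balloon up quite fast", here is a back-of-the-envelope helper for keeping uncompressed per-token embeddings in memory. It assumes float32 values (4 bytes), 128-dimensional token embeddings (the size ColBERTv2 uses) and a fixed, illustrative 256 tokens per document; `colbert_memory_bytes` is a hypothetical helper, and real ColBERT indexes typically compress embeddings, so treat this as an upper-bound sketch.

```python
def colbert_memory_bytes(num_docs: int, tokens_per_doc: int,
                         dim: int = 128, bytes_per_value: int = 4) -> int:
    """Approximate size of uncompressed per-token embeddings kept in memory.

    Assumes every document has exactly `tokens_per_doc` tokens and each
    token embedding is `dim` floats of `bytes_per_value` bytes.
    """
    return num_docs * tokens_per_doc * dim * bytes_per_value


for n in (1_000, 100_000, 1_000_000):
    gib = colbert_memory_bytes(n, tokens_per_doc=256) / 2**30
    print(f"{n:>9,} docs -> {gib:8.2f} GiB")
```

Under these assumptions the footprint is about 0.12 GiB for 1,000 documents, 12.21 GiB for 100,000 and 122.07 GiB for 1,000,000, growing linearly with corpus size.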

@w-v-r

w-v-r commented Jul 10, 2024

Switching from my work GitHub profile to my personal one; I'll submit from this account.

Great, I'm familiar with some common indexing mechanisms. I'll put something together and submit a pull request for review.

If I can make something reranker-agnostic that works with a number of options, I will. Failing that, I'll build something ColBERT-specific, and I'm happy to contribute more if that seems like a good idea.

@stevoslates

Did this ever happen? I was going to ask the same thing!

@bclavie
Collaborator

bclavie commented Aug 6, 2024

> Did this ever happen? I was going to ask the same thing!

Not yet! If it doesn't get picked up as a PR, it's on my to-do list, but RAGatouille will be my top open-source priority for a while (it badly needs an overhaul!), so it might take some time.

@w-v-r

w-v-r commented Aug 8, 2024

Not yet, but I'll pick this up over the weekend. I'll have a branch ready for review and would appreciate feedback on it.

Thanks!

@bclavie
Collaborator

bclavie commented Nov 4, 2024

Hey @w-v-r, any updates on this? No worries if you're no longer able to dedicate time to this!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants