Indexing stuck at encoding passages #355

Open
shubham526 opened this issue Jul 9, 2024 · 2 comments

Comments

@shubham526

I have a huge collection of 116 million passages. I am trying to create a ColBERT index for them using the indexing code from the README. To manage the size, I am indexing them in batches of 1,000 passages. However, the indexing step seems to be stuck at the encoding stage:

2024-07-09 09:15:44,053 - INFO - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
[Jul 09, 09:15:50] [1]           #> Encoding 999 passages..
[Jul 09, 09:15:50] [0]           # of sampled PIDs = 2000        sampled_pids[:3] = [853, 1500, 20]
[Jul 09, 09:15:50] [0]           #> Encoding 1001 passages..

Is this supposed to take so much time? Not sure if I am doing something wrong.

@shubham526 (Author)

@okhat

@okhat (Collaborator) commented Jul 9, 2024

It shouldn’t get stuck, so if it does, that’s odd. But don’t index in 1,000-passage batches. Index maybe 10 million at a time.
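To put the two batch sizes side by side, here is a rough sketch of how the collection could be split into passage-ID ranges. The helper name `shard_ranges` is made up for illustration and is not part of ColBERT; the total is the rounded 116 million figure from the question.

```python
from typing import Iterator, Tuple


def shard_ranges(total: int, shard_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) passage-ID ranges covering [0, total).

    Hypothetical helper for illustration only; not a ColBERT function.
    """
    if total < 0 or shard_size <= 0:
        raise ValueError("total must be >= 0 and shard_size must be > 0")
    for start in range(0, total, shard_size):
        # The last shard may be shorter than shard_size.
        yield start, min(start + shard_size, total)


# Rounded collection size taken from the question above.
TOTAL_PASSAGES = 116_000_000

small = list(shard_ranges(TOTAL_PASSAGES, 1_000))       # batching used in the question
large = list(shard_ranges(TOTAL_PASSAGES, 10_000_000))  # batching suggested in the reply

print(len(small), len(large))  # 116000 12
```

With 1,000-passage batches that is 116,000 separate indexing runs, versus 12 at 10 million each, with a final shard of 6 million passages. Presumably each range would be written to its own collection file and indexed on its own, but that part is an assumption and is not shown here.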
