Checkpoint.docFromText is broken for bsize=None #366

Open
jbellis opened this issue Sep 17, 2024 · 1 comment
jbellis commented Sep 17, 2024

It looks like the bsize=None path of Checkpoint.docFromText hasn't been exercised in some time. At the very least, it is broken for keep_dims="flatten": self.doc expects "flatten" to have already been transformed into "return_mask", and that only happens on the bsize path. The rest of the processing in the bsize path also looks necessary for the return value to be transformed correctly, but I have not looked more deeply into that.

Here is the exception for the keep_dims issue:

  File "/home/jonathan/Projects/colbert-live/colbert_live/colbert_live.py", line 98, in encode_chunk
    embeddings = self._cp.docFromText([content], keep_dims="flatten", pool_factor=None)
  File "/home/jonathan/miniforge3/envs/colbert-live/lib/python3.10/site-packages/colbert/modeling/checkpoint.py", line 192, in docFromText
    return self.doc(input_ids, attention_mask, keep_dims=keep_dims, to_cpu=to_cpu)
  File "/home/jonathan/miniforge3/envs/colbert-live/lib/python3.10/site-packages/colbert/modeling/checkpoint.py", line 92, in doc
    D = super().doc(*args, **kw_args)
  File "/home/jonathan/miniforge3/envs/colbert-live/lib/python3.10/site-packages/colbert/modeling/colbert.py", line 96, in doc
    assert keep_dims in [True, False, 'return_mask']
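
For readers without the library installed, the failure mode can be sketched with toy stand-ins. This is not the ColBERT source: `doc`, `doc_from_text`, and `fails` are hypothetical names, and the only details taken from the report are the assertion shown in the traceback and the claim that "flatten" is rewritten to "return_mask" only when bsize is set.

```python
# Toy stand-ins for illustration only; these are NOT the real ColBERT functions.

def doc(keep_dims):
    # Mirrors the assertion from the traceback (colbert.py, line 96).
    assert keep_dims in [True, False, "return_mask"]
    return f"encoded with keep_dims={keep_dims!r}"


def doc_from_text(bsize=None, keep_dims=True):
    if bsize:
        # Batched path: "flatten" is translated before it reaches doc().
        inner = "return_mask" if keep_dims == "flatten" else keep_dims
        return doc(inner)
    # Unbatched path: keep_dims is forwarded untouched, so "flatten"
    # trips the assertion inside doc().
    return doc(keep_dims)


def fails(**kwargs):
    # Returns True if the call hits the assertion, False otherwise.
    try:
        doc_from_text(**kwargs)
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    print("bsize=None, flatten fails:", fails(keep_dims="flatten"))
    print("bsize=32,   flatten fails:", fails(bsize=32, keep_dims="flatten"))
```

If the real code has this shape, applying the same "flatten" to "return_mask" translation on the unbatched path would avoid the assertion, though as noted above the remaining post-processing on the bsize path may also be needed.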

jbellis commented Sep 22, 2024

More generally, the options in docFromText don't all play nicely together. E.g., pool_factor is ignored unless keep_dims='flatten'.
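
Until the option handling is sorted out, a caller-side guard could at least surface the silent case. The sketch below is a hypothetical helper, not part of ColBERT; it assumes that pool_factor values of None or 1 mean "no pooling requested", which is an assumption rather than something stated in this issue.

```python
import warnings


def check_doc_options(keep_dims=True, pool_factor=None):
    """Hypothetical guard: return False (and warn) if pool_factor would be ignored."""
    # Assumption: None and 1 both mean that no pooling was requested.
    pooling_requested = pool_factor not in (None, 1)
    if pooling_requested and keep_dims != "flatten":
        warnings.warn(
            "pool_factor is ignored unless keep_dims='flatten'",
            stacklevel=2,
        )
        return False
    return True


if __name__ == "__main__":
    # Pooling requested together with keep_dims="flatten": accepted.
    print(check_doc_options(keep_dims="flatten", pool_factor=2))
    # Pooling requested with keep_dims=True: warns and reports the mismatch.
    print(check_doc_options(keep_dims=True, pool_factor=2))
```

Calling such a check before docFromText turns a silently ignored argument into a visible warning; raising an exception instead would be the stricter choice.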
