CUDA out of memory for larger datasets during attribution #191
Comments
@gsarti Maybe you could try to confirm this behavior, since I only seem to come across it when I run inseq on GPU.
Hi @lsickert, could you confirm that you face this bug when installing the latest version from `main`?
You are correct. Since I can only reproduce this on CUDA (Habrok cluster), I did not have the version from `main` installed there yet.

I do still encounter some issues, though, when I try to run my full dataset (1m samples) with it at once (which I did not have in previous experiments, since I was running inseq in a batched loop). So I think the calls to `self.encode` are still part of the problem. Maybe it makes sense to only move the inputs to the GPU during the attribution of each batch?
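For illustration, a minimal sketch of the batch-wise behavior suggested here, assuming a Hugging Face style `tokenizer` and `model` (the function and its arguments are hypothetical stand-ins, not the actual inseq internals): each batch is encoded and moved to the GPU only right before it is processed, so at most `batch_size` examples occupy device memory at any time.

```python
import torch

def process_in_batches(model, tokenizer, texts, batch_size=8, device="cuda"):
    """Hypothetical sketch: encode and move inputs to the GPU one batch at a time."""
    outputs = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Encoding and .to(device) happen per batch, not for the full dataset up front.
        encoded = tokenizer(batch, return_tensors="pt", padding=True).to(device)
        with torch.no_grad():
            outputs.append(model(**encoded))
    return outputs
```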
Documenting here another issue reported by @g8a9: when attributing a large set of examples at once, it can happen that an example too large to fit in GPU memory crashes the entire attribution halfway through. A possible solution would be to have the option to sort inputs by length (in terms of number of tokens) internally, so that the largest examples are attributed first.
I'm not super keen on the idea of internal sorting by length, as it could break some outer user-defined logic. But I don't even know what the best option here is (maybe looking for the maximum-length batch and running that first?). Also, keep in mind that this dry run might take quite some time if sentences are very long, so you might not want to repeat it if successful.
Yes, the idea of sorting by length was precisely aimed at avoiding the recomputation for the first big batch. As long as results are returned in the original order, there shouldn't be a problem for user logic, right?
Yeah, if the original order is preserved, I guess that works.
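A minimal sketch of the sort-by-length idea discussed above (the helpers `token_length` and `attribute_one` are hypothetical, not part of the inseq API): the longest inputs are processed first so an out-of-memory failure surfaces immediately, and results are scattered back into the original input order before returning.

```python
def attribute_sorted_by_length(texts, attribute_one, token_length):
    # Process the longest inputs first: an OOM error then happens right away
    # instead of halfway through the run.
    order = sorted(range(len(texts)), key=lambda i: token_length(texts[i]), reverse=True)
    sorted_results = [attribute_one(texts[i]) for i in order]
    # Restore the original input order so outer user-defined logic is unaffected.
    results = [None] * len(texts)
    for pos, idx in enumerate(order):
        results[idx] = sorted_results[pos]
    return results
```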
Might it also be an option to catch these errors and either skip the relevant example (with a logged warning, of course) or move just this one to the CPU for processing and then continue with the next batch on GPU?
Moving to CPU seems the better of the two options, but I'm still not sure whether this should be preferable to raising an error at the start to signal that a lower batch size is required.
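A minimal sketch of the CPU-fallback option discussed above (`run_batch` is a hypothetical callable taking a `device` argument; `torch.cuda.OutOfMemoryError` exists in recent PyTorch versions, while older releases raise a plain `RuntimeError`): the failing batch is retried on CPU with a logged warning instead of aborting the whole attribution.

```python
import logging
import torch

def attribute_with_cpu_fallback(batches, run_batch):
    results = []
    for i, batch in enumerate(batches):
        try:
            results.append(run_batch(batch, device="cuda"))
        except torch.cuda.OutOfMemoryError:
            # Free cached memory, warn, and retry only this batch on CPU.
            torch.cuda.empty_cache()
            logging.warning("CUDA OOM on batch %d, retrying it on CPU", i)
            results.append(run_batch(batch, device="cpu"))
    return results
```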
🐛 Bug Report
When running inseq with a larger dataset on a CUDA device, an out-of-memory error occurs regardless of the defined `batch_size`. I believe this is caused by the calls to `self.encode` in `attribution_model.py` (lines 345 and 347), which operate on the full inputs instead of a single batch and move all inputs to the CUDA device after encoding.

🔬 How To Reproduce
Steps to reproduce the behavior: call the `.attribute()` method on a larger dataset with any `batch_size` parameter.

Code sample
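The original code sample was not preserved; the following is a hypothetical minimal reproduction along the lines described above (the model name, attribution method, and dataset size are placeholders).

```python
import inseq

model = inseq.load_model("gpt2", "saliency")  # any model / attribution method
texts = ["An example sentence to attribute."] * 100_000  # large input list
# Raises a CUDA out-of-memory error on GPU regardless of batch_size,
# since all inputs are encoded and moved to the device up front.
out = model.attribute(texts, batch_size=8)
```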
Environment
OS: macOS
Python version: 3.10
Inseq version: 0.4.0
Expected behavior
The input texts should ideally only be encoded or moved to the GPU once they are actually processed.
Additional context