
CUDA out of memory for larger datasets during attribution #191

Open
lsickert opened this issue Jun 3, 2023 · 9 comments
Labels
bug Something isn't working
Milestone
v0.5
Comments

@lsickert
Collaborator

lsickert commented Jun 3, 2023

🐛 Bug Report

When running inseq with a larger dataset on a CUDA device, an out-of-memory error occurs regardless of the defined batch_size. I believe this is caused by the calls to self.encode in attribution_model.py, lines 345 and 347, which operate on the full inputs instead of a single batch and move all inputs to the CUDA device after encoding.

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. Load any model without pre-generated targets
  2. Load a larger dataset with at least 1000 samples
  3. Call the .attribute() method with any batch_size parameter

Code sample
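A minimal sketch of the kind of call that triggers the error for me (the model name, attribution method, and texts below are placeholders, not my actual setup):

```python
import inseq

# Placeholder model and attribution method; any model loaded through
# inseq should show the same behavior on a CUDA device.
model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "saliency")

# A larger dataset without pre-generated targets (placeholder sentences).
texts = [f"This is example sentence number {i}." for i in range(1000)]

# Even with a small batch_size, all inputs appear to be encoded and moved
# to the CUDA device up front, which triggers the out-of-memory error.
out = model.attribute(texts, batch_size=8)
```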

Environment

  • OS: macOS

  • Python version: 3.10

  • Inseq version: 0.4.0

Expected behavior

The input texts should ideally only be encoded or moved to the GPU once they are actually processed.
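Schematically, something along these lines (sketch only, with placeholder helpers rather than inseq's actual code):

```python
def attribute_dataset(texts, encode_fn, attribute_fn, batch_size, device="cuda"):
    # Sketch only: encode_fn and attribute_fn are placeholders for the
    # library's encoding and per-batch attribution steps.
    outputs = []
    for start in range(0, len(texts), batch_size):
        batch_texts = texts[start : start + batch_size]
        # Encode and move to the device only the batch currently being
        # processed, instead of the whole dataset up front.
        batch_inputs = encode_fn(batch_texts).to(device)
        outputs.append(attribute_fn(batch_inputs))
    return outputs
```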

Additional context

@lsickert lsickert added the bug Something isn't working label Jun 3, 2023
@lsickert
Collaborator Author

lsickert commented Jun 3, 2023

@gsarti Maybe you could try to confirm this behavior since I only seem to come across this when I run inseq on GPU.

@gsarti
Member

gsarti commented Jun 3, 2023

Hi @lsickert, could you confirm that you face this bug when installing main? There was a bug where batching was not applied to model.generate causing this same issue that was fixed recently. Let me know!

@lsickert
Collaborator Author

lsickert commented Jun 3, 2023

> Hi @lsickert, could you confirm that you face this bug when installing main? There was a bug where batching was not applied to model.generate causing this same issue that was fixed recently. Let me know!

You are correct. Since I can only reproduce this on CUDA (the Habrok cluster), I had the latest published version installed rather than the version from main. It seems to be the same issue as #183, and it is mostly fixed when I run with the current version.

I do still encounter some issues, though, when I try to run my full dataset (1M samples) through it at once (something I did not hit in previous experiments, since I was running inseq in a batched loop). So I think the calls to self.encode() that move all the inputs to the GPU up front might still be a bit of an issue with large datasets.

Maybe it makes sense to only move them to the GPU during the generate function and later on during the attribution function when they are batched?
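For reference, this is roughly the batched loop I have been using as a workaround (sketch only; the model name is a placeholder and load_my_texts stands in for my data loading). Since each chunk is passed to .attribute() separately, only that chunk gets encoded and moved to the GPU:

```python
import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "saliency")  # placeholder model
texts = load_my_texts()  # placeholder for loading the full dataset

chunk_size = 64  # outer chunk handed to inseq; the inner batch_size still applies
outputs = []
for start in range(0, len(texts), chunk_size):
    chunk = texts[start : start + chunk_size]
    # Only this chunk is encoded and moved to the GPU at a time.
    outputs.append(model.attribute(chunk, batch_size=8))
```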

@gsarti
Member

gsarti commented Jul 19, 2023

Documenting here another issue reported by @g8a9: when attributing a large set of examples at once, a single example that is too large to fit in GPU memory can crash the entire attribution halfway through.

A possible solution would be to add an option to model.attribute to sort inputs by length in number of tokens (enabled by default). This way, the top N (batch_size) longest examples are attributed first, so the process either crashes right away or is likely to terminate. Importantly, if sorting is activated, it should apply to all position-dependent fields (all parameters whose argument is a list of strings, except step_scores).
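Roughly what I have in mind, as a sketch that is not tied to inseq internals (attribute_fn and tokenize_fn are stand-ins for the per-batch attribution call and the tokenizer):

```python
def attribute_sorted(attribute_fn, tokenize_fn, texts, batch_size=8):
    # Sort indices by tokenized length, longest first, so the heaviest
    # batch runs first and an OOM surfaces immediately.
    lengths = [len(tokenize_fn(t)) for t in texts]
    order = sorted(range(len(texts)), key=lambda i: lengths[i], reverse=True)

    results = [None] * len(texts)
    for start in range(0, len(order), batch_size):
        idx_batch = order[start : start + batch_size]
        batch_results = attribute_fn([texts[i] for i in idx_batch])
        for i, res in zip(idx_batch, batch_results):
            results[i] = res  # restore the original input order
    return results
```

The same reordering would of course also have to be applied to the other position-dependent fields, not just the input texts.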

@g8a9
Contributor

g8a9 commented Jul 19, 2023

I'm not super keen on the idea of internal sorting by length, as it could break some outer user-defined logic. But I don't even know what the best option is here (maybe looking for the maximum-length batch and running that first?)

Also, keep in mind that this dry run might take quite some time if sentences are very long, so you might not want to repeat it if it succeeds.

@gsarti
Member

gsarti commented Jul 19, 2023

Yes, the idea of sorting by length was precisely aimed at avoiding the recomputation for the first big batch. As long as results are returned in the original order, there shouldn't be a problem for user logic, right?

@g8a9
Contributor

g8a9 commented Jul 19, 2023

Yea if the original order is preserved I guess that works.

@lsickert
Collaborator Author

Might it also be an option to catch these errors and either skip the relevant example (with a logged warning, of course) or move just this one to the CPU for processing and then continue with the next batch on GPU?
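Something along these lines, as a rough sketch around a generic per-batch call (attribute_batch_fn is a stand-in, not inseq's actual internals):

```python
import logging

import torch

logger = logging.getLogger(__name__)

def attribute_with_fallback(attribute_batch_fn, batch):
    # Sketch only: attribute_batch_fn(batch, device) stands in for running
    # one attribution batch on the given device.
    try:
        return attribute_batch_fn(batch, device="cuda")
    except torch.cuda.OutOfMemoryError:
        # Free the cached blocks from the failed attempt, log a warning,
        # and retry just this batch on CPU before moving on to the next
        # batch on GPU.
        torch.cuda.empty_cache()
        logger.warning("Batch too large for GPU memory, retrying on CPU")
        return attribute_batch_fn(batch, device="cpu")
```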

@gsarti
Member

gsarti commented Jul 24, 2023

Moving to CPU seems the better of the two options, but I'm still not sure whether this should be preferable to raising an error at the start to signal that a lower batch size is required.

@gsarti gsarti added this to the v0.5 milestone Jul 28, 2023