
Find a way to test models with GPU #684

Open
FynnBe opened this issue Oct 25, 2023 · 4 comments
Labels: good first issue (Good for newcomers)

Comments

@FynnBe (Member) commented Oct 25, 2023

We really have to test contributed models on CPU and GPU. Especially for PyTorch state dict models, it is not guaranteed that a model tested on one kind of hardware works on the other (e.g. hardcoded `.cuda()` method calls, etc.).
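A minimal sketch (not from the issue) of the failure mode being described: a state-dict model ships arbitrary Python code, so nothing stops it from pinning tensors to one device.

```python
import torch


class HardcodedModel(torch.nn.Module):
    """Works wherever it was developed, but breaks elsewhere."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        # Hardcoded device call: crashes on a CPU-only CI runner,
        # and mismatches the CPU weights on a GPU runner.
        return self.linear(x.cuda())


class PortableModel(torch.nn.Module):
    """Device-agnostic: follows the device of its own weights."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x.to(self.linear.weight.device))


x = torch.zeros(1, 4)
print(PortableModel()(x).shape)  # runs on CPU and GPU alike
try:
    HardcodedModel()(x)
except (AssertionError, RuntimeError):
    print("HardcodedModel failed on this hardware")
```

Testing only on the hardware the contributor used would never surface the `HardcodedModel` problem, which is why a CPU pass and a GPU pass are both needed.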

I believe @oeway has mentioned that there is a way to set up self-hosted GitHub runners... and as usual he seems to be right: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners

GitHub itself seems to have GPU workflows on its roadmap as well: github/roadmap#505

@FynnBe added the good first issue label Oct 25, 2023
@jmetz commented Oct 25, 2023

Not sure if it's of interest or relevant, but GitLab already has (paid) SaaS GPU runners (https://docs.gitlab.com/ee/ci/runners/saas/gpu_saas_runner.html)... though I'm not sure whether it would make any sense to call a GitLab runner from GitHub.

@FynnBe (Member, Author) commented Oct 25, 2023

> Not sure if it's of interest or relevant, but GitLab already has (paid) SaaS GPU runners (https://docs.gitlab.com/ee/ci/runners/saas/gpu_saas_runner.html)... though I'm not sure whether it would make any sense to call a GitLab runner from GitHub.

We could of course switch to GitLab altogether... with the planned "S3 first" approach, it is unclear what part of the collection stays on GitHub anyway.

@oeway (Contributor) commented Oct 25, 2023

> We really have to test contributed models on CPU and GPU. Especially for PyTorch state dict models, it is not guaranteed that a model tested on one kind of hardware works on the other (e.g. hardcoded `.cuda()` method calls, etc.).
>
> I believe @oeway has mentioned that there is a way to set up self-hosted GitHub runners... and as usual he seems to be right: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners
>
> GitHub itself seems to have GPU workflows on its roadmap as well: github/roadmap#505

Hi, could you clarify the problem a bit more? Does the PyTorch state dict not run, or does it not reproduce the result? Can't we simply provide some guidelines on how to properly export a PyTorch model so that we can test it? Can we just say we only support TorchScript (amazingly, it can also train/finetune)? Having to run on GPU will come with additional costs and a maintenance burden.

Potentially we can just use Triton on the BioEngine to test with CPU and GPU; however, we don't really support PyTorch state dicts there.

From a security point of view, it's not safe to assign a paid GPU node to run arbitrary code from unaccepted PRs.

@FynnBe (Member, Author) commented Oct 25, 2023

> Hi, could you clarify the problem a bit more? Does the PyTorch state dict not run, or does it not reproduce the result?

I noticed one particular test case I would like us to cover: hardcoded device handling, e.g. `tensor.cuda()`, `model.to("cpu")`, etc.
Just in general we cannot assume that a model runs on GPU if tested on CPU and vice versa.
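Such a test case could be sketched roughly as below (assuming pytest; `load_contributed_model` is a hypothetical stand-in for the real model loading, not existing bioimageio.core API):

```python
import pytest
import torch


def load_contributed_model() -> torch.nn.Module:
    # Hypothetical stand-in for loading a contributed state-dict model.
    return torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)


# Run the same inference on every available device; a model with
# hardcoded .cuda() / .to("cpu") calls fails on at least one of them.
DEVICES = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])


@pytest.mark.parametrize("device", DEVICES)
def test_inference_runs_on(device):
    model = load_contributed_model().to(device)
    x = torch.zeros(1, 3, 64, 64, device=device)
    with torch.no_grad():
        y = model(x)
    # The output must live on the device the test requested.
    assert y.device.type == device
```

The CPU leg of such a test can run on hosted CI; only the `cuda` leg needs GPU hardware.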

> Potentially we can just use Triton on the BioEngine to test with CPU and GPU; however, we don't really support PyTorch state dicts there.

Yes, exactly. I opened this issue just now to write exactly that! It should be a test instance, maybe with some stricter settings.
There has got to be a way to run PyTorch state dicts with bioimageio.core as well, no? It wouldn't even have to be performant.
I don't think this would entail high GPU costs either... we'd run very few samples per model for testing.

We could keep the CPU based tests in GH actions...
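For illustration, such a split could look like this in a workflow file (the `gpu` runner label and pytest markers are assumptions, not an agreed-on setup):

```yaml
# Illustrative sketch: hosted runner for the CPU tests, a self-hosted
# runner registered with a "gpu" label for the GPU pass.
jobs:
  test-cpu:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests -m "not gpu"
  test-gpu:
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests -m gpu
```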

> From a security point of view, it's not safe to assign a paid GPU node to run arbitrary code from unaccepted PRs.

The security question was always a reason for zenodo...
addendum: We'll always run user-contributed code somewhere, and I would not want to rely on manual code inspection alone. We need to investigate the legal side and come up with some terms and conditions etc. that give us some legal security regarding executing contributed code, e.g. on GH.
