
Find a way to test models with GPU #684

Open
FynnBe opened this issue Oct 25, 2023 · 4 comments
Labels: good first issue (Good for newcomers)

Comments

@FynnBe (Member) commented Oct 25, 2023

We really have to test contributed models on CPU and GPU. Especially for PyTorch state dict models, it is not guaranteed that a model tested on one kind of hardware works on the other (e.g. hardcoded `.cuda()` method calls, etc.).
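A minimal sketch (not from the issue) of the failure mode being described: a state-dict model ships arbitrary Python code, so nothing stops it from pinning tensors to one device.

```python
import torch


class HardcodedModel(torch.nn.Module):
    """Works wherever it was developed, but breaks elsewhere."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        # Hardcoded device call: crashes on a CPU-only CI runner,
        # and mismatches the CPU weights on a GPU runner.
        return self.linear(x.cuda())


class PortableModel(torch.nn.Module):
    """Device-agnostic: follows the device of its own weights."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x.to(self.linear.weight.device))


x = torch.zeros(1, 4)
print(PortableModel()(x).shape)  # runs on CPU and GPU alike
try:
    HardcodedModel()(x)
except (AssertionError, RuntimeError):
    print("HardcodedModel failed on this hardware")
```

Testing only on the hardware the contributor used would never surface the `HardcodedModel` problem, which is why a CPU pass and a GPU pass are both needed.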

I believe @oeway has mentioned that there is a way to set up self-hosted GitHub runners... and as usual he seems to be right: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners

GitHub itself seems to have GPU workflows on its roadmap as well: github/roadmap#505

@FynnBe added the good first issue label Oct 25, 2023
@jmetz commented Oct 25, 2023

Not sure if it's of interest or relevant, but GitLab already has (paid) SaaS GPU runners (https://docs.gitlab.com/ee/ci/runners/saas/gpu_saas_runner.html)... though I'm not sure whether it would make any sense to call a GitLab runner from GitHub.

@FynnBe (Member, Author) commented Oct 25, 2023

> Not sure if it's of interest or relevant, but GitLab already has (paid) SaaS GPU runners (https://docs.gitlab.com/ee/ci/runners/saas/gpu_saas_runner.html)... though I'm not sure whether it would make any sense to call a GitLab runner from GitHub.

We could of course switch to GitLab altogether... with the planned "S3 first" approach, it is unclear what part of the collection stays on GitHub anyway.

@oeway (Contributor) commented Oct 25, 2023

> We really have to test contributed models on CPU and GPU. Especially for PyTorch state dict models, it is not guaranteed that a model tested on one kind of hardware works on the other (e.g. hardcoded `.cuda()` method calls, etc.).
>
> I believe @oeway has mentioned that there is a way to set up self-hosted GitHub runners... and as usual he seems to be right: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners
>
> GitHub itself seems to have GPU workflows on its roadmap as well: github/roadmap#505

Hi, could you clarify the problem a bit more? Does the PyTorch state dict not run, or does it not reproduce the result? Can't we simply provide some guidelines on how to properly export a PyTorch model so that we can test it? Can we just say we only support TorchScript (amazingly, it can also train/finetune)? Having to run on GPU will come with additional costs and a maintenance burden.

Potentially we can just use Triton on the BioEngine to test with CPU and GPU; however, we don't really support PyTorch state dicts there.

From a security point of view, it's not safe to assign a paid GPU node to run arbitrary code from unaccepted PRs.

@FynnBe (Member, Author) commented Oct 25, 2023

> Hi, could you clarify the problem a bit more? Does the PyTorch state dict not run, or does it not reproduce the result?

I noticed one particular test case I would like us to cover: hardcoded device handling, e.g. `tensor.cuda()`, `model.to("cpu")`, etc.
Just in general we cannot assume that a model runs on GPU if tested on CPU and vice versa.
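Such a test case could be sketched roughly as below (assuming pytest; `load_contributed_model` is a hypothetical stand-in for the real model loading, not existing bioimageio.core API):

```python
import pytest
import torch


def load_contributed_model() -> torch.nn.Module:
    # Hypothetical stand-in for loading a contributed state-dict model.
    return torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)


# Run the same inference on every available device; a model with
# hardcoded .cuda() / .to("cpu") calls fails on at least one of them.
DEVICES = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])


@pytest.mark.parametrize("device", DEVICES)
def test_inference_runs_on(device):
    model = load_contributed_model().to(device)
    x = torch.zeros(1, 3, 64, 64, device=device)
    with torch.no_grad():
        y = model(x)
    # The output must live on the device the test requested.
    assert y.device.type == device
```

The CPU leg of such a test can run on hosted CI; only the `cuda` leg needs GPU hardware.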

> Potentially we can just use Triton on the BioEngine to test with CPU and GPU; however, we don't really support PyTorch state dicts there.

Yes, exactly. I opened this issue just now to write exactly that! It should be a test instance, maybe with some stricter settings.
There has got to be a way to run PyTorch state dicts with bioimageio.core as well, no? It wouldn't even have to be performant.
I don't think this would entail high GPU costs either... we'd run very few samples per model for testing.

We could keep the CPU based tests in GH actions...
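For illustration, such a split could look like this in a workflow file (the `gpu` runner label and pytest markers are assumptions, not an agreed-on setup):

```yaml
# Illustrative sketch: hosted runner for the CPU tests, a self-hosted
# runner registered with a "gpu" label for the GPU pass.
jobs:
  test-cpu:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests -m "not gpu"
  test-gpu:
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests -m gpu
```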

> From a security point of view, it's not safe to assign a paid GPU node to run arbitrary code from unaccepted PRs.

The security question was always a reason for zenodo...
addendum: We'll always run user-contributed code somewhere, and I would not want to rely on manual code inspection alone. We need to investigate the legal side and come up with some terms and conditions etc. that give us some legal security regarding executing contributed code, e.g. on GH.
