
Fixes to pyro model initialisation & sampling [WIP] #2695

Open
wants to merge 7 commits into base: main

Conversation

@vitkl (Contributor) commented Apr 7, 2024

Addresses #2616

Replaces #1805

@vitkl (Contributor, Author) commented Apr 8, 2024

I don't fully understand the reason for the errors - they don't happen in test_pyro_bayesian_regression_low_level, test_pyro_bayesian_regression, or test_pyro_bayesian_regression_jit, but they do happen when using train() directly. This approach works for cell2location.

The difference may be in the timing of when the plates are first used. I will look into this later.

@vitkl (Contributor, Author) commented Apr 8, 2024

Also, this code for posterior sampling is indeed ~2-3x faster, but it creates samples of huge observed data matrices (it copies the data n_samples times, e.g. 1000):

        if isinstance(self.module.guide, poutine.messenger.Messenger):
            # This already includes trace-replay behavior.
            sample = self.module.guide(*args, **kwargs)

An alternative way to deal with this issue would be this:

        if isinstance(self.module.guide, poutine.messenger.Messenger):
            # This already includes trace-replay behavior.
            sample = self.module.guide(*args, **kwargs)
            # include and exclude requested sites
            sample = {k: v for k, v in sample.items() if k in return_sites}
            sample = {k: v for k, v in sample.items() if k not in exclude_vars}   # this has to be provided by model developer
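For illustration, the include/exclude filtering above can be factored into a small helper. This is only a sketch of the idea; `filter_sample_sites`, `return_sites`, and `exclude_vars` are hypothetical names, not part of the scvi-tools API:

```python
def filter_sample_sites(sample, return_sites=None, exclude_vars=()):
    """Keep only requested sites and drop developer-excluded ones.

    Dropping large observed-data sites (e.g. ones replicated
    n_samples times) right after sampling avoids holding the
    copies in memory. Names here are illustrative only.
    """
    if return_sites is not None:
        sample = {k: v for k, v in sample.items() if k in return_sites}
    return {k: v for k, v in sample.items() if k not in exclude_vars}


# Toy posterior sample: "obs" stands in for a huge observed-data matrix.
sample = {"alpha": 1.0, "beta": 2.0, "obs": [0.0] * 1000}
filtered = filter_sample_sites(
    sample, return_sites={"alpha", "obs"}, exclude_vars={"obs"}
)
# filtered now contains only {"alpha": 1.0}
```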

@martinkim0 What do you think we should do? What do you think about the initialisation solution?

@martinkim0 martinkim0 self-assigned this Apr 8, 2024
@martinkim0 (Contributor):
@vitkl hey sorry for the delay, I'm planning on taking a look at this tomorrow!

@martinkim0 (Contributor):
This is actually my first time taking a look at some of our Pyro code - I hadn't really interacted with it before, so I don't fully understand why some things are done the way they are, e.g., the warmup callbacks. I definitely need to take a deep dive into all of this.

However, it looks like both PyroJitGuideWarmup and PyroModelGuideWarmup just pass a single minibatch through the guide prior to the training loop, so I like the idea of having a method like setup_pyro_model that does this. I think it makes more sense in the training plan, though, using one of the Lightning hooks such as on_train_start. There is also definitely something weird going on with tensors on different devices, and I think using a Lightning hook would solve this since the Lightning backend takes care of moving tensors.
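A rough sketch of what that hook could look like - written as a plain class here so it runs without Lightning installed; a real implementation would subclass lightning.pytorch.Callback, and the `pl_module.module`, `_get_fn_args_from_batch`, and `guide` attribute names simply mirror the snippet under review:

```python
class PyroGuideWarmupSketch:
    """Hypothetical sketch: warm up the Pyro guide from a Lightning hook.

    A real implementation would subclass lightning.pytorch.Callback;
    this plain class only illustrates the on_train_start pattern.
    """

    def on_train_start(self, trainer, pl_module):
        # Pass one batch through the guide so its parameters are
        # created (on the correct device) before training begins.
        batch = next(iter(trainer.train_dataloader))
        args, kwargs = pl_module.module._get_fn_args_from_batch(batch)
        pl_module.module.guide(*args, **kwargs)
```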

Regarding the sampling changes, would it be possible to include that in a separate PR? And then we can discuss that there. Thanks!

@vitkl (Contributor, Author) commented Apr 12, 2024

Just a brief reply. Happy to have a Zoom call about Pyro.

Pyro's automatic variational distribution (the guide) doesn't have any parameters until you do a first pass through the model and guide. When moving my code to multi-GPU training I found that this needs to happen in the setup step of the Lightning workflow - otherwise parameters created on the GPU don't get moved between devices correctly - so it would not work in on_train_start. However, in the latest version the setup step also doesn't work, as reported in the original issue. Moving the code to this function and calling it before any Lightning workflow steps seems to solve the problem for cell2location and my other project.

Actually, the errors might be resolved by calling both the model and the guide with one batch (the issue is possibly with the LDA model, which uses a custom guide).
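For concreteness, the initialisation idea - one pass through both the model and the guide before any Lightning workflow steps run - could look roughly like this. This is a sketch with hypothetical names, not the actual PR code:

```python
def setup_pyro_model(dataloader, module):
    """Sketch: initialise all Pyro parameters with a single batch.

    Running both the model and the guide once creates every
    pyro.param site (and touches the plates), so Lightning can
    later move a fully materialised module between devices.
    Names here are illustrative only.
    """
    batch = next(iter(dataloader))
    args, kwargs = module._get_fn_args_from_batch(batch)
    module.model(*args, **kwargs)   # first pass through the model
    module.guide(*args, **kwargs)   # first pass through the guide
```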

@martinkim0 martinkim0 added the P1 label Jul 12, 2024
```python
for tensors in dataloader:
    args, kwargs = pl_module.module._get_fn_args_from_batch(tensors)
    pyro_guide(*args, **kwargs)
    break
```
Member:

Better to do next(iter(dataloader)) to get a single batch. I think still having the class makes sense. Within this class, there can be a manual_start function.
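For reference, the two ways of pulling a single batch are equivalent; `next(iter(...))` just makes the intent explicit:

```python
dataloader = [("batch0",), ("batch1",)]

# Loop-and-break version (as in the diff under review):
for tensors in dataloader:
    first = tensors
    break

# Suggested version: grab one batch directly.
first_direct = next(iter(dataloader))

assert first == first_direct == ("batch0",)
```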

```python
    break


class PyroModelGuideWarmup(Callback):
```
Member:

Why do those two classes exist in the first place?

@canergen (Member) commented Sep 5, 2024

Please split into two PRs. One for the warmup changes and one for the inference changes. This makes it easier to follow changes.
