Handling outputs from the same model with different likelihoods / score functions #1403
Comments
I don't understand what you mean here. The PKPD model would be an ODE model, right? So where do the distributions come in?
Just for my intuition, can you give me examples of why you'd want or need to do this?
Just a note to say we will definitely need this for the pkpdapp inference. Different outputs of the ODE model will have different noise models, hence the need for different distributions on different outputs. The different outputs might also be measured at different times, e.g. if you are measuring the concentration of the drug as well as the size of a tumour, these wouldn't necessarily be at the same time.
Thanks @MichaelClerx Yes, PKPD models are just ODE models. What I mean is that, for these types of model, when you want to do inference, it seems to often be the case that some of the outputs would be modelled using Gaussian distributions; others with log-normal distributions. Re: different outputs having different measurement times, I can give you a good example from epidemiology, but there are lots of cases in PKPD modelling too, due to non-regular measurement of different things (e.g. tumour size and inflammation markers). For SEIR-type modelling, it is often the case that some outputs (e.g. cases and deaths) are measured regularly (e.g. daily) whereas others are measured less regularly (e.g. the proportion of the population who are seropositive).
Thanks both! Worth keeping #1372 in mind too, I guess.
Caching sounds useful whether or not we have a split analysis of the model output, so... separate ticket?
Yep, spot on with what I was thinking re: 1 and, yes, happy to shy away from "master"! Re: filters, I'm not sure I get ya? Also, currently an issue with MultiSeriesProblem is that it doesn't allow different time arrays for each output, so I guess that'd need to be re-engineered?
@MichaelClerx I'm going to give my suggestion a go here as I need to make progress for the PKPD app. But, look forward to your thoughts on the PR!
Hold on! Can I try explaining the filter thing a bit?
Sure!
Basically, in your proposal you start by creating a superproblem that then needs to combine some sub-problems, run a sim, and split its output again. So you'd have an ordinary MultiSeriesProblem, but then you'd make sub-problems based on its output, and finally combine those sub-problems into a single likelihood (although you could hide outputs as well, using this method).
All the splitting code would need to do would be e.g.
or
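A minimal sketch of that splitting, purely for illustration (assuming the simulated output comes back as an array of shape `(n_times, n_outputs)`; the column indices are made up):

```python
values = problem.evaluate(parameters)   # shape (n_times, n_outputs)
sub_values_a = values[:, [0, 1]]        # outputs belonging to one sub-problem
sub_values_b = values[:, [2]]           # outputs belonging to another
```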
The thinking behind this is:
Ok, thanks. So would the
How would you handle
Oooh that's a good point, I was thinking about the simulation much more than the data 😅 Ideally I guess you'd have a
But if we start allowing that, then all the likelihoods etc. would need to do some clever filtering? Or... do we write that
Yup!
How were you planning to handle data in the super/sub problem set-up?
So the way I'd handle data would be the same way as for times:
So, we could put
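Presumably something along these lines (the keyword names are invented here, just to illustrate "data handled the same way as times"):

```python
collection = MasterProblemCollection(
    model,
    index_list=[1, 1, 2],
    times_list=[times_1, times_2],     # one time vector per sub-problem
    values_list=[values_1, values_2],  # matching data arrays, one per sub-problem
)
```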
Is
No, good spot. For this case, it would just be
If you wanted to create subproblems which corresponded to multioutputs, I guess you could do:
then the index list I gave would make more sense.
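For example (purely illustrative):

```python
# outputs 0-1 form one multi-output sub-problem, outputs 2-4 another
index_list = [1, 1, 2, 2, 2]
```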
The efficiency bit does make this tricky!
but I can't think of a generic way to split a forwardmodel into 2 submodels that recognise they need to use a fixed time set and then filter
yuck!
Interesting -- to me, because the necessity to do all this stems from the inference problem (i.e. having different likelihoods for different outputs), I would be more tempted to solve it at that level than at the forward problem level...?
I think the root problem isn't so much different outputs, but different time series. We've gone from
To me that means we now have a list of Problems, for which we can define a list of likelihoods which we can then combine into a single likelihood using existing code.
I suppose you were saying the same when you called it a ProblemCollection :D
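A minimal sketch of that combination step (illustrative only; `problem_1` and `problem_2` are placeholders for the per-time-series problems, and how noise parameters line up across likelihoods is glossed over):

```python
import pints

# one problem (and time series) per output group, one likelihood per problem
log_likelihoods = [
    pints.GaussianLogLikelihood(problem_1),
    pints.GaussianLogLikelihood(problem_2),
]

# combined into a single objective by summing the log-densities
def combined_log_likelihood(parameters):
    return sum(ll(parameters) for ll in log_likelihoods)
```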
OK I'm coming round to your original proposal :D Maybe we can drop index_list though, by assuming the outputs don't overlap, and that the user has control over the forwardmodel, and so can ensure that the outputs are ordered as e.g. [0, 0, 1] instead of [0, 1, 0]? Then it could be:
which you'd implement as e.g.
that kind of thing?
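Maybe something like this (hypothetical names and constructor, just to show the shape of the idea):

```python
# no index list: outputs are assumed ordered and non-overlapping, so each
# sub-problem is a contiguous block of output columns
collection = ProblemCollection(model, [times_1, times_2], n_outputs=[2, 1])

# internally, splitting then reduces to plain column slicing of one simulation
values = model.simulate(parameters, all_times)  # solved once, shape (n_times, 3)
values_1 = values[:, 0:2]   # first sub-problem: first two outputs
values_2 = values[:, 2:3]   # second sub-problem: remaining output
```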
That could work. But how would you point a particular
I'd have thought keeping an explicit list of indices would make it safer than the implicitness above?
I'd also prefer to call it a
I just thought it'd be easier to have the problems all be disjoint. You'd get a subproblem from the collection, I think: `problem_1 = collection.subproblem(0)`
I like this!
PKPD models will likely need outputs from the same model with different likelihoods. E.g. one output follows a normal model; another follows a log-normal. At the moment, this sort of thing is not possible in PINTS without lots of manual hacking. Similarly, it is not possible to natively handle an inference problem with multiple outputs where those outputs are measured at different times.
I wanted to open the forum on this. Here's a potential proposal.
We create a `MasterProblemCollection` with the following methods:

- it is initiated with a list indicating which forward model outputs correspond to which `SubProblem` (see below). At a simple level, this could be a list `[1,1,1,2,2,2,2,3,3,3,3,3]`, which would indicate that the first three forward model outputs correspond to subproblem 1, the second four to subproblem 2 and the last five to subproblem 3. It is also initiated with a list of `times` lists: one list of times for each `SubProblem`.
- `evaluate` takes an argument `index` (in the above example, this would be either 1, 2 or 3) which specifies which output set to return. The first time `evaluate` is called, the result is cached so that the model is only solved once -- not once for each output set.
- `n_outputs` also takes `index` and returns the relevant number of outputs for the `SubProblem`.
- `times` takes `index` and returns the appropriate time list.

We also create a `SubOutputProblem`:

- it is initialised with a `MasterProblemCollection` and an `index`
- `evaluate` calls `MasterProblemCollection.evaluate(index)`
- `n_outputs` etc.
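A rough, untested sketch of what these two classes might look like (the names, signatures and simple caching here are illustrative only, not a settled API; `model.simulate(parameters, times)` is assumed to return an array of shape `(n_times, n_outputs)`):

```python
import numpy as np


class MasterProblemCollection(object):
    """Wraps one forward model and splits its outputs into sub-problems."""

    def __init__(self, model, index_list, times_list):
        # index_list, e.g. [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3], says which
        # sub-problem each forward model output belongs to
        self._model = model
        self._index_list = np.asarray(index_list)
        self._times_list = [np.asarray(t) for t in times_list]
        self._cache = {}

    def evaluate(self, parameters, index):
        # Solve once per parameter vector over the union of all times, then
        # return only the rows and columns belonging to the requested sub-problem
        key = tuple(parameters)
        if key not in self._cache:
            all_times = np.unique(np.concatenate(self._times_list))
            self._cache[key] = (all_times, self._model.simulate(parameters, all_times))
        all_times, values = self._cache[key]
        rows = np.isin(all_times, self._times_list[index - 1])
        cols = self._index_list == index
        return values[np.ix_(rows, cols)]

    def n_outputs(self, index):
        return int(np.sum(self._index_list == index))

    def times(self, index):
        return self._times_list[index - 1]


class SubOutputProblem(object):
    """Looks like an ordinary problem, but delegates to the collection."""

    def __init__(self, collection, index):
        self._collection = collection
        self._index = index

    def evaluate(self, parameters):
        return self._collection.evaluate(parameters, self._index)

    def n_outputs(self):
        return self._collection.n_outputs(self._index)

    def times(self):
        return self._collection.times(self._index)
```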
Example use case

The user has a forward model with three outputs and would like to model the first two using a `GaussianLogLikelihood` and the third with a `LogNormalLogLikelihood`. They would do:
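For illustration only, using the sketched classes above (here `model`, the time arrays and the priors are placeholders, and `LogNormalLogLikelihood` is the likelihood named in this proposal rather than an existing PINTS class):

```python
import pints

collection = MasterProblemCollection(
    model,
    index_list=[1, 1, 2],            # outputs 1-2 -> subproblem 1, output 3 -> subproblem 2
    times_list=[times_12, times_3],  # each subproblem has its own measurement times
)

problem_1 = SubOutputProblem(collection, 1)  # the two Gaussian-noise outputs
problem_2 = SubOutputProblem(collection, 2)  # the log-normal-noise output

# each subproblem's data would need attaching in the same way as its times
log_likelihood_1 = pints.GaussianLogLikelihood(problem_1)
log_likelihood_2 = LogNormalLogLikelihood(problem_2)

logposterior_1 = pints.LogPosterior(log_likelihood_1, log_prior_1)
logposterior_2 = pints.LogPosterior(log_likelihood_2, log_prior_2)
```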
The user could then create a wrapper to create a `LogPosterior` class that just returns the sum of `logposterior_1` and `logposterior_2`, which they could use for inference.

@martinjrobins @MichaelClerx @chonlei (and anyone else!) interested to hear thoughts on this.