Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catalyst/PEtab integration #61

Closed
TorkelE opened this issue Aug 28, 2023 · 70 comments
Closed

Catalyst/PEtab integration #61

TorkelE opened this issue Aug 28, 2023 · 70 comments

Comments

@TorkelE
Copy link
Contributor

TorkelE commented Aug 28, 2023

Me and @sebapersson talked today about linking PEtab.jl and Catalyst.jl (https://github.com/SciML/Catalyst.jl) together.

The idea is that someone who has defined a Catalyst.jl reaction system, should be able to define the information required by PEtab.jl to create optimization problems for fitting cost functions. Basically, I would want to create an intermediary struct, PEtabModel (made-up name), so that one can run

petab_model = PEtabModel(...)
createPEtabODEProblem(petab_model, ...)

The hope is that someone who is using Catalyst.jl should be able to circumvent using files to utilise PEtab entirely. I'm really excited about this. Hopefully, we will be able to create really smooth workflows for parameter optimization of CRNs using Julia.

After our discussion, we decided that I should create a proposed structure that holds the information typically stored in the PEtab.jl files. We can then discuss how this one can be improved to better accommodate what PEtab.jl needs. Ultimately, I think having some form of struct PEtabModel will be useful. the equation is whenever we should:

  • Keep it in Catalyst, and then PEtab.jl can figure out how to use it.
  • Put it within PEtab.jl, and if so, whenever to make it parallel to the PEtabl.jl file format, or make readPEtabModel create this structure from the file.
  • Whenever to have high-level version in Catalyst, and then covert it into a low-level version in PEtab.jl
    Right now this is maybe not super important, once we have a fully working workflow changing to a preferred approach should be relatively simple.

Currently, I am thinking of creating something like this:

# Main struct, contains one value for each file in the PEtab format.
struct PEtabModel
    system::ReactionSystem
    experimental_conditions::Vector{PEtabExperimentalCondition}
    observables::Vector{PEtabObservable}  
    meassurments::Vector{PEtabMeassurment}
    parameters::Vector{PEtabParameter}
end

# Currently just a dictionary from parameters to their value.
# Currently indexing depends on vector indexes, however, we might want to make them dictionaries with proper IDs.
# If so, people could still provide conditions in vector form, and for these cases, we use randomly generated IDs (or the vector index as ID).
struct PEtabExperimentalCondition
    parameter_values::Dict{Num, Float64}
end

# Similar as for experimental conditions, we probably would want to base this on IDs instead of vector indexing.
struct PEtabObservable
    obs::Num
end

Alternatively, and maybe better, we have this be a DataFrame with the same information.
struct PEtabMeassurment
    exp_id::Int64    # The index of the corresponding experimental condition in experimental_conditions vector. 
    obs_id::Int64    # The index of the corresponding observable in the observables vector. 
    value::Float64
    time_point::Float64
end

# For all of these, good defaults in PEtab.jl exists. Maybe should make each type Union{Nothing,...}, with Nothing indicating that PEtabl.jl selects its default.
struct PEtabParameter
    parameter::Num
    isconstant::Bool
    value::Union{Nothing,Float64} # if isconstant==false, value=Nothing.
    lb::Float64
    ub::Float64
    prior::Distribution{Univariate, Continuous}
    scale::Symbol # :log10, :linear supported.
end

I think that it should be easy to, within Catalyst, create something like this and send to PEtab. What would be the ideal way for PEtab to receive it in though? If you have a format that you think you could work in, and ensure that that works as input to createPEtabODEProblem, then I can make a link to that, and we should be able to put an example together.

@sebapersson
Copy link
Owner

This looks like a good first proposal, and I think this integration would be great for allowing smooth parameter estimation in Julia. First. some general comments first (see comments on the struct):

# I think this a good structure (might be worth to call it CatalystPEtabModel at technically it is not a full 
# PEtab model as it does not follow the full PEtab format). Would also allow me to then apply an extra 
# parsing step before creating ODE-problem, so I could have 
# createPEtabODEProblem(model::CatalystPEtabModel, ...)
struct CatalystPEtabModel
    system::ReactionSystem
    experimental_conditions::Vector{PEtabExperimentalCondition}
    observables::Vector{PEtabObservable}  
    meassurments::Vector{PEtabMeassurment}
    parameters::Vector{PEtabParameter}
end

#=
Looks good - but each experimental condition should have a string-id (or similar). Similar for observable 
id 
=#
struct PEtabExperimentalCondition
    parameter_values::Dict{Num, Float64}
end

#=
Each observable should be given by a string ID 
=#
struct PEtabObservable
    obs::Num
end

#=
I think a DataFrame is the best option here (but we can keep this struct to allow anyone to simulate data). 
See my comments on the fields. There are also observable parameters and PreEq ID:s (if we have a steady
state simulation), but I think we can add those later (should be straightforward).
=#
struct PEtabMeassurment
    exp_id::String # Should be a text id 
    obs_id::String    # Should be a text id 
    value::Float64
    time_point::Float64
    #=
     The sigma-parameter (measurement noise) can either be a value or a parameter in the parameters-struct.
    Has to be here, as different measurement-series (even of the same observable) can have different 
   measurement noise.
    =#
   noise_parameter::Union{Float64, Num} 
end

#=
# For all of these, good defaults in PEtab.jl exists. Maybe should make each type Union{Nothing,...}, 
with Nothing indicating that PEtabl.jl selects its default. Nothing is a good thing here for defaults :), 
note that prior should be allowed to be nothing (if no prior is given).
This looks good, but by PEtab standard replace isconstant with estimate 
=#
struct PEtabParameter
    parameter::Num
    isconstant::Bool
    value::Union{Nothing,Float64} # if isconstant==false, value=Nothing.
    lb::Float64
    ub::Float64
    prior::Distribution{Univariate, Continuous}
    scale::Symbol # :log10, :linear and :log supported.
end

Overall I think the following would work. If catalyst provides a CatalystPEtabModel then PEtab can use the information to basically create the PEtab-files (they do not have to explicitly created). Given the PEtab-files a PEtabODEProblem can easily be created. So I imagine a workflow like

petab_model = CatalystPEtabModel(...)
# I expand createPEtabODEProblem to handle CatalystPEtabModel in PEtab.jl 
petab_problem = createPEtabODEProblem(petab_model::CatalystPEtabModel, ...)

So if you could provide a CatalystPEtabModel on the formt above for a simple model, say;
k1 : X -> Y
k2 : Y -> X
Where we aim to estimate k1, k2, assuming normally distributed measurements of Y with sigma ~ N(0, 1) I can expand createPEtabODEProblem to handle CatalystPEtabModel.

Later in order to create a robust test-suite we can recreate the PEtab test-suite using reaction-system.

Overall, what do you think?

@TorkelE
Copy link
Contributor Author

TorkelE commented Aug 29, 2023

This looks good, here is the altered CatalystPEtabModel format corresponding to your suggestions right now.

struct CatalystPEtabModel
    system::ReactionSystem
    experimental_conditions::Dict{String,PEtabExperimentalCondition}
    observables::Dict{String,PEtabObservable}  
    meassurments::DataFrame
    parameters::Vector{PEtabParameter}
end

struct PEtabExperimentalCondition
    parameter_values::Dict{Num, Float64}
    id::String
end

struct PEtabObservable
    obs::Num
    id::String
end

struct PEtabParameter
    parameter::Num
    estimate::Bool
    value::Union{Nothing,Float64} 
    lb::Union{Nothing,Float64}
    ub::Union{Nothing,Float64}
    prior::Union{Nothing,Distribution{Univariate, Continuous}}
    scale::Union{Nothing,Symbol} # :log10, :linear and :log supported.
end

I will create an example as well that we could use to try and fit data. I agree, let's do steady state after we have a working methodology without.

My only question is regarding having meassurments as a DataFrame. When we have it as an explicit struct, don't we guarantee that it has a certain number of fields and that these are filled, while a DataFrame could contain additional fields (or miss some)? Is there a way to define a DataFrame type where it has a certain number of fields? If not I guess we could add a quick check to the data frame ensuring it contain the right informtation.

@TorkelE
Copy link
Contributor Author

TorkelE commented Aug 29, 2023

Also, exactly how do you want the u0 defined? For a species X, should I simply create a parameter X which denotes its initial condition?

@sebapersson
Copy link
Owner

I don't think we can restrict a DataFrame, on the other side, writing a function to check the DataFrames is straightforward.

Good question, preferably as a state_map (which is already how we do it), i.e on the format:

state_map = [x => 207.6*k1, y => 0.0]

That is, for each model state either a numerical value, or an equation (depending on model parameters, which can be an initial value) are okay to supply.

@TorkelE
Copy link
Contributor Author

TorkelE commented Aug 29, 2023

Sounds good.

Should I make the initial conditions part of the PEtabExperimentalCondition struct, e.g. expand it to:

struct PEtabExperimentalCondition
    parameter_values::Dict{Num, Float64}
    u0::Vector{Pair{Num.Union{Float64,Num}}}
end

? Here I put the first value to a Num, but could be a Symbol or something else as well, was a bit unsure.

@sebapersson
Copy link
Owner

I think actually u0 should be something entirely separate (which is then provided into CatalystPEtabModel). I am thinking about doing as is done in ModellingToolkit where when building on ODEProblem the user can if they want to provide a state-map (as here https://docs.sciml.ai/ModelingToolkit/stable/tutorials/ode_modeling/).

And for parameter values the first value being Num looks good.

@TorkelE
Copy link
Contributor Author

TorkelE commented Aug 30, 2023

I am not entirely sure what you mean, You mean something like this where u0 is a separate field?

struct CatalystPEtabModel
    system::ReactionSystem
    experimental_conditions::Vector{PEtabExperimentalCondition}
    observables::Vector{PEtabObservable}  
    meassurments::Vector{PEtabMeassurment}
    parameters::Vector{PEtabParameter}
    u0::Vector{Pair{Num.Union{Float64,Num}}}
end

Then if one wants u0 to varry between various experimental conditions one set that specific u0 value to be a parameter, the value which depends on experimental_conditions["ExpX"].parameter_values?

Also, a second question. If you have e.g. parameter p depend on experimental_conditions["ExpX"].parameter_values[p], what should p be in the parameters field? I presume that it must have estimate=true. However, can it still have a value (which is used as the default when experimental_conditions["ExpX"].parameter_values does not contain the parameter)?

@sebapersson
Copy link
Owner

Yes, this is how I see the struct. And yes, if we want some u0 to change between conditions it is better if we make that specific u0 value to change a parameter. I think this is a good solution as, at least in my experience, typically only a few u0 values are changed between experimental conditions, so instead of having to provide an u0 per experimental condition, it is probably better to specify the u0:s that change by a parameter value.

Now a complication here, which I think you touch on with the second question, is that if an initial value is a a parameter how do we specify that? If it is a parameter that is constant or we want to estimate is should be specified in PEtabParameter. However, if it is a control-parameter (dictating an experimental condition) it should not be specified in PEtabParameter. In the PEtab standard control parameters are not a part of PEtabParameter (as they are control parameters and not yet allowed to be estimated), rather the user must provide a value for the control parameter for each experimental condition. So I am thinking that if a user provide a new parameter as initial value control parameter, then we should enforce the user provides a value on that parameter for each experimental condition. What do you think?

@TorkelE
Copy link
Contributor Author

TorkelE commented Aug 30, 2023

(I work in a category of systems where the only difference between experimental conditions is that one changes all the non-trivial (non-0) u0 values, however, for the greater good I don't mind going with your solution, seems like it should work out)

So, in essence, we have 3 types of parameter values?

  1. Parameters that we would like to estimate (that might have a prior distribution and bounds).
  2. Parameters that have a known value.
  3. Parameters that have known values, but with these varying between each experimental condition.
    Here, the first two would be required to be specified in the PEtab parameters field/file, while the third would be given in the experimental_conditions field/file (and if it is not provided for each experimental condition, and error would be thrown). It seems like a weird dissonance that (2) is provided in parameters but (3) is not (however, I can also see why, as they have to be provided elsewhere).

Still, I think we have a fully working format here, so maybe we go with this to make something that works, and then we can discuss whenever it makes sense to throw things around a bit later on?

@sebapersson
Copy link
Owner

I see what you mean with 2 and 3, and typically in the PEtab standard 2 does not have to specified explicitly (as often the constant values for these are already set in the SBML file), so if it could be possible to set constant parameter when specifying the reaction system then given that, it would not be needed to specify all constant parameters in Parameters?

And I agree, we can go with this, get something that works, and then make potential changes-

@sebapersson
Copy link
Owner

I see what you mean with 2 and 3, and typically in the PEtab standard 2 does not have to specified explicitly (as often the constant values for these are already set in the SBML file), so if it could be possible to set constant parameter when specifying the reaction system then given that, it would not be needed to specify all constant parameters in Parameters?

And I agree, we can go with this, get something that works, and then make potential changes.

@TorkelE
Copy link
Contributor Author

TorkelE commented Aug 30, 2023

Technically it would be possible to parse the provided ReactionSystem, and substitute in parameter values before the system is sent over to PEtab. However, ideally, I would like to keep these separate, as this is how they are handled across the entire SciML system. I also think there's a decent number of cases where this is advantageous (e.g. in many cases system input is modelled as a parameter, and a callback is used to change this value to signal activation. This is no longer possible if a parameter is substituted in).

I will keep things as suggested now, and then we can add this to the list of things to consider once we have a methodology sorted out.

@TorkelE
Copy link
Contributor Author

TorkelE commented Aug 30, 2023

Does this work?

### Preparations ###

# Fetch required packages.
using Catalyst, OrdinaryDiffEq, Distributions, DataFrames, Plots

# Prepares the model simulation conditions.
rn = @reaction_network begin
    (p,d), 0 <--> X
    (1,k), X <--> Y
end
u0 = [1.0, 0.0]
tspan = (0.0, 10.0)
p = [1.0, 0.2, 0.5]


### Declares CatalystPEtabModel Structure ###

# The conditiosn for an experiment.
struct PEtabExperimentalCondition
    parameter_values::Dict{Num, Float64}
    id::String
end
# An observable value.
struct PEtabObservable
    obs::Num
    id::String
end
# A parameter.
struct PEtabParameter
    parameter::Num
    estimate::Bool
    value::Union{Nothing,Float64} 
    lb::Union{Nothing,Float64}
    ub::Union{Nothing,Float64}
    prior::Union{Nothing,Distribution{Univariate, Continuous}}
    scale::Union{Nothing,Symbol} # :log10, :linear and :log supported.
end
# A full PEtab model.
struct CatalystPEtabModel
    system::ReactionSystem
    experimental_conditions::Dict{String,PEtabExperimentalCondition}
    observables::Dict{String,PEtabObservable}  
    meassurments::DataFrame
    parameters::Vector{PEtabParameter}
    u0::Vector{Pair{Num,Union{Float64,Num}}}
end


### Performs Experiment ###

# Meassurment settings.
meassurment_times = 2.0:2.0:10.0
meassurment_error_dist = Normal(0.0, 0.4)

# Make simulation.
oprob = ODEProblem(rn, u0, tspan, p)
sol = solve(oprob, Tsit5(); tstops=meassurment_times)

# Get meassurment values.
meassured_values_true = [sol[2, findfirst(sol.t .== mt)] for mt in meassurment_times]
meassured_values = meassured_values_true .+ rand(meassurment_error_dist, length(meassured_values_true))

# Plot the experiment an meassurments.
plot(sol; idxs=2)
plot!(meassurment_times, meassured_values; seriestype=:scatter)


### Prepares the PEtab Structure ###
@unpack X, Y, p, d, k = rn

# The system field.
system = rn

# The experimental conditions field.
exp1 = PEtabExperimentalCondition(Dict([X=>1.0, Y=>0.0, d=>0.2, k=>2.0]), "Obs1")
experimental_conditions = Dict(["Exp1" => exp1])

# The observables field.
obs1 = PEtabObservable(Y, "Obs1")
observables = Dict(["Obs1" => obs1])

# The meassurments field
nM = length(meassured_values)
meassurments = DataFrame(exp_id=fill("Exp1", nM), obs_id=fill("Obs1", nM), value=meassured_values, time_point=meassurment_times, noise_parameter=fill(0.4,nM))

# The parameters field.
par_p = PEtabParameter(p, false, nothing, 1e-2, 1e2, nothing, :log10)
par_d = PEtabParameter(d, true, 0.2, nothing, nothing, nothing, nothing)
par_p = PEtabParameter(k, true, 2.0, nothing, nothing, nothing, nothing)
parameters =[par_X, par_Y, par_p, par_d, par_p]

# The initial conditions field.
u0 = [X => 1.0, Y => 0.0]

# Creates the model
petab_model = CatalystPEtabModel(system, experimental_conditions, observables, meassurments, parameters, u0)


### Saves Model to File ###
using Serialization
serialize("petab_model.jls", petab_model)

Currently, there is only one parameter to estimate, however, if that works we should be able to make an additional parameter unknown, or something that is carried in experiments. I'm also sending the petab_model.jls file (although the CatalystPEtab struct must still be declared to load it)

@sebapersson
Copy link
Owner

Thanks, this looks like it would work. Will try to take a look and come back this or early next week :)

@sebapersson
Copy link
Owner

This turned out to be easier than expected.

(Note I made some small alterations to the Structs)

include(joinpath(@__DIR__, "Catalyst_functions.jl"))

# Prepares the model simulation conditions.
rn = @reaction_network begin
    (p,d), 0 <--> X
    (1,k), X <--> Y
end


### Performs Experiment (simulate data) ###
u0 = [1.0, 0.0]
tspan = (0.0, 10.0)
p = [1.0, 0.2, 0.5]
meassurment_times = 2.0:2.0:10.0
meassurment_error_dist = Normal(0.0, 0.4)
oprob = ODEProblem(rn, u0, tspan, p)
sol = solve(oprob, Tsit5(); tstops=meassurment_times)
meassured_values_true = [sol[2, findfirst(sol.t .== mt)] for mt in meassurment_times]
meassured_values = meassured_values_true .+ rand(meassurment_error_dist, length(meassured_values_true))


### Prepares the PEtab Structure ###
@unpack X, Y, p, d, k = rn

# The system field.
system = rn

# The experimental conditions field.
exp1 = PEtabExperimentalCondition(Dict([d=>0.2, k=>2.0]))
experimental_conditions = Dict(["Exp1" => exp1])

# The observables field.
obs1 = PEtabObservable(Y, :lin, 0.4)
observables = Dict(["Obs1" => obs1])

# The meassurments field
nM = length(meassured_values)
meassurments = DataFrame(exp_id=fill("Exp1", nM), obs_id=fill("Obs1", nM), value=meassured_values, time_point=meassurment_times, noise_parameter=fill(0.4,nM))

# The parameters field (we are going to create a nice constructor here)
par_p = PEtabParameter(p, true, nothing, 1e-2, 1e2, nothing, :log10)
par_d = PEtabParameter(d, false, 0.2, nothing, nothing, nothing, nothing)
par_k = PEtabParameter(k, false, 2.0, nothing, nothing, nothing, nothing)
petab_parameters = [par_p, par_d, par_k]

# The initial conditions field.
state_map = [X => 1.0, Y => 0.0]

# Creates the model
petab_model = readPEtabModel(system, experimental_conditions, observables, meassurments, 
                             petab_parameters, state_map, verbose=true)

# Can now easily be made into PEtabODEProblem
petabProblem = createPEtabODEProblem(petab_model)
p = [log10(20)]
f = petabProblem.computeCost(p)
∇f = petabProblem.computeGradient(p)
Δf = petabProblem.computeHessian(p)

I added functions to parse the PEtab-input structs into the PEtab-file format (as DataFrames). Given that, I could build a PEtabODEProblem from the reaction system. Overall, with this approach it should also be easy to add additional features such as stady-state conditions, noise parameters etc. (basically recreate the PEtab test-suite).

Going forward, in order to create a smooth user experience I think we need to figure out how to specify initial values, and how to set default parameter values (parameters which are constant and are not control-parameters). An example, currently if I want to estimate initial-value I have to do the following when defining the reaction system:

rn = @reaction_network begin
    @parameters a0 b0 # Initial values to estimate must be parameters 
    (k1), A --> B
    (k2), B --> A
end

@unpack A, B, k1, k2, a0, b0 = rn
state_map = [A => a0, B => b0]

While I imagine something like this would be smoother

rn = @reaction_network begin
    @parameters a0 b0 # Initial values to estimate must be parameters 
    @species A(t)=a0 B(t)=b0 # Only specify for species having non-zero initial value 
    (k1), A --> B
    (k2), B --> A
end

Likewise, instead of setting constant non-control parameter values via PEtabParameter I think it would be nice to set these directly when defining the reaction system. This is how it is done typically for PEtab-model (default parameter values or the initial-value equation is set in the SBML-file). For this to work, however, I need to be able to extract the default values from the reaction system (to properly setup the functions needed by PEtab.jl), is it something that can be done? Further, any more input on this?

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 4, 2023

Looks good. Is there somewhere where I can check the updated structs?

It is possible to set default values of parameters and initial conditions already: https://docs.sciml.ai/Catalyst/stable/catalyst_functionality/dsl_description/#dsl_description_defaults
Not sure if it is possible to make initial conditions parameters though. Whether or not that is easy to do depends on ModelingToolkit, I could check this.

I'm mixed with using default parameter values within the ReactionSystem as the way to define values that are neither estimated nor varies between experimental settings. From the workflow where you create a ReactionSystem, create a PEtabModel, and then calibrate the parameters it makes sense. But if you want to e.g.

  • Do some simulations with different parameter values, then calibrate.
  • Do calibration for several different values.
    Then you would have to create new reaction systems or change the default values. That's not really a big problem though, but I am thinking if there is a more ideal way.

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 4, 2023

It should also be noted that some parameter properties can be defined for parameters via metadata: https://docs.sciml.ai/ModelingToolkit/stable/basics/Variable_metadata/#Bounds
https://docs.sciml.ai/ModelingToolkit/stable/basics/Variable_metadata/#Mark-parameter-as-tunable
Metadata can be added to parameters in Catalyst already (and we can add more fields if we want to) e.g.

@reaction_network begin
    @parameters p1 [mymetadata=true] p1 [someothermetadata=2.0] 
end

For now we might want to ignore these fields and make it work without, and then we can add functionality that infers these properties if they exist.

@sebapersson
Copy link
Owner

Yes, the updated structs can be found here: https://github.com/sebapersson/PEtab.jl/blob/Catalyst_integration/test/Catalyst_functions.jl. (note I removed PEtabCatalystModel as I figured out how to process these structs directly into a PEtabModel)

Regarding your points on parameter values I see what you mean. So we have that setting default values in non-ideal for workflows, while setting constant parameters via PEtabParameter is to clunky. Would it then be possible to set values for constant parameters via a parameter-map - like lets say we would want to set a constant value for k1 (and set k2 later via PEtabParameter)

rn = @reaction_network begin
    @parameters a0 b0 # Initial values to estimate must be parameters 
    (k1), A --> B
    (k2), B --> A
end

@unpack A, B, k1, k2, a0, b0 = rn
state_map = [A => a0, B => b0]
parameter_map [k1 => 3.0]

Then I could accept parameter_map when building the PEtab model and take this constant into consideration when building the ODEProblem.

Regarding parameter dependent initial values, do you think it is better to use a state-map, or specify the relationship in the reaction system?

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 4, 2023

Setting it via a parameter map like that seems good. It is also similar to how it is done with e.g. ODEProblems, so should be quite intuitive for most users.

For the states, I am currently leaning towards defining this in the reaction system. Having initial conditions being defined by parameters is something I can see being useful under other circumstances as well, so seems good to make it a general thing. I'm checking how this should work within the ModelingToolkit framework right now though. If it turns out to be a mess we might have to go with defining it in the state map and it being a feature within PETab only.

@sebapersson
Copy link
Owner

Sounds good, I will then add a parameter-map.

I will also code up the PEtab test-cases using Catalyst (too see if there is any useful feature we do not currently cover).

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 4, 2023

Setting intial conditions as parameters should work well:

using Catalyst
rn = @reaction_network begin
    @parameters x0
    @species X(t)=x0
    (p,d), 0 <--> X
end
oprob = ODEProblem(rn, [], (0.0,10.), [:p=>1.0, :d => 3.0, :x0=>100.0])
sol = solve(oprob)

The value can be retrieved via e.g.

@unpack X = rn
Catalyst.get_defaults(rn)[X]

or found in

X.val.metadata[Symbolics.VariableDefaultValue]

@sebapersson
Copy link
Owner

Then I say we go with setting initial-values directly in the reaction system, I also managed to this approach working with PEtab.jl and I think it looks nice (below is the first PEtab test-case, note I added constructor for PEtabParameter):

#=
    Recrating PEtab test-suite for Catalyst integration to robustly test that
    we support a wide arrange of features for PEtab integration
=#

using Test
include(joinpath(@__DIR__, "..", "Catalyst_functions.jl"))

# Define reaction network model 
rn = @reaction_network begin
    @parameters a0 b0
    @species A(t)=a0 B(t)=b0
    (k1, k2), A <--> B
end

# Measurement data 
measurements = DataFrame(exp_id=["c0", "c0"],
                         obs_id=["obs_a", "obs_a"],
                         time_point=[0, 10.0],
                         value=[0.7, 0.1],
                         noise_parameter=0.5)

# Single experimental condition                          
experimental_conditions = Dict(["c0" => PEtabExperimentalCondition(Dict())])

# PEtab-parameter to "estimate"
petab_parameters = [PEtabParameter(:a0, value=1.0, scale=:lin),
                    PEtabParameter(:b0, value=0.0, scale=:lin),
                    PEtabParameter(:k1, value=0.8, scale=:lin),
                    PEtabParameter(:k2, value=0.6, scale=:lin)]

# Observable equation                     
@unpack A = rn
observables = Dict(["obs_a" => PEtabObservable(A, :lin, 0.5)])

# Create a PEtabODEProblem 
petab_model = readPEtabModel(rn, experimental_conditions, observables, measurements,
                            petab_parameters, verbose=true)
petab_problem = createPEtabODEProblem(petab_model)

# Compute negative log-likelihood 
nll = petab_problem.computeCost(petab_problem.θ_nominalT)
@test nll  0.84750169713188 atol=1e-3

Overall, integrating PEtab with Catalyst seem to work nicely. Next I could expand the importer and structs to cover additional features (e.g. steady-state simulations) by recreating the PEtab test-suite, and after that I propose we see if we should change some of the structs? (I am currently not entirely happy with how we specify measurement data, feels a bit clunky).

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 4, 2023

Looks good!

Quick questions:

  • Here, are all 4 of a0, b0, k1, k2 parameters that we wish to estimate? If so, why are they given values (1.0, 0.0, 0.8, 0.6)?
  • For both measurements, you have noise_parameter=0.5. You also have a single observable with (A, :lin, 0.5). Here, what does the two 0.5 values suggest? Does the lin here suggest that one should use linear scale for the difference between measured and "assumed true` value.
  • If one wants to select distribution for this error, how does one do that?

@sebapersson
Copy link
Owner

Good questions.

  1. I just set the values for the test (Currently the user can choose whether or not to provide values for parameters that are to be estimated).
  2. This is a mistake from my part (when I set things up). In (A, :lin, 0.5) 0.5 is the noise-formula, which as we here have to be 0.5 (as we know the measurement-noise in this case). So in this case I do not need to specify noise (per PEtab standard) in measurements. In case I would have a noise-formula like (A, :lin, A * noiseParameter1) then in measurements I would have to provide noise_parameter=[a, b] where a, b can be constant values that replace noiseParameter1, or even parameters specified by PEtabParameter that replace noiseParameter1. All-to-all, I think a good solution is to have the user provide the noise-formula in the observable, and in case it is a non-constant we can force the user to also provide a noise-parameter? And exactly, lin suggests a linear scale for the difference (allowed are lin, log, log10), if we instead would have used log we would have assumed log-normal measurement error.
  3. I have not added support for this. Currently you can either have a normal-distribution (using lin), or a log-normal (using log or log10 for the difference between measured or non-measured). If you think more distributions would be relevant we can add it?

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 4, 2023

Sounds good. I am still a bit uncertain about (1) through. If a0, b0, k1, k2 are estimated in this case (they are, right?), does setting this values have any direct impact on what is going on?

With regard to (2). Yes, it seems that when we are done we might want to have a series of checks to ensure nothing is missing. Your suggested approach here seems good (provide the noise formula in the observables, and any quantity here that carries between measurements is given as noise parameters there.

Right now, the number (noiseFormula) in observable is a formula for the std in the normal distribution, right?

(3) To be honest, linear and log-normally distributed noise will probably cover almost all cases, and I cannot directly say there is another type of noise I'd want right now. I think it is more a question if we want to future-proof us so that the approach can work with other noise distributions when/if they get added. Would it make sense to add a field to PEtabObservable for this, and for now we just assume that is always going to be a normal distribution?

@sebapersson
Copy link
Owner

  1. Setting their values have no effect at all (I do not compile with respect to the set values). Rather when I extract the nominal-parameter vector petab_problem.θ_nominalT in θ_nominalT the set values can be found. So it is entirely for the convenience of testing the code against cases where we know the likelihood value.
  2. Good, then I will keep the noise formula in observable (and make it a field the user must provide).
  3. I agree with you, and I am hesitant to add more distributions unless they are truly needed. This is because in order to compute the gradient correctly for adjoint-sensitivities and Hessian approximation for the Gauss-Newton approximation at some places I rely on analytically derived formulas for the normal distribution - adding a new distributions means to hardcode some partial derivatives. Thus, I would rather opt for writing in the documentation that if the user wants to use a different distribution they should file an issue, and then I can add it.

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 4, 2023

Excellent, yes, putting such a note about other distributions sounds like the way to go. Didn't even realise that there was customized code for normal distributions.

If we do not want to let initial conditions be parameters that are estimated, we can provide their values just using a map when we run readPEtabModel? And for the parameter values that remain fixed throughout the simulation.

@sebapersson
Copy link
Owner

Yes, providing the the constant initial values like a map sounds like a good plan. And in the case the user does not provide anything for initial value I say we assume it is zero.

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 4, 2023

Assuming 0 if missing sounds good. I think the pieces are falling into place? I will start drafting documentation for Catalyst. Once there is a version of the interface working in PEtab.jl I might be able to help with the coding there to cover corner cases and similar stuff.

@sebapersson
Copy link
Owner

I agree the pieces are falling into place. I will work through the PEtab test-cases (to make sure there is nothing we miss in term of support), and add things like steady-state simulations. Have some other things this week, but should be done with this latest by the end of the week.

@sebapersson
Copy link
Owner

I see. Most optimization packages take a Vector though, e.g. Optmisation.jl as here https://docs.sciml.ai/Optimization/stable/examples/rosenbrock/ which I think is the most scalable solution if there are many (>30) parameters to estimate. But for smaller problems I do like the syntax with the parameter map better. So I think the best solution is to allow both - that if a vector is provided then in calibrateModel function the parameter map is parsed into a Vector{Float64}.

Would be grateful if you could help with the implementation

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 18, 2023

I agree. if you make the version that acts on Vector{Float64} I will make a PR with overloads in case maps are given instead.

@sebapersson
Copy link
Owner

I have now merged the Catalyst integration with main, and following #69 I have also changed the naming to better follow the Julia code-standard, as can be seen in the updated docs https://sebapersson.github.io/PEtab.jl/dev/Define_in_julia/.

Next release will be breaking (as naming changes), so before that if you (@TorkelE ) have any feedback on naming etc please tell.

The only thing left now is PEtabDose which I hopefully will have time to add next week.

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 24, 2023

Still traveling, but will be back tomorrow (Sunday) at which point I will make some additional tutorials, should hopefully be done quite quickly.

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 24, 2023

I'm going through things, and only really have two thoughts:

Single simulation experiment

There are a decent number of cases when there is only really is a single experiment (e.g. modelling epidemics, where each epidemic is distinct and only happens once). Would it make sense to create a dispatch for PEtabModel when no (or empty) simulation_conditions is provided? Under the hood, PEtabModel could then create an empty condition dictionary, give it a default tag ("__condition_0") and attach it to all measurements (and do a check so that there are no parameters unaccounted for).

Observables

I'd say in the vast majority of cases, the observable is a species of the system. In these cases, it would be pleasant to be able to designate this without having to explicitly @unpack that species. E.g. for

rn = @reaction_network begin
    (p, d), 0 <--> X
end

where X is observed, something like

obs_X = PEtabObservable(:X, 0.1)

would be useful. When PEtabModel is run, one would infer that :X is in fact the variable X(t). Even for systems with more complicated observables, e.g.

rn = @reaction_network begin
    (k1,k2), X1 <--> X2
end

where we now would run

@unpack X1, X2 = rn
obs_Xtot = PEtabObservable(X1+X2, 0.1)

my goal is that observables should instead be incorporated into the reaction system:

rn = @reaction_network begin
    @observed Xtot = X1+X2    # Not actually implemented yet, the .observed field exist though, but cannot be filled this way).
    (k1,k2), X1 <--> X2
end

and one could run

obs_Xtot = PEtabObservable(:Xtot, 0.1)

However, I the current approach is more general. For a starter, it does give the flexibility of being able to define stuff like Xtot=X1+X2 within PEtabObservable, but also (as in your example) to have more explicit formula for measurement errors and noise.

I'm not sure what is best. I can imagine that it is a bit messy to support both cases. If there is an elegant solution I think it would be worth it, however, depends on what you think is best.

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 24, 2023

It can also be noted that Catalyst exports ModelingToolkit, so using ModelingToolkit is not strictly required in https://sebapersson.github.io/PEtab.jl/dev/Define_in_julia/ (although one might still want to have it there, just pointing out in case)

@sebapersson
Copy link
Owner

Thanks for the input!

Simulation condition

I agree, and I will write a method where I create a __condition_0 in case nothing is provided.

Observable

Hmm, this one is a bit tricky.

As a starting point, I think we need to keep the current system. Firstly, because of the noise-formula. Secondly, because it makes it possible to for example define noise and observable parameters as here while allowing me to clearly keeps track on which parameters are a part of the dynamic model. This is important, as computing the gradient for parameter not in the dynamic system is substantially faster (so tracking these parameter is important).

I also like the approach with observable in the ReactionSystem - so I am thinking maybe something like this. In case the observable is not advanced, (e.g. no observable parameter), we recommend the user down the line to define it in the reaction system (and this is what we include in the starting tutorial). Then, in case a more advanced observable is used we keep the current system and refer users to more advanced documentation? Further, I would not like to allow something like

obs_X = PEtabObservable(:X, 0.1)

as I am afraid it gets a bit messy if there are too many ways to define the observable, I think either via reaction-system and in advanced cases as currently should be sufficient?

Lastly

I loaded ModellingToolkit just to highlight it has to be loaded if the user wants to give the model as ODESystem (and not Catalyst) :)

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 25, 2023

All sounds excellent!

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 26, 2023

I had a thought (and I am currently throwing out most thoughts I have, so you can do what you want with them). Currently, using this interface, parameters can be given in 3 different ways:

  1. As a parameter that we want to find.
  2. As a parameter that carries between simulations.
  3. As a parameter with a known value.

Currently, if a parameter is not given in any of the three inputs, (1) is the default. Would it make sense to force the user to provide each parameter in each input? It is basically a trade-off, it is nice not to have to define all the parameters that one wants to fit. However, sometimes the user might forget some parameter somewhere by mistake, and then throwing an error might be good. This problem is probably more pronounced in the new interface where option (3) was added.

An intermediate solution is to kwarg (false by default) that when set to true enable the current behaviour. However, if left unchanged, an error is thrown if a parameter is forgotten.

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 27, 2023

Also, another thing. I've talked to people, and apparently when representing ModelingToolkit type equation expressions (e.g. A +B*X) in the field of a struct, even though Num is a valid type, the preferred field type should be Any (I didn't really understand why, but this seems to be a quite general thing). So for example obs in PEtabObservable should be ::Any rather than ::Num (and the same with noise_formula).

Sorry for not catching this earlier! I am not sure how much it is going to matter, but it might be worth considering making this change befor the breaking release. I tried to check where Num is used, and there is:

struct PEtabObservable
    obs::Num
    transformation::Union{Nothing, Symbol}
    noise_formula::Union{Nothing, Num}
end
function PEtabObservable(obs::Num, 
                         noise_formula::Union{Nothing, Num, T};
                         transformation::Symbol=:lin)::PEtabObservable where T<:Real
    return PEtabObservable(obs, transformation, noise_formula)
end

Where I think both obs and noise_formula could just become Any, with corresponding changes to the constructor. Similar with the parameter field in PEtabParameter and its constructor.

The PEtabExperimentalCondition struct in "test/Catalyst_functions.jl" and the "where ..." part of the two function PEtab.PEtabModel(system::ReactionSystem, ... in "ext/CatalystExtension/Create_PEtab_model.jl" also mentioned this type, but I think that's it.

@sebapersson
Copy link
Owner

Thanks for the input! I will change to Any!

I see what you mean with the parameter being forgotten which easily can happen. How about throwing a warning if a parameter is forgotten ? Or do you think users might ignore the warning and only act in case an error is thrown?

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 27, 2023

From a user's point of view, I think a warning should be good.

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 28, 2023

The Catalyst/PEtab tutorial at Catalyst docs is up now: SciML/Catalyst.jl#695

It's basically a focused version of the PEtab one, just focusing on options for fitting a Catalyst ODE to data. Will probably be a bit until it gets merged.

@sebapersson
Copy link
Owner

This is great!

I will take a look at it later. Also, regarding PEtabDose if nothing unforeseen happens I will probably have time to start working on it tomorrow.

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 28, 2023

I'm having a 3 hour workshop on CRN modelling with Catalyst at https://icsb2023.bioscience-ct.net/ on October 8th as well. Ideally, I would like yo include a short section on parameter fitting with PEtab.jl, just to show a very basic workflow.

@sebapersson
Copy link
Owner

Congrats to getting a workshop at ICSB!

And if you want to include a part on PEtab.jl I would be happy to help with creating material - and since most things are in place now to do parameter estimation with Catalyst before 8:th October I can make next PEtab.jl release before then.

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 28, 2023

Great! Yes, I'm making a push for quite a couple of things in Catalyst now as well, aiming to have as much as possible in place for the workshop.

I plan to sketch out the workshop this weekend, for the PEtab stuff I will probably make an example I want to use, and then send it to you (if that's ok) for checking/modifications.

@sebapersson
Copy link
Owner

Sounds good! And no problem, when you have an example send it and I will be happy to provide feedback.

@sebapersson
Copy link
Owner

Happy to inform there is now a PEtabEvent struct, on the format;

struct PEtabEvent
    condition
    affect
    target
end

Basically condition species the trigger, affect the affect of the trigger, and target what is affected. For example:

PEtabEvent(3.0, S + 3, S)

means that at time t == 3 a value of 3 is added to the state S. Moreover,

PEtabEvent(S > 3.0, 3,  :c1)

Means that when state S crosses (from below 3) the limit 3 parameter c1 takes value 3. To trigger when S equals 3 simply use S == 1.

The target can be states or model parameters, and affect can also be a parameter (e.g. a control parameter). Under the hood PEtabEvent is rewritten to either a DiscreteCallback (if it triggers at a specific time) or a ContiniousCallback (if the trigger is dependent on model states).

If you want any additional feature just tell, I still have to add the docs for this (but it is tested). The thing lacking is a dosage time, this is more complicated to pull off so it will have to wait a bit.

@TorkelE
Copy link
Contributor Author

TorkelE commented Sep 29, 2023

Sounds excellent, should cover all cases I'm familiar with

@sebapersson
Copy link
Owner

You can now also find event documentation (https://sebapersson.github.io/PEtab.jl/dev/Julia_event/), will add plots in the future, but now it currently holds all the information needed to make events.

@TorkelE
Copy link
Contributor Author

TorkelE commented Oct 3, 2023

Also, did you see my comment on changing e.g. IpoptExtension to PEtabIpoptExtension.jl?
(the point being if another package have an IpoptExtension, there will be problem if both are loaded at once)

It should be fine not to append PEtab to all extension names, but just wanted to double-check in case you missed it (since if you want to add it, it probably should be done before 2.0).

(if you did see the comment, just ignore this!)

@sebapersson
Copy link
Owner

Thank you for reminding me about this, totally slipped my mind. Will address it

@sebapersson
Copy link
Owner

I would say this issue has been addressed as PEtab.jl now integrates with Catalyst, and there is event support.

There are some things left like;

  1. Add duration for events
  2. Uncertainty analysis of model parameters/predictions
  3. Likely some more util functions
  4. Documentation is a never ending story

But these should likely be separate issues. If you do not have anything more that should be added @TorkelE I will close this and #77 , and then release next version

@TorkelE
Copy link
Contributor Author

TorkelE commented Oct 4, 2023

Yes, I consider this done with extra on top.

@TorkelE TorkelE closed this as completed Oct 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants