From 298ed8b35eb3648457713469cb479a0b53c83671 Mon Sep 17 00:00:00 2001
From: Daniel Weindl
Date: Wed, 3 Jul 2024 17:13:23 +0200
Subject: [PATCH] Proposal: Different languages for model specification (#538)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

# Motivation

There are a number of formats for specifying models in systems biology, each with their specific strengths and weaknesses. PEtab version 1.0.0 only allows Systems Biology Markup Language (SBML) models.

While SBML is supported by a large number of tools, there are good reasons to use other formats. For example, rule-based model formats (e.g., the BioNetGen Language, BNGL) permit more abstract and compact specification of models based on rules, which are generalisations of reactions.

Therefore, and based on user request (#436), we propose to lift PEtab’s restriction to SBML models and allow arbitrary model formats.

# Proposed changes

* Changes to the PEtab YAML file:
  * Change `sbml_files` to `model_files`
  * `model_files` entries map model IDs (following the existing conventions for PEtab IDs) to:
    * `location`: path / URL to the model
    * `language`: model format

    Initial set of model format identifiers (to be extended as needed):
    * SBML: `sbml`
    * CellML: `cellml`
    * BNGL: `bngl`
    * PySB: `pysb`
  * An additional optional entry, `mapping_files`, for mapping tables (see below) is added

Example:

**Before:**

```yaml
format_version: 1
parameter_file: parameters.tsv
problems:
  - condition_files:
      - conditions.tsv
    measurement_files:
      - measurements.tsv
    observable_files:
      - observables.tsv
    sbml_files:
      - model1.xml
```

**After:**

```yaml
format_version: 2.0.0
parameter_file: parameters.tsv
problems:
  - condition_files:
      - conditions.tsv
    measurement_files:
      - measurements.tsv
    observable_files:
      - observables.tsv
    mapping_files:  # optional
      - mappings.tsv
    model_files:
      id_for_model1:
        location: model1.xml
        language: sbml
```

* Changes to the format of existing tables/files:
  * Condition/Observable/Parameter Table

    All symbols that
    previously referenced the ID of SBML entities, such as parameter IDs or compartment IDs, now refer to (globally unique) named entities in the model, such as parameters, observables, or expressions. For example, condition table columns may correspond to parameters, states, or species of the referenced model.

    For species, assignments in the condition table set the initial value at the beginning of the simulation for that condition, potentially replacing the initialization from preequilibration. For all other entities, values are statically replaced at all time points. For entities that assign values to other entities, such as SBML AssignmentRules, the value of the target of that rule is statically replaced at all time points.

* Additional files
  * Mapping Table: Mapping PEtab entity IDs to entity IDs in the model. This optional file may be used to reference model entities in PEtab files where the ID in the model would not be a valid identifier in PEtab (e.g., because it contains blanks, dots, or other special characters).

    The TSV file has two mandatory columns: `petabEntityId` and `modelEntityId`. Additional columns are allowed.

    Each `modelEntityId` must be a unique identifier in the model. The mapping table must not map `modelEntityId`s to `petabEntityId`s that are also defined in any other part of the PEtab problem. A `modelEntityId` may not refer to other `petabEntityId`s, including those defined in the mapping table.

    `petabEntityId`s defined in the mapping table may be referenced in condition, measurement, parameter and observable tables, but cannot be referenced in the model itself.

    For example, in SBML, local parameters may be referenced as `$reactionId.$localParameterId`, which are not valid PEtab IDs as they contain a `.` character. Similarly, this table may be used to reference specific species in a BNGL model, whose species names may contain many unsupported characters such as `,`, `(` or `.`.
    However, please note that IDs must exactly match the species names in the BNGL-generated network file, and no pattern matching will be performed.

# Implications

* Tools need to check the model format and provide an informative message if the given format cannot be handled
* Validators will skip model-dependent validation when encountering unknown model types; ideally, there would be some plugin mechanism to provide validation

---

Co-authored by @FFroehlich @fbergmann.

Also thanks to everybody participating in these discussions during the last COMBINE meeting.

---------

Co-authored-by: FFroehlich
Co-authored-by: Dilan Pathirana <59329744+dilpath@users.noreply.github.com>
Co-authored-by: Frank T. Bergmann
---
 doc/_static/petab_schema.yaml     |  37 +++++++--
 doc/documentation_data_format.rst | 131 ++++++++++++++++++++++--------
 2 files changed, 124 insertions(+), 44 deletions(-)

diff --git a/doc/_static/petab_schema.yaml b/doc/_static/petab_schema.yaml
index 107e54fd..95316be0 100644
--- a/doc/_static/petab_schema.yaml
+++ b/doc/_static/petab_schema.yaml
@@ -38,13 +38,26 @@ properties:
files and optional visualization files. properties: - sbml_files: - type: array - description: List of PEtab SBML files. - - items: - type: string - description: PEtab SBML file name or URL. + model_files: + type: object + description: One or multiple models + + # the model ID + patternProperties: + "^[a-zA-Z_]\\w*$": + type: object + properties: + location: + type: string + description: Model file name or URL + language: + type: string + description: | + Model language, e.g., 'sbml', 'cellml', 'bngl', 'pysb' + required: + - location + - language + additionalProperties: false measurement_files: type: array @@ -78,8 +91,16 @@ properties: type: string description: PEtab visualization file name or URL. + mapping_files: + type: array + description: List of PEtab mapping files. + + items: + type: string + description: PEtab mapping file name or URL.
+ required: - - sbml_files + - model_files - observable_files - measurement_files - condition_files diff --git a/doc/documentation_data_format.rst b/doc/documentation_data_format.rst index f4d4272c..79e32368 100644 --- a/doc/documentation_data_format.rst +++ b/doc/documentation_data_format.rst @@ -2,7 +2,7 @@ PEtab data format specification =============================== -Format version: 1 +Format version: 2.0.0 This document explains the PEtab data format. @@ -41,12 +41,11 @@ Overview --------- The PEtab data format specifies a parameter estimation problem using a number -of text-based files (`Systems Biology Markup Language (SBML) `_ -and +of text-based files ( `Tab-Separated Values (TSV) `_) (Figure 2), i.e. -- An SBML model [SBML] +- A model - A measurement file to fit the model to [TSV] @@ -67,6 +66,9 @@ and - (optional) A visualization file, which contains specifications how the data and/or simulations should be plotted by the visualization routines [TSV] +- (optional) A mapping file, which maps PEtab entity IDs to the IDs of model + entities that might not have valid PEtab IDs themselves [TSV] + .. figure:: gfx/petab_files.png :alt: Files constituting a PEtab problem @@ -91,11 +93,11 @@ problem as such. - Fields in "[]" are optional and may be left empty. -SBML model definition ---------------------- - -The model must be specified as valid SBML. There are no further restrictions. +Model definition +---------------- +PEtab 2.0.0 is agnostic of specific model formats. A model file is referenced +in the PEtab problem description (YAML) via its file name or a URL. Condition table --------------- @@ -107,7 +109,7 @@ different experimental conditions). This is specified as a tab-separated value file in the following way: +--------------+------------------+------------------------------------+-----+---------------------------------------+ -| conditionId | [conditionName] | parameterOrSpeciesOrCompartmentId1 | ...
| parameterOrSpeciesOrCompartmentId${n} | +| conditionId | [conditionName] | modelEntityId1 | ... | modelEntityId${n} | +==============+==================+====================================+=====+=======================================+ | STRING | [STRING] | NUMERIC\|STRING | ... | NUMERIC\|STRING | +--------------+------------------+------------------------------------+-----+---------------------------------------+ @@ -140,32 +142,44 @@ Detailed field description Condition names are arbitrary strings to describe the given condition. They may be used for reporting or visualization. -- ``${parameterOrSpeciesOrCompartmentId1}`` - - Further columns may be global parameter IDs, IDs of species or compartments - as defined in the SBML model. Only one column is allowed per ID. - Values for these condition parameters may be provided either as numeric - values, or as IDs defined in the SBML model, the parameter table or both. - - - ``${parameterId}`` - - The values will override any parameter values specified in the model. - - - ``${speciesId}`` - - If a species ID is provided, it is interpreted as the initial - condition of that species (as amount if `hasOnlySubstanceUnits` is set to `True` - for the respective species, as concentration otherwise) and will override the - initial condition given in the SBML model or given by a preequilibration - condition. If no value is provided for a condition, the result of the - preequilibration (or initial condition from the SBML model, if - no preequilibration is defined) is used. - - - ``${compartmentId}`` - - If a compartment ID is provided, it is interpreted as the initial - compartment size. - +- ``${modelEntityId}`` + + Further columns may be the globally unique IDs of model entities, such as + parameters, species, or compartments defined in the model, to set + condition-specific values. Only one column is allowed per ID.
+ Values for these entities may be provided either as numeric values or as + globally unique entity IDs defined in the model, the mapping table, or + the parameter table. + + Any non-``NaN`` value overrides the original value in the model or, if + preequilibration is used, the value obtained from preequilibration. + A ``NaN`` value indicates that the original value of the + model is to be used (when used in the preequilibration condition, or in the + simulation condition if no preequilibration is used) or that the result of + preequilibration is to be used (when used in the simulation condition after + preequilibration). + + The value in the condition table either replaces the initial value or the + value at all timepoints, depending on whether the model entity has a rate law + assigned or not: + + * For model entities that have constant algebraic assignments + (but not necessarily constant values), i.e., that do not have a rate of + change with respect to time assigned and that are not subject to event + assignments, the algebraic assignment is replaced statically at all + timepoints. Examples of such model entities are the targets of SBML + `AssignmentRules`. + + * For all other entities, e.g., those that are assigned by SBML `RateRules`, + only the initial value can be assigned in the condition table. If an + assignment of the rate of change with respect to time or an event assignment + is desired, the values of the model entities that define the rate of + change or the event assignments must be assigned in the condition table. + If no such model entities exist, assignment is not possible. + + If the model has a concept of species and a species ID is provided, its + value is interpreted as amount or concentration in the same way as anywhere + else in the model. Measurement table ----------------- @@ -705,6 +719,49 @@ Detailed field description legend and which defaults to the value in ``datasetId``.
+Mapping table +------------- + +The optional mapping table maps PEtab entity IDs to entity IDs in the model. +It may be used to reference model entities in PEtab files where the ID in the +model would not be a valid PEtab identifier (e.g., because it contains blanks, +dots, or other special characters). + +The TSV file has two mandatory columns, ``petabEntityId`` and +``modelEntityId``. Additional columns are allowed. + ++---------------+---------------+ +| petabEntityId | modelEntityId | ++===============+===============+ +| STRING | STRING | ++---------------+---------------+ +| reaction1_k1 | reaction1.k1 | ++---------------+---------------+ + + +Detailed field description +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- ``petabEntityId`` [STRING, NOT NULL] + + A valid PEtab identifier that is not defined in any other part of the PEtab + problem. This identifier may be referenced in condition, measurement, + parameter and observable tables, but cannot be referenced in the model + itself. + +- ``modelEntityId`` [STRING, NOT NULL] + + A globally unique identifier defined in the model, + *that is not a valid PEtab ID* (see :ref:`identifiers`). + + For example, in SBML, local parameters may be referenced as + ``$reactionId.$localParameterId``, which are not valid PEtab IDs as they + contain a ``.`` character. Similarly, this table may be used to reference + specific species in a BNGL model whose names may contain many unsupported + characters such as ``,``, ``(`` or ``.``. However, please note that IDs must + exactly match the species names in the BNGL-generated network file, and no + pattern matching will be performed. +
For now, PEtab -allows to specify multiple SBML models with corresponding condition and +allows one to specify multiple models with corresponding condition and measurement tables, and one joint parameter table. This means that the parameter namespace is global. Therefore, parameters with the same ID in different models will be considered identical. @@ -1070,6 +1127,8 @@ float values are demoted to boolean values. For example, in ``1 + true``, the expression is interpreted as ``true && true = true``. +.. _identifiers: + Identifiers -----------