We use Systems Biology tables (SBtab) because this format can be extended to contain additional information more easily (such as experimental data, and conditions under which data was measured).
To make a conversion feasible, we decided on a set of columns and tables (some specified by `TableName`, some also by `TableType`, according to the official specification) which have to be present for the conversion to work.
In contrast to the official documentation, we need each table to be kept in its own `.tsv` file (not all tables in one huge tsv file), or in different sheets of the same `ods` file (the official SBtab project uses `.xlsx`).
We don't use any of the code from the original SBtab authors.
The following sections have more information on specific tables and their required columns (in addition to the obvious `!ID` and `!Name` columns).
Sometimes, spreadsheet software introduces non-ascii Unicode characters such as `−` ('MINUS SIGN' U+2212, in html this is `&minus;`) into the document. They should be replaced by the ascii character `-` ('HYPHEN-MINUS' U+002D) after export to text-based formats. The same applies to other Unicode characters, unless they appear in comments (or generally unparsed content). Minus and hyphen can look quite similar: `−-` (depending on the chosen font), but the ascii hyphen is the character that programming languages understand as subtraction.
You can check your files for non-ascii characters like this:
grep -P '[^[:ascii:]]' *.tsv
#OR
grep -n '[^a-z_A-Z[:digit:][:punct:][:space:]]' *.tsv
# automatic minus to hyphen replacement:
sed -i 's/−/-/g' *.tsv
Neither `grep` nor `egrep` defines the `[:ascii:]` character class without the `-P` option for Perl regular expressions. If such characters appear outside of formulas and IDs, they may be harmless.
Of course you can use Perl directly, or anything else that has regular expressions.
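As an illustration of "anything else that has regular expressions", here is a minimal Python sketch of the same check and replacement. The function names and the sample row are made up for this example and are not part of the conversion script.

```python
import unicodedata

MINUS_SIGN = "\u2212"   # 'MINUS SIGN'
HYPHEN_MINUS = "-"      # 'HYPHEN-MINUS' U+002D


def find_non_ascii(text):
    """Return (line, column, character, unicode name) for each non-ASCII character."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                name = unicodedata.name(char, "UNKNOWN")
                hits.append((line_number, column, char, name))
    return hits


def replace_minus_sign(text):
    """Replace every Unicode minus sign with the ASCII hyphen-minus."""
    return text.replace(MINUS_SIGN, HYPHEN_MINUS)


# hypothetical table content with a minus sign inside a kinetic law
sample = "!ID\t!KineticLaw\nR1\tkf*A\u2212kr*B\n"
print(find_non_ascii(sample))
print(replace_minus_sign(sample))
```

Reporting the Unicode name makes it easier to decide whether a character is harmless (e.g. inside a comment) or needs to be replaced.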
All tables require a unique `!ID` column (the ID can be seen as a key for associative arrays, aka dictionaries or hash tables). The `!Name` column must be unique as well, and the entries should work as variable names in the language that you plan to convert the model to (in some formats the rules for `Name` entries are more lenient than for `ID` strings). We see no good reason to have both IDs and Names, so using the same string for both is fine.
The script in this repository uses the `make.names()` function on this column. This will make the names unique, but a model with duplicated names will almost certainly break. This is intentional: human error then leads to informative error messages later on instead of wrong simulation results. Using the same ID multiple times might otherwise go unnoticed and merge two distinct entities (species, parameters, etc.). The script will also replace `.` with `_` in all names (dots are allowed in R variable names, but often illegal in other languages).
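To catch duplicates before running the conversion, a check along the following lines may help. This is a Python sketch with hypothetical helper names; it is not a re-implementation of R's `make.names()`, it only lists repeated IDs and shows the dot replacement described above.

```python
from collections import Counter


def duplicated_ids(ids):
    """Return the IDs that occur more than once, in order of first appearance."""
    counts = Counter(ids)
    return [i for i in dict.fromkeys(ids) if counts[i] > 1]


def dots_to_underscores(name):
    """Replace '.' with '_', since dots are often illegal in other languages."""
    return name.replace(".", "_")


# hypothetical !ID column with two accidental repetitions
ids = ["A", "B", "A", "C", "B"]
print(duplicated_ids(ids))
print(dots_to_underscores("Ca.free"))
```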
Many numbers can be given in a specified scale (like `log`); these numbers will be converted to linear scale when a model file is written.
Let a quantity `y` be measured in unit `M` (`y` is a number followed by a unit, `y/M` is just a number), and `!Scale` be set to `log10`; then the number you write in the `![Default]Value` column is `z=log10(y/M)`. The script will do the inverse to generate the model and pass the unit on to `.mod` files. Here are some examples:
z | y | scale indicator |
---|---|---|
-5 | 10⁻⁵ | `log10` |
-5 | 1.0E-5 | `base-10 logarithm` |
1.2 | 3.32 | `log` |
1.2 | 3.32 | `ln` |
1.2 | 3.32 | `natural logarithm` |
1.6 | 1.6 | `lin` |
1.6 | 1.6 | `linear` |
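The inverse transformation can be sketched in Python as follows. The alias sets contain only the scale indicators from the examples above; the actual script may accept other variants, and the function name `to_linear` is made up for this illustration.

```python
import math

# scale indicators taken from the examples; other aliases may exist
LOG10_ALIASES = {"log10", "base-10 logarithm"}
LN_ALIASES = {"log", "ln", "natural logarithm"}
LINEAR_ALIASES = {"lin", "linear"}


def to_linear(z, scale):
    """Convert a tabulated value z back to linear scale (y/M, a pure number)."""
    key = scale.strip().lower()
    if key in LOG10_ALIASES:
        return 10.0 ** z
    if key in LN_ALIASES:
        return math.exp(z)
    if key in LINEAR_ALIASES:
        return float(z)
    raise ValueError(f"unknown scale indicator: {scale!r}")


for z, scale in [(-5, "log10"), (1.2, "ln"), (1.6, "linear")]:
    print(z, scale, to_linear(z, scale))
```

Raising an error for an unknown indicator is a design choice of this sketch: a silently assumed linear scale would produce wrong values that are hard to notice.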
This table defines the compounds that are supposed to be modeled by state variables and are subject to change by the reactions in the systems.
Column | Values | Comment |
---|---|---|
!Scale | log, log10, linear | and some variants of these |
!InitialValue | a number | (per unit) in the above scale |
!Unit | the unit of the above number | as it would be in linear scale |
!SteadyState | TRUE | this compound should reach a steady state in at least one scenario and you want to know whether this happened |
!SteadyState | FALSE | it is not important whether or not this compound reaches steady state |
!Assignment | Name or ID | this field will assign a pre-defined algebraic expression to the compound |
The conversion script will make a file called `[…]SuggestedOutput.tsv`. It will have lines that can be used to check whether a compound has reached steady state (or not); this is done for each compound that has `!SteadyState` marked as `TRUE`. If that output is close to `0`, then steady state was reached (it's the sum of all fluxes for the compound in question).
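The idea behind this check can be sketched in Python. This is an illustration, not the content of the generated file: the reaction names, the flux values and the tolerance are all assumptions made for this example.

```python
def net_flux(stoichiometry, fluxes):
    """Sum the signed contributions of all reactions to one compound."""
    return sum(coefficient * fluxes[reaction]
               for reaction, coefficient in stoichiometry.items())


def is_steady(stoichiometry, fluxes, tolerance=1e-9):
    """A compound is at steady state if its net flux is close to 0."""
    return abs(net_flux(stoichiometry, fluxes)) <= tolerance


# hypothetical compound AB: produced by R1 (+1), consumed by R2 (-1)
stoichiometry_AB = {"R1": +1, "R2": -1}
fluxes = {"R1": 0.25, "R2": 0.25}
print(net_flux(stoichiometry_AB, fluxes))
print(is_steady(stoichiometry_AB, fluxes))
```

What counts as "close to 0" depends on the units and the magnitude of the fluxes, so the tolerance should be chosen per model.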
Other columns are unused but may be informative to the user or to others.
In some cases, a compound's amount or concentration is not supposed to be governed by reactions (kinetic laws, stoichiometry) but rather by a fixed (time-dependent) value. In SBML this is called a boundary condition. If the species is supposed to be constant, the field `!IsConstant` can be set to `TRUE`; otherwise, you can assign the value of an expression, listed in the `Expression` table, to this compound. The `!Assignment` field can contain the name of an `Expression`. In SBML, a rule will be created in the `listOfRules`; that rule will target the boundary condition species:
<species id="PKC_active"
name="PKC_active_value"
compartment="Comp1"
initialConcentration="0"
substanceUnits="substance"
boundaryCondition="true"/>
<!-- ......... -->
<!-- and later -->
<!-- ......... -->
<assignmentRule variable="PKC_active">
<math xmlns="http://www.w3.org/1998/Math/MathML">
<apply>
<plus/>
<ci> PKC_DAG_AA_p </ci>
<ci> PKC_Ca_memb_p </ci>
<ci> PKC_Ca_AA_p </ci>
<ci> PKC_DAG_memb_p </ci>
<ci> PKC_basal_p </ci>
<ci> PKC_AA_p </ci>
</apply>
</math>
</assignmentRule>
In the other formats, `vf` and `mod`, the relationship is much simpler:
<Expression Name="PKC_active_value" Description="defined expression Ex0" Formula="PKC_DAG_AA_p+PKC_Ca_memb_p+PKC_Ca_AA_p+PKC_DAG_memb_p+PKC_basal_p+PKC_AA_p"/>
<Expression Name="PKC_active" Description="defined expression S11" Formula="PKC_active_value"/>
which will lead to code such as:
PKC_active_value = PKC_DAG_AA_p+PKC_Ca_memb_p+PKC_Ca_AA_p+PKC_DAG_memb_p+PKC_basal_p+PKC_AA_p;
PKC_active = PKC_active_value;
A blank cell, `NONE`, `FALSE`, or `NO` means that there is no assignment for this species/compound. Empty cells can be tricky if export or import functions merge multiple delimiters (so `\t\t` is not recognized as an empty cell).
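The following Python sketch shows both points: how a cell can be interpreted as "no assignment", and why splitting on a single tab matters. The helper names are hypothetical, and treating the keywords case-insensitively is an assumption of this sketch, not documented behaviour of the script.

```python
NO_ASSIGNMENT = {"", "NONE", "FALSE", "NO"}


def parse_assignment(cell):
    """Return the assigned expression name, or None if there is no assignment."""
    value = cell.strip()
    # assumption: keywords are matched case-insensitively
    return None if value.upper() in NO_ASSIGNMENT else value


def split_row(line):
    """Split on single tabs so that an empty cell stays an empty string."""
    return line.rstrip("\n").split("\t")


row = "S11\t\tTRUE\n"          # hypothetical row with an empty middle cell
print(split_row(row))           # three cells, the second one is empty
print(row.split())              # whitespace splitting merges the delimiters
print(parse_assignment(split_row(row)[1]))
```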
A `TRUE` value in `!IsConstant` and a meaningful assignment are mutually exclusive; combining them may lead to weird results.
Column | Values | Comment |
---|---|---|
!Scale | `log`, `log10`, `linear` | some aliases of these are possible (such as `base-10 logarithm`) |
!DefaultValue | a number | in above scale, normalized to the unit of measurement, possibly subject to fitting/sampling |
!Std | a number | standard deviation / uncertainty of this parameter |
!Min and !Max | numbers | respectively, used if `!Std` is not present |
The columns `!Std` and `!Min`/`!Max` are only used in sampling/optimization; the model conversion is unaffected by them. The `!DefaultValue` is passed on to the model files (if there is a place to put it).
The column `!ReactionFormula` determines the stoichiometry of the model, the `!KineticLaw` column determines the flux of the given reaction. Both are required and are standard columns in SBtab.
Column | Values | Comment |
---|---|---|
!KineticLaw | e.g. `kf*A*B-kr*AB` | the flux, as a math expression |
!ReactionFormula | e.g. `A+B<=>AB` | so, `AB` will increase and both `A` and `B` will decrease by this reaction whenever the flux is positive |
Since the kinetic law determines the reversibility of the reaction, the column `!IsReversible` is not necessary, but if you determine the kinetics based on the law of mass action it may be important for you to have that column as a reminder (for when you are auto-generating the `!KineticLaw` column, which this script doesn't do).
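How a reaction formula translates into stoichiometry can be sketched in Python. This is not the parser used by the script: the function names are made up, and the handling of coefficients (such as `2 A` or `2*A`) is an assumption of this sketch.

```python
def parse_side(side, sign, stoichiometry):
    """Add the terms of one side of the formula with the given sign."""
    for term in side.split("+"):
        parts = term.replace("*", " ").split()
        if not parts:
            continue  # empty side, e.g. a pure degradation reaction
        # assumption: an optional numeric coefficient precedes the name
        coefficient = float(parts[0]) if len(parts) == 2 else 1.0
        name = parts[-1]
        stoichiometry[name] = stoichiometry.get(name, 0.0) + sign * coefficient


def parse_reaction_formula(formula):
    """Map each compound to its net change per unit of positive flux."""
    left, right = formula.split("<=>")
    stoichiometry = {}
    parse_side(left, -1.0, stoichiometry)   # reactants decrease
    parse_side(right, +1.0, stoichiometry)  # products increase
    return stoichiometry


print(parse_reaction_formula("A+B<=>AB"))
```

For `A+B<=>AB` this gives a negative entry for `A` and `B` and a positive entry for `AB`, which matches the table: with a positive flux, `AB` increases while `A` and `B` decrease.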
The input parameters to the model that distinguish different experiments. These quantities are known and can be influenced by the people who are performing an experiment (or rather the real counterparts of these quantities can be influenced). These play the roles of (additional) parameters, but a different kind of parameter than in the Parameter table. Experiments are supposed to have the same parameters of the normal kind and different parameters of the input kind.
The `!DefaultValue` column serves the same role as with normal parameters, but these must be set to known values during an experiment.
The outputs are observable quantities of this system; what is and isn't an output depends on what you can measure (or have knowledge about). Outputs are usually converted to functions in the target language. Experimental Data and Outputs are intimately related as the outputs are the model's equivalent of the data and in some way those can be compared to one another.
It is possible to include the measured data in other sheets; that data should be stored together with an estimate of the measurement noise levels. Regardless of the nature of the noise and underlying distributions, we use `!ErrorName` to indicate which column (the one that has this name) is storing information about this measurement error.
Column | Values | Comment |
---|---|---|
!ErrorName | a string | indicates the column in data sheets that holds the measurement error of an observable |
!ErrorType | not used | this is for the user |
!ProbDist | a string | the probability distribution of the noise model (currently unused); this is for humans to read |
!Formula | a math expression | the right hand side of the assignment |
Many data columns may share the same Error column. This is useful if you have only a very rough estimate of the noise anyway, and the outputs are in the same number range so using the same standard deviation (etc.) for all data points seems good enough.
The data sheets are not used by `sbtab_to_vfgen()`, but they are used for parameter estimation, e.g. via MCMC.
These will be local variables of the model. These variables store a one line mathematical expression for reusability of the resulting value.
Expressions are assignments that are calculated repeatedly each time the ODEs right hand side is called (before fluxes are calculated).
The `!Formula` column stores a string math expression that will serve as the right hand side of the assignment.
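The order of evaluation (expressions first, then fluxes, then the ODE right hand side) can be illustrated with a small hand-written Python function. The model is hypothetical: the expression `bound_fraction` and the kinetic law are invented for this sketch and do not come from any generated file.

```python
def right_hand_side(t, state, parameters):
    """ODE right hand side for the hypothetical reaction A+B<=>AB."""
    A, B, AB = state
    kf, kr = parameters

    # 1. expressions: recalculated on every call, before the fluxes
    bound_fraction = AB / (A + AB)

    # 2. fluxes: kinetic laws may refer to the expressions above
    flux = kf * A * B * (1.0 - bound_fraction) - kr * AB

    # 3. derivatives: fluxes weighted by the stoichiometry
    return [-flux, -flux, +flux]


print(right_hand_side(0.0, [1.0, 2.0, 1.0], [1.0, 0.5]))
```

Because the expression is a local variable, it is never stored between calls; it always reflects the current state.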
This table holds the mapping between input parameters and data sheets. It determines the conditions under which a data set should be replicated using the model. The conditions also include events that need to happen to replicate an experiment.
Column | Values | Comment |
---|---|---|
!Type | `Time␣Series` | indicates that the data is a t->output mapping |
!Type | `Dose␣Response` | data sheet is an input/output curve, i.e. input->output mapping |
`>some_id` | a number | sets the input parameters for this experiment |
!Event | a table name | the name of an event table that holds instantaneous model state changes |
This is not used by this converter, but useful for parameter fitting and interpretation of the input and output concepts.
Events are instantaneous changes in state variables or inputs at specified points in time (events with complex triggers are not supported by any of our code yet).
Event tables have a `!Time` column and an effect column, where the header combines a target and operation: `>OPERATION:TARGET`. An example:
!TimePoint | !Time | >ADD:Ca |
---|---|---|
TP0 | 100 | 1e3 |
TP0 | 200 | 1e3 |
TP0 | 300 | 1e3 |
In this example, we add 1000 units of Ca every 100 time-units. The possible operations are: `SET`, `ADD`, `SUB`, `MUL`, `DIV`.
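How such an event table could be applied to a model state is sketched below in Python. The function names are hypothetical, and the operand order for `SUB` and `DIV` (current value first, table value second) is an assumption of this sketch.

```python
import operator

# assumption: the current value is the left operand, the table value the right
OPERATIONS = {
    "SET": lambda current, value: value,
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def parse_effect_header(header):
    """Split a header such as '>ADD:Ca' into ('ADD', 'Ca')."""
    operation, target = header.lstrip(">").split(":", 1)
    return operation, target


def apply_event(state, header, value):
    """Return a copy of the state with one event effect applied."""
    operation, target = parse_effect_header(header)
    new_state = dict(state)
    new_state[target] = OPERATIONS[operation](state[target], value)
    return new_state


# the example table: add 1e3 to Ca at t = 100, 200 and 300
state = {"Ca": 0.0}
for time, value in [(100, 1e3), (200, 1e3), (300, 1e3)]:
    state = apply_event(state, ">ADD:Ca", value)
    print(time, state)
```

In a real simulation the integrator would stop at each `!Time` value, apply the effect, and then continue from the modified state.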