-
Notifications
You must be signed in to change notification settings - Fork 61
arblang
- TODO: Survey the landscape. NMODL is problematic, but are alternatives like NeuroML/LEMS, NestML? Why and how?
- The currently used DSL, NMODL, doesn't have a very well defined syntax or semantics. It makes it difficult for users to write models and developers to interpret them. It is too general. It provides too many tools and features that don't quite fit the needs of the users, which end up being misused.
- We need a well-defined, well-documented language that can model most mechanisms elegantly.
- We need a representation of mechanisms that is amenable to analysis and composition. (Preferably at runtime using an abstract enough level of the IR).
- Arblang is a proposed DSL that aims to address the shortcomings of NMODL.
- It is a special-purpose language, with a declarative style.
- It should be restricted, structured and simple enough that it can be easily written and understood by the users and the compiler.
- It has well-defined features and is capable of being extended to support new features (that make sense in the field of mechanism descriptions).
- Arblang's compiler should be simple, extendable and maintainable. It should be capable of performing powerful optimizations on the source code. This can be achieved by using good design principles, having a powerful IR and having a clear separation of concerns.
- Arblang's compiler pipeline should allow for runtime manipulation of mechanisms, possibly by interfacing with LLVM (this is a longer term goal).
- Arblang should be flexible enough to be supported by other simulators, and it should be possible to translate Arblang to/from similar DSLs (eg: LEMS, NMODL, SBML)
- Language Level:
- How to interface with Arbor without using "magic" variables.
- How to represent ODEs, kinetic schemes, linear systems, synapses.
- How to define the mechanism state.
- How to expose the cell state.
- How to capture state and current updates.
- How to represent units.
- How much programming flexibility is exposed to the user.
- How do we select the correct solutions to the ODEs.
- Compiler Level:
- For now a black box that performs optimizations on a well-defined IR.
- A potential 'runtime system' and 'standard libraries'
- Solver routines would live here.
- At what level of the representation lowering do we expose the ability to compose with other mechanisms.
- At what level do we translate the kinetic schemes/ODEs/linear systems etc to their analytical solution. Do we use inlined solutions or refer to external solver routines.
- Simulator Level:
- How to interface the generated C++ code with Arbor and other simulators? (Mechanism ABI)
- How to represent the output of the mechanism updates. (Row-polymorphism? Tuples?)
- Where to perform reductions.
- Currently, modcc generates code in the mechanism based on general concerns that writes results directly into memory. Consequently, atomics have to used in certain places.
- Alternatively, mechanisms could provide pure updates which are combined in the library.
- Then, the method could be chosen at runtime, aware of synchronisation needs.
- It would also go a long way towards reproducability (reduction order)
- Performance might improve (can use optimised libs, esp CUDA), at the cost of memory.
- Compiler and generated code become simpler
- How to perform back-end specific optimizations.
- Parsing and Lexing
- Text to AST
- Syntax is checked here
- De-Sugaring
- Remove high-level features
- Turn Reaction Systems into ODEs
- AST transformations
- Type checking
- Unit checking
- AST -> IR
- IR transformations
-
Perform necessary optimisations to be able to do symbolic differentiation
-
Analyse ODE systems
- Classify ODEs (linear/non-linear/coupled/independent/transcendental)
- Generate solutions.
- Find initial states based on steady-state where needed
-
Introduce reductions if they need to be incorporated into mechanism updates
Mechanism composition and LLVM integration
-
- IR optimisations
- Backend selection and specific optimisation
- Code Generation
- Has Declarative/functional semantics
- No (general?) recursion.
- No side effects, all functions are pure.
- Simple, restricted branching capabilities
- No direct implicit access to mechanism and cell states.
- Separation of concerns
- Separation of mechanism state update and current update.
- Separation of state update into multiple, clearly defined blocks
- Reversal potential
- How to model this?
- Is Reduction on the Arbor side feasible?
- Per Ion, we have
- current `iX`
- internal concentration `Xi`
- external concentration `Xo`
- (potentially, replace concentrations with a flux and its derivative (this is something Sam talked about))
- Per CV we have the total concentration `g` and current density `I`
- How many ions?
- Ca, Na, K, H, NO (Purkinje) + leak
- Might have more, so let's say 10
- If we have two mechanisms updating the same ion, we need to have a separate copies, if not we can get away without (if we have proof).
- Per Ion, we have
- Do we want units
- if we do it, we must be very strict about it.
- imperative looking language, lest we scare away users
- every user variable has a unit
- JUST SI, no more Pound per Square Yard
- means we just do Order of Magnitude?
- no user defined units, they will mess up.
- Q: does this mean we forbid unitless things? No, we cannot.
- If we do it, Unit errors must be type errors.
-
Basics
-
types start with capital letters
-
functions
fn foo (x: type, y: type) -> type { z }
- type annotation are mandatory at arguments and results
- return is implicitly the last expression
-
conditionals
if p { a } else { b }
- else is mandatory
- this is an expression that returns a result, such it must be used ie bound
-
variable bindings
let x = y [in] # scope of x # in is optional # bindings and scopes can be surrounded by {}
-
comments `#` for single line, no multi line offered (for now)
-
type definitions
struct Point { x: real, y: real, z: real, }
- we only allow simple types here, that reals, bools, tuples, structs?
-
tuples
(type, type, ...)
-
Nonono
let f = fn () -> real {} let t = struct {}
-
things we do not have to think about:
- whether we are pass-by-value or pass-by-ref; we are read-only
-
things we do need to think about
- do we allow users to return tuples and/or structs? Is there a reason not to?
- are we exposing too much in the current/update functions
- range vs unique and constant vs varying variables
- we do not think we need them in ArbLang
-
-
Mechanisms
-
The "StdLib" is going to provide a type
struct Current {s: Ion, i: real, }
- What is `Ion`? It could be a symbol or a type
-
Then
# this is not a data type, this is an interface mechanism Ih { state: IhState param: IhParam current: current_ih } # here is the implementation struct IhState { ... } struct IhParam { ... } fn current_ih (state: IhState, param: IhParam, cell: Cell) -> (Current, Current, Current) { ... ((Ca, i_ca), (K, i_k), (Leak, i_l)) }
-
Once more with state
# this is not a data type, this is an interface mechanism HH { state: HHState, param: HHParam, current: current_hh, update: update_hh, } # here is the implementation struct HHState { n: real, m: real, h: real } struct HHParam { ... } fn current_hh (state: HHState, param: HHParam, cell: Cell) -> (Current, Current, Current) {...} fn update_hh (state: HHState, param: HHParam, cell: Cell) -> HHState { solve_ode((state.m, rhs_m, init_m), (state.h, rhs_h, init_h), (state.n, rhs_n, init_n)) }
-
- A mechanism has
- a set of parameters
- a state vector
- initialisation of state
- an update mechanism for state
- a current contribution (i, g)
- a concentration flux contribution (φ, ...)
Consider the Im mechanism, it is essentially a function of this type
Im: (temperature T, voltage U, parameter gbar, ion K+, state Im) -> (ion K+, state Im)
Making this a bit more abstract by writing a general mechanism M
M: (globals, parameters, ions, state) -> (ions, state)
actually, ions
is a less than ideal term, since there might be neutral
things around.
M: (parameters, globals, species, state) -> (species, state)
The modelling of M
as a single function is also not ideal. The backend
needs at least two entry points, currently nrn_current
and
nrn_state
, but these are semantically disjoint, too.
M: parameters -> ((globals, species, state) -> state,
(globals, species, state) -> species)
We define the following building blocks
species = ion
| neutral
| leak -- or non_specific
-- charged particles
ion = (name, charge, concentration_int, concentration_ext)
-- neutral molecules
neutral = (name, concentration_int, concentration_ext)
-- non specific current, summarising all leak effects
leak = ()
-- state of the simulator provided read-only to the mechanisms
globals = (dt, voltage, temperature, ...)
-- user defined, per mechanism
parameters = (...)
state = (...)
Also, we note that it is better to return a rate of change instead of a new value and leave integrating that into the new state to the engine. In the voltage update, we're integrating schematically
dv/dt = -L[v] + sum (I
mech)
Here, currents can have a voltage dependency, so it can be linearized as
dv/dt = -L[v] + sum (v*g
mech(v) + cmech) + O(v^2^)
For stability, we solve this via implicit Euler, so we need to use
v(t + dt)
in the rhs. Ignoring the effect of higher order terms
(rightly or wrongly) we need to know the values g_mech(v)
rather than
just I_mech(v)
. Thus, the rate of change is given by the pair
(I_mech, g_mech)
.
Similar, to modify concentrations C
, collect dC_mech/dt
terms from
mechanisms, sum them up, and apply backwards Euler. We would also need
to get the partial derivative of these terms with respect to C
for
stability. Thus, we write state`
and species`
symbolically
for these rates.
M: parameters -> ((globals, species, state) -> state',
(globals, species, state) -> species')
state = (x, y)
state' = (d/dt x, d/dt y)
species = ion
species' = ((I, g=dU/dI),
(\phi_int = d/dt C_int, d \phi_int/d C_int),
(\phi_ext = d/dt C_ext, d \phi_ext/d C_ext))
The engine will then combine state and species updates and integrate over them to obtain new the new simulation state.
Next, the update functions can be decomposed into smaller building blocks
-- density mechanism
state_update: (globals, species, state) -> state'
state_update = ode_system
| reaction
-- point mechanism
state_update = state_machine
We define composition rules for mechanisms as follows
- State
- Disjoint mechanisms are updated in arbitrary order
- Multiple updates to the same state variable are combined by
addition of the resulting rates
state`
- Species
- All rates are computed in arbitrary order
- The resulting list is reduced into a single rate per state variable, again without a known order
- The result is applied to the current state
A single raw AST-level ODE is an expression of
ODE = (f, -- Symbol for the function
f_0, -- Initial value
RHS[f, t]) -- Functional of f and t, arbitrary expression
which should be understood as the equivalent of f'(t) = RHS[f, t]
and
f(t=0) = f_0
. A system of potentially coupled ODEs is represented as a
mutually recursive set of ODE objects
ODE-System: (ODE, ...)
Remarks
- We assume
f
to be a function of timet
only. - The initial value
f_0
is given directly as a value or indirectly by a keyword requesting a steady state solution. The latter is needed for kinetic schemes. - Non-linear and transcendental expression in RHS are fine.
- We do not ask the user say how the solution to a system is to be found. Instead we perform best-effort analysis and solution. Obviously, the user is to be informed about our choice.
Future extensions
- n'th order ODEs which would have a initial representation similar to
ODE-n = (f^(n),
(f^(n)_0,
f^(n-1),
...),
RHS[f, t])
and would de-sugar further to a coupled system of 1st order ODEs.
- Radial diffusion problems might warrant allowing
f(t, r)
. However, radial diffusion is likely to be excluded from Arblang and delegated to the simulator. - Linear systems can be more efficiently written as matrix x vector
-
Reactions are usually written as
c_A A + ... <-> c_X X + ...
-
or equivalently
0 = c_X X + ... - c_A A - ...
-
the reactants
A
are considered mass fractions -
implicitly
A + ... + X + ... = 1
due to conservation of mass -
Do we need something else?
-
We approximate this by ASCII notation
cA*A + ... -> cX*X + ...
-
the symbols
->
,<-
, and<->
are allowed, where the latter implies a bi-directional reactioncA*A + ... <-> cX*X + ...
is the same as
cA*A + ... -> cX*X + ... cX*X + ... -> cA*A + ...