arblang

Why a new DSL?

TODO: Survey the landscape. NMODL is problematic, but are alternatives like NeuroML/LEMS, NestML? Why and how?
The currently used DSL, NMODL, doesn't have a very well defined syntax or semantics. It makes it difficult for users to write models and developers to interpret them. It is too general. It provides too many tools and features that don't quite fit the needs of the users, which end up being misused.
We need a well-defined, well-documented language that can model most mechanisms elegantly.
We need a representation of mechanisms that is amenable to analysis and composition. (Preferably at runtime using an abstract enough level of the IR).

Arblang

Goals

Arblang is a proposed DSL that aims to address the shortcomings of NMODL.
It is a special-purpose language, with a declarative style.
It should be restricted, structured and simple enough that it can be easily written and understood by the users and the compiler.
It has well-defined features and is capable of being extended to support new features (that make sense in the field of mechanism descriptions).
Arblang's compiler should be simple, extendable and maintainable. It should be capable of performing powerful optimizations on the source code. This can be achieved by using good design principles, having a powerful IR and having a clear separation of concerns.
Arblang's compiler pipeline should allow for runtime manipulation of mechanisms, possibly by interfacing with LLVM (this is a longer term goal).
Arblang should be flexible enough to be supported by other simulators, and it should be possible to translate Arblang to/from similar DSLs (eg: LEMS, NMODL, SBML)

Key design concerns

Language Level:
- How to interface with Arbor without using "magic" variables.
- How to represent ODEs, kinetic schemes, linear systems, synapses.
- How to define the mechanism state.
- How to expose the cell state.
- How to capture state and current updates.
- How to represent units.
- How much programming flexibility is exposed to the user.
- How do we select the correct solutions to the ODEs.
Compiler Level:
- For now a black box that performs optimizations on a well-defined IR.
- A potential 'runtime system' and 'standard libraries'
  - Solver routines would live here.
- At what level of the representation lowering do we expose the ability to compose with other mechanisms.
- At what level do we translate the kinetic schemes/ODEs/linear systems etc to their analytical solution. Do we use inlined solutions or refer to external solver routines.
Simulator Level:
- How to interface the generated C++ code with Arbor and other simulators? (Mechanism ABI)
- How to represent the output of the mechanism updates. (Row-polymorphism? Tuples?)
- Where to perform reductions.
  - Currently, modcc generates code in the mechanism based on general concerns that writes results directly into memory. Consequently, atomics have to used in certain places.
  - Alternatively, mechanisms could provide pure updates which are combined in the library.
  - Then, the method could be chosen at runtime, aware of synchronisation needs.
  - It would also go a long way towards reproducability (reduction order)
  - Performance might improve (can use optimised libs, esp CUDA), at the cost of memory.
  - Compiler and generated code become simpler
- How to perform back-end specific optimizations.

The Compiler Pipeline

Parsing and Lexing
- Text to AST
- Syntax is checked here
De-Sugaring
- Remove high-level features
- Turn Reaction Systems into ODEs
AST transformations
- Type checking
- Unit checking
AST -> IR
IR transformations
- Perform necessary optimisations to be able to do symbolic differentiation
- Analyse ODE systems
  - Classify ODEs (linear/non-linear/coupled/independent/transcendental)
  - Generate solutions.
  - Find initial states based on steady-state where needed
- Introduce reductions if they need to be incorporated into mechanism updates
  
  Mechanism composition and LLVM integration
IR optimisations
Backend selection and specific optimisation
Code Generation

The Language

Has Declarative/functional semantics
- No (general?) recursion.
- No side effects, all functions are pure.
- Simple, restricted branching capabilities
- No direct implicit access to mechanism and cell states.
Separation of concerns
- Separation of mechanism state update and current update.
- Separation of state update into multiple, clearly defined blocks

Questions

Reversal potential
- How to model this?
Is Reduction on the Arbor side feasible?
- Per Ion, we have
  - current `iX`
  - internal concentration `Xi`
  - external concentration `Xo`
  - (potentially, replace concentrations with a flux and its derivative (this is something Sam talked about))
- Per CV we have the total concentration `g` and current density `I`
- How many ions?
  - Ca, Na, K, H, NO (Purkinje) + leak
  - Might have more, so let's say 10
- If we have two mechanisms updating the same ion, we need to have a separate copies, if not we can get away without (if we have proof).
Do we want units
- if we do it, we must be very strict about it.

Syntax

imperative looking language, lest we scare away users
every user variable has a unit
- JUST SI, no more Pound per Square Yard
- means we just do Order of Magnitude?
- no user defined units, they will mess up.
- Q: does this mean we forbid unitless things? No, we cannot.
- If we do it, Unit errors must be type errors.

A proposal

Basics
- types start with capital letters
- functions
```
fn foo (x: type, y: type) -> type {
    z
}
```
  - type annotation are mandatory at arguments and results
  - return is implicitly the last expression
- conditionals
```
if p {
  a
} else {
  b
}
```
  - else is mandatory
  - this is an expression that returns a result, such it must be used ie bound
- variable bindings
```
let x = y [in] # scope of x
# in is optional 
# bindings and scopes can be surrounded by {}
```
- comments `#` for single line, no multi line offered (for now)
- type definitions
```
struct Point {
   x: real,
   y: real,
   z: real,
}
```
  - we only allow simple types here, that reals, bools, tuples, structs?
- tuples
```
(type, type, ...)
```
- Nonono
```
let f = fn () -> real {}
let t = struct {}
```
- things we do not have to think about:
  - whether we are pass-by-value or pass-by-ref; we are read-only
- things we do need to think about
  - do we allow users to return tuples and/or structs? Is there a reason not to?
  - are we exposing too much in the current/update functions
  - range vs unique and constant vs varying variables
    - we do not think we need them in ArbLang

Mechanisms

The "StdLib" is going to provide a type
```
struct Current {s: Ion, i: real, }
```
- What is `Ion`? It could be a symbol or a type

Then

# this is not a data type, this is an interface
mechanism Ih {
  state: IhState
  param: IhParam
  current: current_ih
}

# here is the implementation
struct IhState { ... }
struct IhParam { ... }

fn current_ih (state: IhState, param: IhParam, cell: Cell) -> (Current, Current, Current) {
   ...
  ((Ca, i_ca),
   (K, i_k),
   (Leak, i_l))
}

Once more with state

# this is not a data type, this is an interface
mechanism HH {
  state: HHState,
  param: HHParam,
  current: current_hh,
  update: update_hh,
}

# here is the implementation
struct HHState { n: real, m: real, h: real }
struct HHParam { ... }

fn current_hh (state: HHState, param: HHParam, cell: Cell) -> (Current, Current, Current) {...}

fn update_hh (state: HHState, param: HHParam, cell: Cell) -> HHState {
    solve_ode((state.m, rhs_m, init_m),
              (state.h, rhs_h, init_h),
              (state.n, rhs_n, init_n))

}

The DSL Interface

A mechanism has
- a set of parameters
- a state vector
- initialisation of state
- an update mechanism for state
- a current contribution (i, g)
- a concentration flux contribution (φ, ...)

Designing Top-Down

Consider the Im mechanism, it is essentially a function of this type

Im: (temperature T, voltage U, parameter gbar, ion K+, state Im) -> (ion K+, state Im)

Making this a bit more abstract by writing a general mechanism M

M: (globals, parameters, ions, state) -> (ions, state)

actually, ions is a less than ideal term, since there might be neutral things around.

M: (parameters, globals, species, state) -> (species, state)

The modelling of M as a single function is also not ideal. The backend needs at least two entry points, currently nrn_current and nrn_state, but these are semantically disjoint, too.

M: parameters -> ((globals, species, state) -> state,
                  (globals, species, state) -> species)

We define the following building blocks

species   = ion
          | neutral
          | leak    -- or non_specific

-- charged particles
ion     = (name, charge, concentration_int, concentration_ext)
-- neutral molecules
neutral = (name,         concentration_int, concentration_ext)
-- non specific current, summarising all leak effects
leak    = ()

-- state of the simulator provided read-only to the mechanisms
globals = (dt, voltage, temperature, ...)

 -- user defined, per mechanism
parameters = (...)
state      = (...)

Also, we note that it is better to return a rate of change instead of a new value and leave integrating that into the new state to the engine. In the voltage update, we're integrating schematically

dv/dt = -L[v] + sum (I~~mech~~)

Here, currents can have a voltage dependency, so it can be linearized as

dv/dt = -L[v] + sum (v*g~~mech~~(v) + c~~mech~~) + O(v^2^)

For stability, we solve this via implicit Euler, so we need to use v(t + dt) in the rhs. Ignoring the effect of higher order terms (rightly or wrongly) we need to know the values g_mech(v) rather than just I_mech(v). Thus, the rate of change is given by the pair (I_mech, g_mech).

Similar, to modify concentrations C, collect dC_mech/dt terms from mechanisms, sum them up, and apply backwards Euler. We would also need to get the partial derivative of these terms with respect to C for stability. Thus, we write state` and species` symbolically for these rates.

M: parameters -> ((globals, species, state) -> state',
                  (globals, species, state) -> species')

state  = (x, y)
state' = (d/dt x, d/dt y)

species  = ion
species' = ((I, g=dU/dI),
            (\phi_int = d/dt C_int, d \phi_int/d C_int),
            (\phi_ext = d/dt C_ext, d \phi_ext/d C_ext))

The engine will then combine state and species updates and integrate over them to obtain new the new simulation state.

Next, the update functions can be decomposed into smaller building blocks

-- density mechanism
state_update: (globals, species, state) -> state'
state_update = ode_system
             | reaction

-- point mechanism
state_update = state_machine

Composition

We define composition rules for mechanisms as follows

State
- Disjoint mechanisms are updated in arbitrary order
- Multiple updates to the same state variable are combined by addition of the resulting rates state`
Species
- All rates are computed in arbitrary order
- The resulting list is reduced into a single rate per state variable, again without a known order
- The result is applied to the current state

ODE Systems

AST

A single raw AST-level ODE is an expression of

ODE = (f,            -- Symbol for the function 
       f_0,          -- Initial value
       RHS[f, t])    -- Functional of f and t, arbitrary expression

which should be understood as the equivalent of f'(t) = RHS[f, t] and f(t=0) = f_0. A system of potentially coupled ODEs is represented as a mutually recursive set of ODE objects

ODE-System: (ODE, ...)

Remarks

We assume f to be a function of time t only.
The initial value f_0 is given directly as a value or indirectly by a keyword requesting a steady state solution. The latter is needed for kinetic schemes.
Non-linear and transcendental expression in RHS are fine.
We do not ask the user say how the solution to a system is to be found. Instead we perform best-effort analysis and solution. Obviously, the user is to be informed about our choice.

Future extensions

n'th order ODEs which would have a initial representation similar to

ODE-n = (f^(n), 
         (f^(n)_0, 
          f^(n-1), 
          ...), 
         RHS[f, t])

and would de-sugar further to a coupled system of 1st order ODEs.

Radial diffusion problems might warrant allowing f(t, r). However, radial diffusion is likely to be excluded from Arblang and delegated to the simulator.
Linear systems can be more efficiently written as matrix x vector

Reaction Systems

Syntax

Reactions are usually written as
```
c_A A + ... <-> c_X X + ...
```
or equivalently
```
0 = c_X X + ... - c_A A - ...
```
the reactants A are considered mass fractions
implicitly A + ... + X + ... = 1 due to conservation of mass
Do we need something else?
We approximate this by ASCII notation
```
cA*A + ... -> cX*X + ...
```
the symbols ->, <-, and <-> are allowed, where the latter implies a bi-directional reaction
```
cA*A + ... <-> cX*X + ...
```
is the same as
```
cA*A + ... -> cX*X + ...
cX*X + ... -> cA*A + ...
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

arblang

Why a new DSL?

Arblang

Goals

Key design concerns

The Compiler Pipeline

The Language

Questions

Syntax

A proposal

The DSL Interface

Designing Top-Down

Composition

ODE Systems

AST

Reaction Systems

Syntax

Clone this wiki locally