Skip to content

arblang

Nora Abi Akar edited this page Apr 12, 2021 · 2 revisions

Why a new DSL?

  • TODO: Survey the landscape. NMODL is problematic, but are alternatives like NeuroML/LEMS, NestML? Why and how?
  • The currently used DSL, NMODL, doesn't have a very well defined syntax or semantics. It makes it difficult for users to write models and developers to interpret them. It is too general. It provides too many tools and features that don't quite fit the needs of the users, which end up being misused.
  • We need a well-defined, well-documented language that can model most mechanisms elegantly.
  • We need a representation of mechanisms that is amenable to analysis and composition. (Preferably at runtime using an abstract enough level of the IR).

Arblang

Goals

  • Arblang is a proposed DSL that aims to address the shortcomings of NMODL.
  • It is a special-purpose language, with a declarative style.
  • It should be restricted, structured and simple enough that it can be easily written and understood by the users and the compiler.
  • It has well-defined features and is capable of being extended to support new features (that make sense in the field of mechanism descriptions).
  • Arblang's compiler should be simple, extendable and maintainable. It should be capable of performing powerful optimizations on the source code. This can be achieved by using good design principles, having a powerful IR and having a clear separation of concerns.
  • Arblang's compiler pipeline should allow for runtime manipulation of mechanisms, possibly by interfacing with LLVM (this is a longer term goal).
  • Arblang should be flexible enough to be supported by other simulators, and it should be possible to translate Arblang to/from similar DSLs (eg: LEMS, NMODL, SBML)

Key design concerns

  1. Language Level:
    • How to interface with Arbor without using "magic" variables.
    • How to represent ODEs, kinetic schemes, linear systems, synapses.
    • How to define the mechanism state.
    • How to expose the cell state.
    • How to capture state and current updates.
    • How to represent units.
    • How much programming flexibility is exposed to the user.
    • How do we select the correct solutions to the ODEs.
  2. Compiler Level:
    • For now a black box that performs optimizations on a well-defined IR.
    • A potential 'runtime system' and 'standard libraries'
      • Solver routines would live here.
    • At what level of the representation lowering do we expose the ability to compose with other mechanisms.
    • At what level do we translate the kinetic schemes/ODEs/linear systems etc to their analytical solution. Do we use inlined solutions or refer to external solver routines.
  3. Simulator Level:
    • How to interface the generated C++ code with Arbor and other simulators? (Mechanism ABI)
    • How to represent the output of the mechanism updates. (Row-polymorphism? Tuples?)
    • Where to perform reductions.
      • Currently, modcc generates code in the mechanism based on general concerns that writes results directly into memory. Consequently, atomics have to used in certain places.
      • Alternatively, mechanisms could provide pure updates which are combined in the library.
      • Then, the method could be chosen at runtime, aware of synchronisation needs.
      • It would also go a long way towards reproducability (reduction order)
      • Performance might improve (can use optimised libs, esp CUDA), at the cost of memory.
      • Compiler and generated code become simpler
    • How to perform back-end specific optimizations.

The Compiler Pipeline

  1. Parsing and Lexing
    • Text to AST
    • Syntax is checked here
  2. De-Sugaring
    • Remove high-level features
    • Turn Reaction Systems into ODEs
  3. AST transformations
    • Type checking
    • Unit checking
  4. AST -> IR
  5. IR transformations
    • Perform necessary optimisations to be able to do symbolic differentiation

    • Analyse ODE systems

      • Classify ODEs (linear/non-linear/coupled/independent/transcendental)
      • Generate solutions.
      • Find initial states based on steady-state where needed
    • Introduce reductions if they need to be incorporated into mechanism updates

      Mechanism composition and LLVM integration

  6. IR optimisations
  7. Backend selection and specific optimisation
  8. Code Generation

The Language

  • Has Declarative/functional semantics
    • No (general?) recursion.
    • No side effects, all functions are pure.
    • Simple, restricted branching capabilities
    • No direct implicit access to mechanism and cell states.
  • Separation of concerns
    • Separation of mechanism state update and current update.
    • Separation of state update into multiple, clearly defined blocks

Questions

  • Reversal potential
    • How to model this?
  • Is Reduction on the Arbor side feasible?
    • Per Ion, we have
      • current `iX`
      • internal concentration `Xi`
      • external concentration `Xo`
      • (potentially, replace concentrations with a flux and its derivative (this is something Sam talked about))
    • Per CV we have the total concentration `g` and current density `I`
    • How many ions?
      • Ca, Na, K, H, NO (Purkinje) + leak
      • Might have more, so let's say 10
    • If we have two mechanisms updating the same ion, we need to have a separate copies, if not we can get away without (if we have proof).
  • Do we want units
    • if we do it, we must be very strict about it.

Syntax

  • imperative looking language, lest we scare away users
  • every user variable has a unit
    • JUST SI, no more Pound per Square Yard
    • means we just do Order of Magnitude?
    • no user defined units, they will mess up.
    • Q: does this mean we forbid unitless things? No, we cannot.
    • If we do it, Unit errors must be type errors.

A proposal

  1. Basics

    • types start with capital letters

    • functions

      fn foo (x: type, y: type) -> type {
          z
      }
      
      • type annotation are mandatory at arguments and results
      • return is implicitly the last expression
    • conditionals

      if p {
        a
      } else {
        b
      }
      
      • else is mandatory
      • this is an expression that returns a result, such it must be used ie bound
    • variable bindings

      let x = y [in] # scope of x
      # in is optional 
      # bindings and scopes can be surrounded by {}
      
    • comments `#` for single line, no multi line offered (for now)

    • type definitions

      struct Point {
         x: real,
         y: real,
         z: real,
      }
      
      • we only allow simple types here, that reals, bools, tuples, structs?
    • tuples

      (type, type, ...)
      
    • Nonono

      let f = fn () -> real {}
      let t = struct {}
      
    • things we do not have to think about:

      • whether we are pass-by-value or pass-by-ref; we are read-only
    • things we do need to think about

      • do we allow users to return tuples and/or structs? Is there a reason not to?
      • are we exposing too much in the current/update functions
      • range vs unique and constant vs varying variables
        • we do not think we need them in ArbLang
  2. Mechanisms

    • The "StdLib" is going to provide a type

      struct Current {s: Ion, i: real, }
      
      • What is `Ion`? It could be a symbol or a type
    • Then

      # this is not a data type, this is an interface
      mechanism Ih {
        state: IhState
        param: IhParam
        current: current_ih
      }
      
      # here is the implementation
      struct IhState { ... }
      struct IhParam { ... }
      
      fn current_ih (state: IhState, param: IhParam, cell: Cell) -> (Current, Current, Current) {
         ...
        ((Ca, i_ca),
         (K, i_k),
         (Leak, i_l))
      }
      
    • Once more with state

      # this is not a data type, this is an interface
      mechanism HH {
        state: HHState,
        param: HHParam,
        current: current_hh,
        update: update_hh,
      }
      
      # here is the implementation
      struct HHState { n: real, m: real, h: real }
      struct HHParam { ... }
      
      fn current_hh (state: HHState, param: HHParam, cell: Cell) -> (Current, Current, Current) {...}
      
      fn update_hh (state: HHState, param: HHParam, cell: Cell) -> HHState {
          solve_ode((state.m, rhs_m, init_m),
                    (state.h, rhs_h, init_h),
                    (state.n, rhs_n, init_n))
      
      }
      

The DSL Interface

  • A mechanism has
    • a set of parameters
    • a state vector
    • initialisation of state
    • an update mechanism for state
    • a current contribution (i, g)
    • a concentration flux contribution (φ, ...)

Designing Top-Down

Consider the Im mechanism, it is essentially a function of this type

Im: (temperature T, voltage U, parameter gbar, ion K+, state Im) -> (ion K+, state Im)

Making this a bit more abstract by writing a general mechanism M

M: (globals, parameters, ions, state) -> (ions, state)

actually, ions is a less than ideal term, since there might be neutral things around.

M: (parameters, globals, species, state) -> (species, state)

The modelling of M as a single function is also not ideal. The backend needs at least two entry points, currently nrn_current and nrn_state, but these are semantically disjoint, too.

M: parameters -> ((globals, species, state) -> state,
                  (globals, species, state) -> species)

We define the following building blocks

species   = ion
          | neutral
          | leak    -- or non_specific

-- charged particles
ion     = (name, charge, concentration_int, concentration_ext)
-- neutral molecules
neutral = (name,         concentration_int, concentration_ext)
-- non specific current, summarising all leak effects
leak    = ()

-- state of the simulator provided read-only to the mechanisms
globals = (dt, voltage, temperature, ...)

 -- user defined, per mechanism
parameters = (...)
state      = (...)

Also, we note that it is better to return a rate of change instead of a new value and leave integrating that into the new state to the engine. In the voltage update, we're integrating schematically

dv/dt = -L[v] + sum (Imech)

Here, currents can have a voltage dependency, so it can be linearized as

dv/dt = -L[v] + sum (v*gmech(v) + cmech) + O(v^2^)

For stability, we solve this via implicit Euler, so we need to use v(t + dt) in the rhs. Ignoring the effect of higher order terms (rightly or wrongly) we need to know the values g_mech(v) rather than just I_mech(v). Thus, the rate of change is given by the pair (I_mech, g_mech).

Similar, to modify concentrations C, collect dC_mech/dt terms from mechanisms, sum them up, and apply backwards Euler. We would also need to get the partial derivative of these terms with respect to C for stability. Thus, we write state` and species` symbolically for these rates.

M: parameters -> ((globals, species, state) -> state',
                  (globals, species, state) -> species')

state  = (x, y)
state' = (d/dt x, d/dt y)

species  = ion
species' = ((I, g=dU/dI),
            (\phi_int = d/dt C_int, d \phi_int/d C_int),
            (\phi_ext = d/dt C_ext, d \phi_ext/d C_ext))

The engine will then combine state and species updates and integrate over them to obtain new the new simulation state.

Next, the update functions can be decomposed into smaller building blocks

-- density mechanism
state_update: (globals, species, state) -> state'
state_update = ode_system
             | reaction

-- point mechanism
state_update = state_machine

Composition

We define composition rules for mechanisms as follows

  1. State
    • Disjoint mechanisms are updated in arbitrary order
    • Multiple updates to the same state variable are combined by addition of the resulting rates state`
  2. Species
    • All rates are computed in arbitrary order
    • The resulting list is reduced into a single rate per state variable, again without a known order
    • The result is applied to the current state

ODE Systems

AST

A single raw AST-level ODE is an expression of

ODE = (f,            -- Symbol for the function 
       f_0,          -- Initial value
       RHS[f, t])    -- Functional of f and t, arbitrary expression

which should be understood as the equivalent of f'(t) = RHS[f, t] and f(t=0) = f_0. A system of potentially coupled ODEs is represented as a mutually recursive set of ODE objects

ODE-System: (ODE, ...)

Remarks

  • We assume f to be a function of time t only.
  • The initial value f_0 is given directly as a value or indirectly by a keyword requesting a steady state solution. The latter is needed for kinetic schemes.
  • Non-linear and transcendental expression in RHS are fine.
  • We do not ask the user say how the solution to a system is to be found. Instead we perform best-effort analysis and solution. Obviously, the user is to be informed about our choice.

Future extensions

  • n'th order ODEs which would have a initial representation similar to
ODE-n = (f^(n), 
         (f^(n)_0, 
          f^(n-1), 
          ...), 
         RHS[f, t])

and would de-sugar further to a coupled system of 1st order ODEs.

  • Radial diffusion problems might warrant allowing f(t, r). However, radial diffusion is likely to be excluded from Arblang and delegated to the simulator.
  • Linear systems can be more efficiently written as matrix x vector

Reaction Systems

Syntax

  • Reactions are usually written as

    c_A A + ... <-> c_X X + ...
    
  • or equivalently

    0 = c_X X + ... - c_A A - ...
    
  • the reactants A are considered mass fractions

  • implicitly A + ... + X + ... = 1 due to conservation of mass

  • Do we need something else?

  • We approximate this by ASCII notation

    cA*A + ... -> cX*X + ...
    
  • the symbols ->, <-, and <-> are allowed, where the latter implies a bi-directional reaction

    cA*A + ... <-> cX*X + ...
    

    is the same as

    cA*A + ... -> cX*X + ...
    cX*X + ... -> cA*A + ...