While we hope S-expressions will themselves be rare in the kernel, they will exist, and when they do, they're likely to be the bottleneck.
Therefore, let's not be naive about implementing them.
The representation of variables (de Bruijn indices/levels vs. scoped IDs vs. globally unique IDs) is orthogonal and will have its own issue.
The question about left or right associative representation is irrelevant to this discussion.
Implementation options
Traditional
pub enum Expr { Var(usize), App(Box<Expr>, Box<Expr>) }
What you'd expect modulo the pointer type chosen to be Box here.
Prolog-style
(= (parent $y Bob) $rhs) is [2 =; 2 parent; 0 $y; 0 Bob; 0 $rhs]
Assuming arities are disjoint from vars, and symbols and vars are interchangeable: (a (b c) (d e f g h) i)
becomes [4 a 2 b c 5 d e f g h 1 i]
yielding Expr = *u64.
This maintains the nice property that subexpressions are fractal (have the same type and are free).
It does not need allocations or pointer indirections.
E.g., the (b c) subexpression is just the base pointer plus two, and the word that pointer points to tells you how many words to read: [2 b c].
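A minimal sketch of how subexpressions could be sliced out of the flat layout. The tagging scheme is an assumption, not part of the proposal: the top bit of a word marks an arity header (one way to keep arities disjoint from vars), atoms are stored as bare words without a header, and the names `ARITY_TAG`, `arity`, `span` and `child` are hypothetical. Because a header here counts elements rather than words, `span` walks nested children to find the length in words.

```rust
/// Hypothetical tag: the top bit marks a word as an arity header.
const ARITY_TAG: u64 = 1 << 63;

/// Header word for a compound expression with `n` elements.
fn arity(n: u64) -> u64 {
    ARITY_TAG | n
}

fn is_arity(w: u64) -> bool {
    w & ARITY_TAG != 0
}

/// Number of words occupied by the expression that starts at `words[0]`.
fn span(words: &[u64]) -> usize {
    let head = words[0];
    if !is_arity(head) {
        return 1; // an atom (symbol or var) is a single word
    }
    let mut len = 1; // the header itself
    for _ in 0..(head & !ARITY_TAG) {
        len += span(&words[len..]);
    }
    len
}

/// The `i`-th element of the compound expression at `words[0]`,
/// returned as a subslice: no allocation, no pointer indirection.
fn child(words: &[u64], i: usize) -> &[u64] {
    let mut off = 1; // skip the header
    for _ in 0..i {
        off += span(&words[off..]);
    }
    &words[off..off + span(&words[off..])]
}
```

With `a..i` numbered 1..9, `(a (b c) (d e f g h) i)` becomes `[arity(4), 1, arity(2), 2, 3, arity(5), 4, 5, 6, 7, 8, 9]`, and `child(&e, 1)` is the slice starting at offset two, matching the "base pointer plus two" observation.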
Dyck word
Something with format [arity, dyck-word, *vars], e.g. [5, 0b00100111, =, parent, $y, Bob, $rhs]
Visually:
where 00100111 is the Dyck word (a.k.a. edge walk) of the above tree.
The fractal relation is not as clear here, but a subexpression is a subword of the Dyck word together with a subarray of the full expression.
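The issue does not spell out how the Dyck word is derived. One convention that reproduces the example is sketched below, as an assumption: read the expression as a left-associated binary application tree, walk it in preorder, emit 0 for an application node and 1 for a leaf, and drop the final (implied) leaf. The sketch reuses the Traditional `Expr` with `Var` standing in for any atom. The helpers `var`, `app`, `walk` and `to_dyck` are hypothetical names, and the word is assumed to fit in 64 bits.

```rust
pub enum Expr {
    Var(usize),
    App(Box<Expr>, Box<Expr>),
}

fn var(id: usize) -> Expr {
    Expr::Var(id)
}

fn app(f: Expr, x: Expr) -> Expr {
    Expr::App(Box::new(f), Box::new(x))
}

/// Preorder walk: 0 for an application node, 1 for a leaf.
/// Leaves are collected in the order they are visited.
fn walk(e: &Expr, bits: &mut Vec<u8>, leaves: &mut Vec<usize>) {
    match e {
        Expr::Var(id) => {
            bits.push(1);
            leaves.push(*id);
        }
        Expr::App(f, x) => {
            bits.push(0);
            walk(f, bits, leaves);
            walk(x, bits, leaves);
        }
    }
}

/// Returns (number of leaves, Dyck word, leaves in order).
fn to_dyck(e: &Expr) -> (usize, u64, Vec<usize>) {
    let mut bits = Vec::new();
    let mut leaves = Vec::new();
    walk(e, &mut bits, &mut leaves);
    bits.pop(); // the last leaf is implied, leaving a balanced word
    let word = bits.iter().fold(0u64, |acc, &b| (acc << 1) | b as u64);
    (leaves.len(), word, leaves)
}
```

Numbering `=`, `parent`, `$y`, `Bob`, `$rhs` as 0..4 and building `((= ((parent $y) Bob)) $rhs)` yields the leaf count 5 and the word `0b00100111`, matching the example.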
Hadamard
If we want constant-time access of an array element and fast pattern matching, we can go a step further:
The Hadamard code of the expression plus an indication of variables/symbols: [treemask, varmask, (vars - symbols)*, symbols*], e.g. [0b0000011111111111, 0b111110110000000, $t, $rhs, =, parent, Bob]
Visually:
- - - - - - - - + V + V + V + V + V + V + V +
$rhs
- S - S - S - S + + + +
=
- - + S + S
Bob
- S + V
parent $t
The fractal relation is unclear here, but it should be possible to parallel bit extract the masks and gather/compress the right words to efficiently build a subexpression.
Parallel Vector
TODO
Notes
Experimentation will lead the way here, and conversions between these formats are a great standalone task!