S-Expression representation #1

Open
Adam-Vandervorst opened this issue May 9, 2024 · 0 comments
Labels
good first issue Good for newcomers

Comments

@Adam-Vandervorst
Collaborator

While we hope S-expressions are going to be rare on their own in the kernel, they will exist, and when they do, they're likely to be the bottleneck.
Therefore, let's not be naive about implementing them.
The representation of variables (de Bruijn indices/levels vs. scoped IDs vs. globally unique IDs) is orthogonal and will have its own issue.
The question of left- or right-associative representation is irrelevant to this discussion.

Implementation options

Traditional

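/// A binary expression tree.
pub enum Expr {
    /// A variable; how the `usize` is interpreted is left to the separate variable-representation issue.
    Var(usize),
    /// An application node, with each child behind its own heap pointer.
    App(Box<Expr>, Box<Expr>),
}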

What you'd expect, modulo the pointer type, which is chosen to be Box here.
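To make the cost of this layout concrete, here is a minimal sketch that builds a small expression and counts how many `Box` allocations it owns. The helpers `app`, `leaf_count`, and `box_count` are hypothetical names introduced only for this illustration; they are not part of the project.

```rust
/// Same shape as the enum above, repeated so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(usize),
    App(Box<Expr>, Box<Expr>),
}

/// Hypothetical helper: shorthand for building an application node.
pub fn app(f: Expr, x: Expr) -> Expr {
    Expr::App(Box::new(f), Box::new(x))
}

/// Hypothetical helper: counts the leaves, following one pointer per `App` child.
pub fn leaf_count(e: &Expr) -> usize {
    match e {
        Expr::Var(_) => 1,
        Expr::App(f, x) => leaf_count(f) + leaf_count(x),
    }
}

/// Hypothetical helper: counts the heap allocations (`Box`es) owned by the expression.
pub fn box_count(e: &Expr) -> usize {
    match e {
        Expr::Var(_) => 0,
        Expr::App(f, x) => 2 + box_count(f) + box_count(x),
    }
}
```

For `app(app(Expr::Var(0), Expr::Var(1)), Expr::Var(2))`, every application node costs two allocations, and any traversal chases one pointer per child, which is the overhead the flat layouts below try to avoid.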

Prolog-style

(= (parent $y Bob) $rhs) is [2 =; 2 parent; 0 $y; 0 Bob; 0 $rhs]
Assuming arities are disjoint from vars, and symbols and vars are interchangeable:
(a (b c) (d e f g h) i)
becomes
[4 a 2 b c 5 d e f g h 1 i]
yielding Expr = *u64.

This maintains the nice property that subexpressions are fractal (have the same type and are free).
It does not need allocations or pointer indirections.
E.g., the (b c) subexpression is just the base pointer plus two, and the word that pointer points to tells you how many words to read: [2 b c].
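A minimal sketch of the "subexpression is just an offset" idea, using a slice instead of a raw `*u64`. It rests on assumptions that are not in the issue: the top bit of a word tags it as an arity header, every other word is a symbol or variable ID, and the names `ARITY_TAG`, `arity`, `span_len`, and `subexpr` are hypothetical. The example expression `(a (b c) d)`, laid out as `[3 a 2 b c d]`, is also chosen here for illustration.

```rust
/// Assumption: the top bit marks an arity word; all other words are symbol or variable IDs.
const ARITY_TAG: u64 = 1 << 63;

/// Builds an arity header word for an expression with `n` children.
pub fn arity(n: u64) -> u64 {
    ARITY_TAG | n
}

/// Returns true if the word is an arity header rather than a symbol or variable.
pub fn is_arity(w: u64) -> bool {
    w & ARITY_TAG != 0
}

/// Number of words occupied by the expression starting at `words[at]`.
pub fn span_len(words: &[u64], at: usize) -> usize {
    let w = words[at];
    if !is_arity(w) {
        return 1; // a bare symbol or variable is a single word
    }
    let children = (w & !ARITY_TAG) as usize;
    let mut len = 1; // the header itself
    for _ in 0..children {
        len += span_len(words, at + len);
    }
    len
}

/// Borrows a subexpression as a slice of its parent: no copy, no allocation.
pub fn subexpr(words: &[u64], at: usize) -> &[u64] {
    &words[at..at + span_len(words, at)]
}
```

With `a = 1, b = 2, c = 3, d = 4`, the array `[arity(3), 1, arity(2), 2, 3, 4]` encodes `(a (b c) d)`, and `subexpr(&e, 2)` returns the three words `[arity(2), 2, 3]` without touching the heap. For a flat subexpression the header alone gives the length; nested ones need the recursive walk in `span_len`.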

Dyck word

Something with format [arity, dyck-word, *vars], e.g. [5, 0b00100111, =, parent, $y, Bob, $rhs]
Visually:

             .
        0  /   \
         / \ 1   \
    0  / 0 / \     \ 1
     /    .    \ 1   \
   /  0 /   \ 1  \     \
((= ((parent $y) Bob)) $rhs)

where 00100111 is the Dyck word (a.k.a. edge walk) of the above tree.

The fractal relation is not as clear here, but a subexpression is a subword of the Dyck word together with a subarray of the full expression.
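As a sketch of how the Dyck word in the example can be derived, the following walks a binary tree in preorder and appends a 0 bit for every left edge and a 1 bit for every right edge, collecting the leaves along the way. The `Tree` type, the helpers `leaf`, `app`, `walk`, and `encode`, and the choice to pack the bits most-significant-first into one `u64` (which limits the tree to 32 application nodes) are assumptions made for this illustration.

```rust
/// Hypothetical binary tree with string leaves, used only for this sketch.
pub enum Tree {
    Leaf(&'static str),
    App(Box<Tree>, Box<Tree>),
}

pub fn leaf(s: &'static str) -> Tree {
    Tree::Leaf(s)
}

pub fn app(l: Tree, r: Tree) -> Tree {
    Tree::App(Box::new(l), Box::new(r))
}

/// Preorder walk: shift in a 0 when descending a left edge, a 1 for a right edge.
fn walk(t: &Tree, word: &mut u64, leaves: &mut Vec<&'static str>) {
    match t {
        Tree::Leaf(s) => leaves.push(*s),
        Tree::App(l, r) => {
            *word <<= 1; // left edge: 0
            walk(l, word, leaves);
            *word = (*word << 1) | 1; // right edge: 1
            walk(r, word, leaves);
        }
    }
}

/// Returns (leaf count, Dyck word, leaves in left-to-right order).
pub fn encode(t: &Tree) -> (usize, u64, Vec<&'static str>) {
    let mut word = 0;
    let mut leaves = Vec::new();
    walk(t, &mut word, &mut leaves);
    (leaves.len(), word, leaves)
}
```

Encoding the tree `((= ((parent $y) Bob)) $rhs)` this way yields a leaf count of 5, the word `0b00100111`, and the leaves `=, parent, $y, Bob, $rhs`, matching the `[5, 0b00100111, =, parent, $y, Bob, $rhs]` example.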

Hadamard

If we want constant-time access of an array element and fast pattern matching, we can go a step further:
Store the Hadamard code of the expression plus an indication of which words are variables and which are symbols: [treemask, varmask, (vars - symbols)*, symbols*], e.g.
[0b0000011111111111, 0b111110110000000, $t, $rhs, =, parent, Bob]
Visually:

   
-   -   -   -   -   -   -   -   + V + V + V + V + V + V + V +   
                                               $rhs
- S - S - S - S +   +   +   +   
       =
                -   -   + S + S 
                           Bob
                - S + V 
              parent  $t

The fractal relation is unclear here, but it should be possible to parallel bit extract the masks and gather/compress the right words to efficiently build a subexpression.

Parallel Vector

TODO

Notes

Experimentation will lead the way here, and conversions between these formats are a great standalone task!

@Adam-Vandervorst Adam-Vandervorst added the good first issue Good for newcomers label May 9, 2024