Merge pull request #127 from billhails/bugs
fixed bug in scanner
billhails authored Nov 2, 2024
2 parents 940d137 + 6aa9f10 commit 7540e8a
Showing 4 changed files with 152 additions and 349 deletions.
90 changes: 48 additions & 42 deletions docs/V2.md
@@ -1,31 +1,32 @@
# CEKF Version 2 - Bytecode

Benchmarks so far have been encouraging, the `fib(35)` test with `-O2` now
takes around 5.5 seconds, but `fib(40)` still takes around 54 seconds,
Benchmarks so far have been encouraging: the `fib(35)` test with `-O2`
now takes around 5.5 seconds, but `fib(40)` still takes around 54 seconds,
while Nystrom's stack-based bytecode interpreter can do that calculation
in around 5 seconds. Of course this is due to using environments instead of
a stack, and walking trees instead of stepping through bytecode.
in around 5 seconds. Of course this is due to using environments instead
of a stack, and walking trees instead of stepping through bytecode.

Another factor is that the entire AST needs to be protected, and must be
marked every time a garbage collection occurs.
Another factor is that the entire AST needs to be protected, and must
be marked every time a garbage collection occurs.

So how difficult would it be to convert the AST to bytecode and use
a local stack? It turns out to be not so hard. Of course we still need
So how difficult would it be to convert the AST to bytecode and use a
local stack? It turns out to be not so hard. Of course we still need
environments, because of closures that capture them, and it's quite
possible that version 2 will actually be **slower** initially, but
I'll discuss a possible version 2.1 later that I hope will fix that.
possible that version 2 will actually be **slower** initially, but I'll
discuss a possible version 2.1 later that I hope will fix that.

Anyway let's review the math from version 1 and see what the bytecode
equivalent might look like.

The new machine no longer has any $\mathcal{A}$, $applyproc$ or
$applykont$ functions, as they are subsumed into the general $step$ function.
However the basic structure and discussion is the same.
$applykont$ functions, as they are subsumed into the general $step$
function. However, the basic structure and discussion are the same.

One big difference however is of course that since there is no longer an AST,
the C regiter is now an index into an array of bytecodes.
One big difference, of course, is that since there is no longer an
AST, the C register is now an index into an array of bytecodes.

I'll present the original math for each step, then its bytecode equivalent.
I'll present the original math for each step, then its bytecode
equivalent.

> CAVEAT - NONE OF THIS IS TESTED YET. Please don't rush to implement and then blame me if
> it doesn't work. If/when I get it working I'll update this document.
@@ -37,11 +38,12 @@ I'll present the original math for each step, then its bytecode equivalent.
## Internal Bytecodes

These first few bytecodes are the equivalent of the old $\mathcal{A}$ function,
the interpreter is stepping through the code encountering these expressions
and not changing the overall machine state, just the stack. It will continue to
iterate, incrementing the address pointer appropriately, until it hits a
state-changing bytecode. State changing bytecodes are discussed in a later section.
These first few bytecodes are the equivalent of the old $\mathcal{A}$
function: the interpreter steps through the code, encountering these
expressions and changing only the stack, not the overall machine
state. It will continue to iterate, incrementing the address pointer
appropriately, until it hits a state-changing bytecode. State-changing
bytecodes are discussed in a later section.
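
To make that loop concrete, here is a minimal C sketch; it is not the actual CEKF code. The opcode values, the `int`-only stack and the `APPLY` opcode (standing in for any state-changing bytecode) are all hypothetical. The point is only that internal bytecodes touch nothing but the stack and advance the address, and that the loop hands control back as soon as it meets a state-changing bytecode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes, for illustration only. */
enum {
    OP_CONST = 0,  /* | CONST | byte |  push a small integer constant      */
    OP_ADD   = 1,  /* | ADD |           pop two values, push their sum     */
    OP_APPLY = 2   /* | APPLY | ...     stands in for any state-changer    */
};

#define STACK_MAX 64

typedef struct {
    int values[STACK_MAX];
    size_t sp;                 /* number of values currently on the stack */
} Stack;

static void push(Stack *s, int v) {
    assert(s->sp < STACK_MAX);
    s->values[s->sp++] = v;
}

static int pop(Stack *s) {
    assert(s->sp > 0);
    return s->values[--s->sp];
}

/* Execute internal bytecodes starting at pc, touching only the stack.
 * Returns the address of the first state-changing bytecode encountered. */
static size_t run_internal(const uint8_t *code, size_t pc, Stack *s) {
    for (;;) {
        switch (code[pc]) {
            case OP_CONST:
                push(s, code[pc + 1]);
                pc += 2;       /* tag + one operand byte */
                break;
            case OP_ADD: {
                int b = pop(s);
                int a = pop(s);
                push(s, a + b);
                pc += 1;
                break;
            }
            default:
                /* state-changing: hand control back to the outer step loop */
                return pc;
        }
    }
}
```

Running `| CONST | 2 | CONST | 3 | ADD | APPLY |` from address 0 would leave 5 on the stack and stop at address 5, where `APPLY` sits.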

### Variables

@@ -59,16 +61,18 @@ bytecode for that:
`\| VAR \| frame \| offset \|` | `push(lookup(frame, offset, env))` |


The bytecode consiste of three bytes, a `VAR` tag that identifies that a variable is coming
up, then a byte for its frame and a byte for its offset in the frame (see [Lexical
Addressing](LEXICAL_ADDRESSING.md) for details if you haven't already.)
The bytecode consists of three bytes: a `VAR` tag that identifies that
a variable is coming up, then a byte for its frame and a byte for its
offset in the frame (see [Lexical Addressing](LEXICAL_ADDRESSING.md)
for details if you haven't already).

On seeing that, the bytecode interpreter does the lookup, and pushes the result onto the stack.
On seeing that, the bytecode interpreter does the lookup, and pushes
the result onto the stack.

`env` here is the $\rho$ argument to the old $\mathcal{A}$ function.

Note that evaluating an `aexp` always has a stack cost of 1, i.e. there
is always one more element on the stack after evaluation an `aexp`.
is always one more element on the stack after evaluation of an `aexp`.
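
Here is a hedged C sketch of what `lookup(frame, offset, env)` might do under lexical addressing. It assumes (hypothetically) that each environment frame is a plain array of `int` values with a link to its enclosing frame, and that `VAR` is opcode 0; the real environments will hold tagged values. `frame` counts how many links to follow outwards, and `offset` indexes into the frame found there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_VAR 0        /* hypothetical opcode value for | VAR | frame | offset | */
#define STACK_MAX 64

/* One environment frame: an array of values plus a link to the
 * enclosing frame. Values are plain ints here for simplicity. */
typedef struct Env {
    struct Env *parent;
    int *values;
    size_t count;
} Env;

typedef struct {
    int values[STACK_MAX];
    size_t sp;
} Stack;

static void push(Stack *s, int v) {
    assert(s->sp < STACK_MAX);
    s->values[s->sp++] = v;
}

/* Follow `frame` parent links outwards, then index `offset` into that frame. */
static int lookup(uint8_t frame, uint8_t offset, const Env *env) {
    while (frame-- > 0) {
        assert(env->parent != NULL);
        env = env->parent;
    }
    assert(offset < env->count);
    return env->values[offset];
}

/* Execute one VAR instruction at pc; returns the address of the next one. */
static size_t exec_var(const uint8_t *code, size_t pc, const Env *env, Stack *s) {
    assert(code[pc] == OP_VAR);
    push(s, lookup(code[pc + 1], code[pc + 2], env));
    return pc + 3;      /* tag + frame byte + offset byte */
}
```

Executing one `VAR` always leaves exactly one more value on the stack and advances the address by three, which is the stack cost of 1 noted above.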

### Constants

@@ -102,13 +106,15 @@ $$

| bytecode | action |
|----------|--------|
| `\| LAM \| nvar \| addr(after exp) \| ..exp.. \|` | `push(clo(nvar, addr(exp), env);` |
| `\| LAM \| nvar \| addr(after exp) \| ..exp.. \|` | `push(clo(nvar, addr(exp), env));` |

Where `addr(exp)` is the index of the `exp` in the bytecode array.
Where `addr(exp)` is the index of the `exp` in the bytecode array, and
`addr(after exp)` tells the bytecode interpreter where to resume after
pushing the closure.

Note the absence of explicit variable names. Again because of
lexical addressing the only thing the closure needs to know is the size
of the environment.
Note the absence of explicit variable names. Again, because of lexical
addressing, the only thing the closure needs to know is the size of
the environment.
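
A minimal C sketch of the `LAM` step follows. It assumes a hypothetical encoding in which `addr(after exp)` is stored as two big-endian bytes, so the body starts four bytes after the `LAM` tag; the opcode value and the `Closure` and `Stack` types are illustrative, not the project's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_LAM 1        /* hypothetical opcode value */
#define STACK_MAX 64

typedef struct Env Env; /* opaque: the closure only captures a pointer */

/* Thanks to lexical addressing a closure needs no variable names:
 * just the frame size, the body address and the captured environment. */
typedef struct {
    uint8_t nvar;
    size_t body;        /* addr(exp) */
    Env *env;
} Closure;

typedef struct {
    Closure values[STACK_MAX];
    size_t sp;
} Stack;

static void push(Stack *s, Closure c) {
    assert(s->sp < STACK_MAX);
    s->values[s->sp++] = c;
}

/* Assumed layout: | LAM | nvar | after_hi | after_lo | ..exp.. |
 * Pushes clo(nvar, addr(exp), env) and returns addr(after exp). */
static size_t exec_lam(const uint8_t *code, size_t pc, Env *env, Stack *s) {
    assert(code[pc] == OP_LAM);
    Closure c;
    c.nvar = code[pc + 1];
    c.body = pc + 4;    /* the body immediately follows the 4-byte header */
    c.env = env;
    push(s, c);
    /* resume after the body: it only runs when the closure is applied */
    return ((size_t)code[pc + 2] << 8) | code[pc + 3];
}
```

The design point is that the body is skipped rather than executed at this stage; it only runs later, when the closure is applied.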

### Primitives

@@ -140,17 +146,17 @@ Consider a primitive sequence like `2 + 3 * 4`. That will parse to

```mermaid
flowchart TD
plus(+)
plus(plus)
plus ---- two(2)
plus --- times(×)
plus --- times(times)
times --- three(3)
times --- four(4)
```

There are various ways to print out that tree, for example for each
(non-terminal) node, printing the left hand branch, then printing the
operation, then printing the right hand branch would recover the infix
notation we started with. Howver if instead we print the left-hand
There are various ways to traverse that tree. For example, for each
(non-terminal) node, visiting the left-hand branch, then the operation,
then the right-hand branch would recover the infix notation we started
with. However, if instead we visit the left-hand branch, then the
right-hand branch, then the operation, we end up with reverse Polish
notation, `2 3 4 * +`, which is exactly the order in which we need to
evaluate the expressions:
@@ -161,18 +167,18 @@ to evaluate the expressions:
* pop the 3 and the 4, multiply them and push the result 12.
* pop the 2 and the 12, add them and push the result 14.

Note that the entire operation has a stack cost of 1, preserving
that invariant.
Note that the entire operation has a stack cost of 1, preserving that
invariant.
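
Both halves of that idea can be sketched in C: a post-order traversal that turns the tree for `2 + 3 * 4` into reverse Polish order, and a stack evaluator for the result. The `Node` and `Instr` types are made up for this illustration and handle only `+` and `*`; they are not how the project represents primitives.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 64

/* Expression tree: leaves hold an int, inner nodes hold '+' or '*'. */
typedef struct Node {
    char op;                         /* 0 for a leaf */
    int value;                       /* used by leaves only */
    const struct Node *left, *right;
} Node;

/* One postfix instruction: push a constant (op == 0) or apply op. */
typedef struct {
    char op;
    int value;
} Instr;

/* Post-order traversal: left branch, right branch, then the operation.
 * Appends to out starting at len and returns the new length. */
static size_t compile(const Node *n, Instr *out, size_t len) {
    if (n->op == 0) {
        out[len].op = 0;
        out[len].value = n->value;
        return len + 1;
    }
    len = compile(n->left, out, len);
    len = compile(n->right, out, len);
    out[len].op = n->op;
    out[len].value = 0;
    return len + 1;
}

/* Evaluate postfix code with an explicit stack; reports the final depth. */
static int run(const Instr *code, size_t len, size_t *depth) {
    int stack[STACK_MAX];
    size_t sp = 0;
    for (size_t i = 0; i < len; i++) {
        if (code[i].op == 0) {
            assert(sp < STACK_MAX);
            stack[sp++] = code[i].value;
            continue;
        }
        assert(sp >= 2);
        int b = stack[--sp];         /* right operand was pushed last */
        int a = stack[--sp];
        switch (code[i].op) {
            case '+': stack[sp++] = a + b; break;
            case '*': stack[sp++] = a * b; break;
            default:  assert(0 && "unknown operator");
        }
    }
    *depth = sp;
    assert(sp >= 1);
    return stack[sp - 1];
}
```

Compiling the tree above yields `2 3 4 * +`; evaluating it gives 14 with exactly one value left on the stack, so the whole expression keeps its stack cost of 1.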

## State Changing Bytecodes

The rest of these situations change the overall state of the machine, corresponding to
$step$ returning a new state.
The rest of these situations change the overall state of the machine,
corresponding to $step$ returning a new state.

### Function calls

For function calls, `step` first evaluates the function,
then the arguments, then it applies the function:
For function calls, `step` first evaluates the function, then the
arguments, then it applies the function:

$$
step(\mathtt{(aexp_0\ aexp_1\dots aexp_n)}, \rho, \kappa, f) = applyproc(proc,\langle val_1,\dots val_n\rangle, \kappa, f)
2 changes: 1 addition & 1 deletion fn/wonderful-life.fn
@@ -58,4 +58,4 @@ let
puts("}\n");
}
in
printTree(generateTree(0.640))
printTree(generateTree(0.64046))
1 change: 1 addition & 0 deletions src/pratt.yaml
@@ -97,6 +97,7 @@ enums:
- START
- STR
- ESC
- ESCS
- UNI
- CHR1
- CHR2