Merge pull request #324 from Chia-Network/document-back-references
document CLVM back references
BrandtH22 authored Sep 24, 2024
2 parents ea4ce0e + c63529f commit 23e9679
Showing 1 changed file with 59 additions and 0 deletions.
59 changes: 59 additions & 0 deletions docs/clvm.md
@@ -249,6 +249,11 @@ The number of skipped bits is also the number of total bytes the size is encoded

The number of size bytes includes the first.


:::note

It is possible, although discouraged, to encode the length of an atom in more bytes than necessary, i.e. with unnecessary leading zeroes. This is similar to [UTF-8 overlong encoding](https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings). As a result, it is not safe to compare CLVM programs in serialized form, since identical programs may serialize to different bytes and compare as not equal. To compare programs, compare their tree hashes instead.

:::
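
The note above can be illustrated with a minimal Python sketch. It is not taken from any Chia library: `read_atom` and `tree_hash` are hypothetical helper names, and the tree hash rule used here (SHA-256 over a `0x01` prefix plus the atom bytes, or a `0x02` prefix plus the two child hashes) is stated as an assumption about the usual CLVM convention.

```python
import hashlib


def read_atom(blob: bytes) -> bytes:
    """Decode a single serialized atom, accepting overlong size prefixes."""
    first = blob[0]
    if first == 0x80:
        return b""  # nil, the empty atom
    if first < 0x80:
        return bytes([first])  # a small single-byte atom encodes itself
    # The number of leading 1 bits is the number of size bytes (including the first).
    size_bytes = 0
    mask = 0x80
    while first & mask:
        size_bytes += 1
        first &= ~mask & 0xFF
        mask >>= 1
    size = int.from_bytes(bytes([first]) + blob[1:size_bytes], "big")
    return blob[size_bytes:size_bytes + size]


def tree_hash(node) -> bytes:
    """Assumed tree hash: atoms are bytes, pairs are (left, right) tuples."""
    if isinstance(node, tuple):
        left, right = node
        return hashlib.sha256(b"\x02" + tree_hash(left) + tree_hash(right)).digest()
    return hashlib.sha256(b"\x01" + node).digest()


canonical = bytes.fromhex("86666f6f626172")   # 1 size byte: length 6
overlong = bytes.fromhex("c006666f6f626172")  # 2 size bytes: length 6 with a leading zero

print(canonical == overlong)  # False: the serialized bytes differ
print(tree_hash(read_atom(canonical)) == tree_hash(read_atom(overlong)))  # True
```

Both buffers decode to the atom `"foobar"`, so only the comparison of tree hashes reports them as the same program.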

### Cons Pairs
@@ -259,6 +264,60 @@ For example, `(1 . 2)` would be represented as `0xFF0102`. Once you read `0xFF`,

Lists are typically chains of cons pairs that end in a nil terminator.

### Back references

As of the hard fork at block height 5 496 000, CLVM serialization is extended with *back references*. A back reference refers back to a previously parsed CLVM structure, which is then duplicated in the deserialized output. This feature is sometimes also referred to as CLVM compression.

The compression comes from collapsing repeated structures: a structure only needs to be included once, and can then be referred back to every time it is repeated. This is especially helpful in a block generator, where the same puzzle reveal may be included multiple times for coins secured by the same puzzle. The curried parameters are not repeated, but the underlying puzzle is, so it is the underlying puzzle that gets collapsed.

A back reference is introduced by a `0xFE` byte, followed by an atom that is interpreted as a *path*. The path points into a tree of previously parsed expressions (the environment). The lookup works the same way as a path lookup into the CLVM execution [Environment](#Environment).
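
As a rough illustration of such a path lookup, here is a minimal Python sketch. The function name `lookup` and the tree representation (atoms as `bytes`, pairs as `(left, right)` tuples) are assumptions made for this example. It also assumes the usual CLVM path convention: the path is read as an unsigned integer, `1` selects the whole tree, and the remaining bits are consumed from the least significant end, with `0` meaning left and `1` meaning right.

```python
def lookup(env, path: bytes):
    """Follow a path atom into a tree of bytes atoms and (left, right) tuples."""
    index = int.from_bytes(path, "big")
    if index == 0:
        return b""  # assumption: a zero path yields nil
    node = env
    while index > 1:
        if not isinstance(node, tuple):
            raise ValueError("path descends into an atom")
        node = node[index & 1]  # bit 0 -> left (first), bit 1 -> right (rest)
        index >>= 1
    return node


env = (b"foobar", b"")
print(lookup(env, b"\x01") == env)  # True: path 1 is the whole environment
print(lookup(env, b"\x02"))         # b'foobar': path 2 is the left child
print(lookup(env, b"\x03"))         # b'': path 3 is the right child (nil)
```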

CLVM trees are parsed bottom-up, left to right. As each atom is parsed, it is prepended to the environment. As each pair is parsed, the top two values are popped off the environment and formed into a pair, which is then prepended to the environment. Each back reference performs a path lookup into the environment and prepends the resulting subtree to the environment.

For example, the buffer `ff86666f6f626172fe01` is a serialization of `("foobar" . ("foobar" . NIL))`. It is parsed in the order shown in the tree below:

```
    [3]
   /   \
  1     2 (backref)
```

The environment looks like this after each step:

1. parse atom "foobar"
```
       []
      /  \
     /    \
"foobar"  NIL
```

2. parse back reference `01`
```
                []
               /  \
              /    \
             /      \
            /        \
           /          \
         []            []
        /  \          /  \
       /    \        /    \
"foobar"   NIL  "foobar"  NIL
```

3. parse pair: pop the top 2 items and form a pair
```
        []
       /  \
      /    \
     /      \
"foobar"     []
            /  \
           /    \
      "foobar"  NIL
```
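
The parsing steps above can be sketched in Python. This is an illustrative sketch only, not the reference implementation: the names `deserialize`, `read_atom` and `lookup` are hypothetical, atoms are modelled as `bytes` and pairs as `(left, right)` tuples, and error handling for malformed input is omitted. The environment is kept as a CLVM-style list whose first element is the most recently parsed value.

```python
import io

NIL = b""


def read_atom(stream, first: int) -> bytes:
    """Read an atom whose first byte has already been consumed."""
    if first == 0x80:
        return NIL
    if first < 0x80:
        return bytes([first])
    size_bytes, mask = 0, 0x80
    while first & mask:  # leading 1 bits give the number of size bytes
        size_bytes += 1
        first &= ~mask & 0xFF
        mask >>= 1
    size = int.from_bytes(bytes([first]) + stream.read(size_bytes - 1), "big")
    return stream.read(size)


def lookup(env, path: bytes):
    """Path lookup: 1 is the whole tree, then bit 0 -> left, bit 1 -> right."""
    index = int.from_bytes(path, "big")
    if index == 0:
        return NIL  # assumption: a zero path yields nil
    node = env
    while index > 1:
        node = node[index & 1]
        index >>= 1
    return node


def deserialize(blob: bytes):
    stream = io.BytesIO(blob)
    env = NIL          # environment: (newest value . older values)
    ops = ["parse"]    # pending work, processed last-in first-out
    while ops:
        if ops.pop() == "cons":
            right, env = env  # the right-hand side was parsed last
            left, env = env
            env = ((left, right), env)
            continue
        marker = stream.read(1)[0]
        if marker == 0xFF:
            # Parse left, then right, then join them into a pair.
            ops += ["cons", "parse", "parse"]
        elif marker == 0xFE:
            path = read_atom(stream, stream.read(1)[0])
            env = (lookup(env, path), env)
        else:
            env = (read_atom(stream, marker), env)
    return env[0]  # the fully parsed tree is the top of the environment


compressed = bytes.fromhex("ff86666f6f626172fe01")
plain = bytes.fromhex("ff86666f6f626172ff86666f6f62617280")
print(deserialize(compressed))                        # (b'foobar', (b'foobar', b''))
print(deserialize(compressed) == deserialize(plain))  # True
```

The second buffer is the same tree serialized without back references, so both calls produce the same structure even though the compressed form is 7 bytes shorter.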

## Programs as Parameters

CLVM does not have operators for defining and calling functions. However, it does allow programs to be passed into the environment, as well as executing a value as a program with a new environment.