Spec migration #796

Merged
merged 23 commits into from
Mar 16, 2023
b2a850a
Initial specs migration. (#793)
adlerjohn Sep 27, 2022
ddb9d6a
Fix markdown lint errors. (#798)
adlerjohn Sep 27, 2022
dcf6d4a
Add deploy step to mdbook workflow. (#797)
adlerjohn Sep 27, 2022
dac49d1
Run table formatter. (#799)
adlerjohn Sep 27, 2022
60f341a
Add install prereq. (#800)
adlerjohn Sep 27, 2022
2cc65da
chore: specs-staging CODEOWNERS (#812)
rootulp Sep 29, 2022
eb3791e
docs: describe message length varint (#806)
rootulp Oct 19, 2022
26f11cb
specs: universal share prefix (#856)
rootulp Oct 31, 2022
3206843
specs: two reserved bytes (#939)
rootulp Nov 24, 2022
b610a0b
chore: uint32 for message size (#1182)
rootulp Jan 6, 2023
351e2d3
specs: fixed sequence length (#1124)
rootulp Jan 11, 2023
41c0859
specs: non-interactive default rules for reduced padding (#1156)
rootulp Jan 12, 2023
3c66c2d
docs: spec namespaced padding share (#1248)
rootulp Jan 17, 2023
4f83347
docs: spec tail padding share (#1244)
rootulp Jan 18, 2023
0dc7d17
docs: revise specs for padding shares (#1359)
rootulp Feb 8, 2023
329dc50
chore: specifies minimum square size in non-interactive default rules…
staheri14 Feb 23, 2023
1547891
docs: replace evidence with PFBs in square diagram (#1436)
rootulp Mar 3, 2023
c09843d
specs: auto generate table of contents (#1441)
rootulp Mar 6, 2023
cb2ccdf
docs: note about Q3 (#1467)
rootulp Mar 10, 2023
309e4e2
Merge branch 'main' into specs-staging
rootulp Mar 14, 2023
3ff81e8
lint: fix markdownlint
rootulp Mar 14, 2023
b439e04
fix: attempt to resolve markdownlint errors
rootulp Mar 15, 2023
f2e2e18
Update .github/workflows/gh-pages.yml
rootulp Mar 16, 2023
3 changes: 2 additions & 1 deletion .github/CODEOWNERS
@@ -13,7 +13,8 @@
# directory owners
# NOTE: the directory owners should include the global owners unless the global
# owner is fully deferring ownership to the directory owner
-docs @liamsi @adlerjohn @MSevey @evan-forbes
+docs @liamsi @adlerjohn @MSevey @evan-forbes @rootulp
+specs @liamsi @adlerjohn @MSevey @evan-forbes @rootulp
x/qgb @SweeXordious @evan-forbes
pkg/shares @rootulp @evan-forbes
pkg/wrapper @staheri14 @evan-forbes
29 changes: 29 additions & 0 deletions .github/workflows/gh-pages.yml
@@ -0,0 +1,29 @@
name: github pages

on:
  push:
    branches:
      - main
  pull_request:

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Setup mdBook
        uses: peaceiris/actions-mdbook@v1
        with:
          mdbook-version: "0.4.21"

      - name: Build book
        run: mdbook build specs

      - name: Deploy main
        if: github.event_name == 'push'
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./specs/book
          destination_dir: main
1 change: 1 addition & 0 deletions specs/.gitignore
@@ -0,0 +1 @@
book
29 changes: 29 additions & 0 deletions specs/README.md
@@ -0,0 +1,29 @@
# Celestia App Specifications

## Building From Source

Install [mdbook](https://rust-lang.github.io/mdBook/guide/installation.html) and [mdbook-toc](https://github.com/badboy/mdbook-toc):

```sh
cargo install mdbook
cargo install mdbook-toc
```

To build the book:

```sh
mdbook build
```

To serve locally:

```sh
mdbook serve
```

## Contributing

Markdown files must conform to [GitHub Flavored Markdown](https://github.github.com/gfm/). Markdown must be formatted with:

- [markdownlint](https://github.com/DavidAnson/markdownlint)
- [Markdown Table Prettifier](https://github.com/darkriszty/MarkdownTablePrettify-VSCodeExt)
16 changes: 16 additions & 0 deletions specs/book.toml
@@ -0,0 +1,16 @@
[book]
authors = ["Celestia Labs"]
language = "en"
multilingual = false
src = "src"
title = "Celestia App Specifications"

[output.html]
git-repository-url = "https://github.com/celestiaorg/celestia-app"

[rust]
edition = "2021"

[preprocessor.toc]
command = "mdbook-toc"
renderer = ["html"]
9 changes: 9 additions & 0 deletions specs/src/README.md
@@ -0,0 +1,9 @@
# Celestia App Specifications

- [Specification](./specs/index.md)
  - [Data Structures](./specs/data_structures.md)
  - [Consensus](./specs/consensus.md)
  - [Block Proposer](./specs/block_proposer.md)
  - [Networking](./specs/networking.md)
- [Rationale](./rationale/index.md)
  - [Message Layout](./rationale/message_block_layout.md)
11 changes: 11 additions & 0 deletions specs/src/SUMMARY.md
@@ -0,0 +1,11 @@
# Summary

[Celestia App Specifications](./README.md)

- [Specification](./specs/index.md)
  - [Data Structures](./specs/data_structures.md)
  - [Consensus](./specs/consensus.md)
  - [Block Proposer](./specs/block_proposer.md)
  - [Networking](./specs/networking.md)
- [Rationale](./rationale/index.md)
  - [Message Layout](./rationale/message_block_layout.md)
3 changes: 3 additions & 0 deletions specs/src/rationale/index.md
@@ -0,0 +1,3 @@
# Rationale

- [Message Layout](./message_block_layout.md)
55 changes: 55 additions & 0 deletions specs/src/rationale/message_block_layout.md
@@ -0,0 +1,55 @@
# Message Layout

<!-- toc -->

## Preamble

Celestia uses [a data availability scheme](https://arxiv.org/abs/1809.09044) that allows nodes to determine whether a block's data was published without downloading the whole block. The core of this scheme is arranging data in a two-dimensional matrix and then applying erasure coding to each row and column. This document describes the rationale for how data (transactions, messages, and other data) [is actually arranged](../specs/data_structures.md#arranging-available-data-into-shares). Familiarity with the [originally proposed data layout format](https://arxiv.org/abs/1809.09044) is assumed.

## Message Layout Rationale

Block data consists of:

1. Cosmos SDK module transactions (e.g. [MsgSend](https://github.com/cosmos/cosmos-sdk/blob/f71df80e93bffbf7ce5fbd519c6154a2ee9f991b/proto/cosmos/bank/v1beta1/tx.proto#L21-L32)). These modify the Celestia chain's state.
1. Celestia-specific transactions (e.g. [PayForBlobs](../specs/data_structures.md#payforblobdata)). These modify the Celestia chain's state.
1. Intermediate state roots: required for fraud proofs of the aforementioned transactions.
1. Messages: binary blobs which do not modify the Celestia state, but which are intended for a Celestia application identified with a provided namespace ID.

We want to arrange this data into a `k * k` matrix of fixed-sized shares, which will later be committed to in [Namespace Merkle Trees (NMTs)](../specs/data_structures.md#namespace-merkle-tree).

The simplest way we can imagine arranging block data is to serialize it all in no particular order, split it into fixed-sized shares, then arrange those shares into the `k * k` matrix in row-major order. However, this naive scheme can be improved in a number of ways, described below.

First, we impose some ground rules:

1. Data must be ordered by namespace ID. This makes queries into an NMT commitment of that data more efficient.
1. Since non-message data are not naturally intended for particular namespaces, we assign reserved namespaces for them. A range of namespaces is reserved for this purpose, starting from the lowest possible namespace ID.
1. By construction, the above two rules mean that non-message data always precedes message data in the row-major matrix, even when considering single rows or columns.
1. Data with different namespaces must not be in the same share. This might cause a small amount of wasted block space, but makes the NMT easier to reason about in general since leaves are guaranteed to belong to a single namespace.
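
As a rough illustration of the ordering rules above, the following Python sketch checks that a row-major list of per-share namespace IDs never decreases, both over the whole square and within each row. The byte-wise comparison of namespace IDs, the example namespace values, and the function names are assumptions made for this example; they are not taken from the specification.

```python
from typing import List


def is_ordered_by_namespace(share_namespaces: List[bytes]) -> bool:
    """Return True if namespace IDs never decrease along the sequence."""
    return all(a <= b for a, b in zip(share_namespaces, share_namespaces[1:]))


def square_rows(share_namespaces: List[bytes], k: int) -> List[List[bytes]]:
    """Cut a row-major sequence of k * k entries into k rows of length k."""
    return [share_namespaces[i:i + k] for i in range(0, len(share_namespaces), k)]


# Hypothetical 2 x 2 square: two shares in a reserved (lowest) namespace,
# followed by one share for each of two message namespaces.
RESERVED = b"\x00\x01"
square = [RESERVED, RESERVED, b"\x10\x00", b"\x20\x00"]

whole_square_ordered = is_ordered_by_namespace(square)
every_row_ordered = all(is_ordered_by_namespace(r) for r in square_rows(square, 2))
```

Because each entry stands for exactly one share with exactly one namespace, the fourth rule (no mixed-namespace shares) is built into the representation rather than checked.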

Transactions can pay fees for a message to be included in the same block as the transaction itself. However, we do not want serialized transactions to include the entire message they pay for (which is the case in other blockchains with native execution, e.g. calldata in Ethereum transactions or OP_RETURN data in Bitcoin transactions), otherwise every node that validates the sanctity of the Celestia coin would need to download all message data. Transactions must therefore only include a commitment to (i.e. some hash of) the message they pay fees for. If implemented naively (e.g. with a simple hash of the message, or a simple binary Merkle tree root of the message), this can lead to a data availability problem, as there are no guarantees that the data behind these commitments is actually part of the block data.

To that end, we impose some additional rules onto _messages only_: messages must be placed in a way such that both the transaction sender and the block producer can be held accountable, a necessary property for e.g. fee burning. Accountable in this context means that:

1. The transaction sender must pay sufficient fees for message inclusion.
1. The block proposer cannot claim that a message was included when it was not (which implies that a transaction and the message it pays for must be included in the same block).

Specifically, messages must begin at a new share, unlike non-message data which can span multiple shares. We note a nice property from this rule: if the transaction sender knows 1) `k`, the size of the matrix, 2) the starting location of their message in a row, and 3) the length of the message (they know this since they are sending the message), then they can actually compute a sequence of roots to _subtrees in the row NMTs_. More importantly, anyone can compute this, and can compute _the simple Merkle root of these subtree roots_.

This, however, requires the block producer to interact with the transaction sender to provide them the starting location of their message. This can be done selectively, but is not ideal as a default for e.g. end-user wallets.

### Non-Interactive Default Rules

As a non-consensus-critical default, we can impose one additional rule on message placement to make the possible starting locations of messages sufficiently predictable and constrained such that users can deterministically compute subtree roots without interaction:

> Messages start at an index that is a multiple of the message minimum square size. The message minimum square size is the smallest square that can contain the message in isolation (i.e. a square with only this message and no other transactions or messages).

In the constraint above, the message minimum square size (the number of rows/columns of that square) must be a power of 2.
With the above constraint, we can compute subtree roots deterministically. To compute the subtree roots, split the message into chunks whose length is at most the message minimum square size. As an example, a message of length `11` has a minimum square size of `4` because `11` is greater than `2 * 2 = 4` but not greater than `4 * 4 = 16` total shares. Split the message into chunks of length `4, 4, 2, 1`. The resulting slices are the leaves of subtrees whose roots can be computed. These subtree roots will be present as internal nodes in the NMT of _some_ row(s).
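
The worked example can be reproduced with a short Python sketch. It assumes message length is measured in shares and that chunks are chosen greedily as the largest power of two that fits, capped at the message minimum square size; the function names are hypothetical.

```python
from typing import List


def min_square_size(share_count: int) -> int:
    """Smallest power-of-two width s such that s * s >= share_count."""
    if share_count < 1:
        raise ValueError("a message occupies at least one share")
    size = 1
    while size * size < share_count:
        size *= 2
    return size


def subtree_widths(share_count: int) -> List[int]:
    """Greedy power-of-two chunk lengths, each at most the minimum square size."""
    max_width = min_square_size(share_count)
    widths = []
    remaining = share_count
    while remaining > 0:
        width = max_width
        # Shrink to the largest power of two that still fits in what is left.
        while width > remaining:
            width //= 2
        widths.append(width)
        remaining -= width
    return widths


example_size = min_square_size(11)
example_widths = subtree_widths(11)
```

For a message of `11` shares this yields a minimum square size of `4` and chunk lengths `4, 4, 2, 1`, matching the example in the text.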

This is similar to [Merkle Mountain Ranges](https://www.usenix.org/legacy/event/sec09/tech/full_papers/crosby.pdf), though with the largest subtree bounded by the message minimum square size rather than being unbounded.

The last piece of the puzzle is determining _which_ row the message is placed at (or, more specifically, the starting location). This is needed to keep the block producer accountable. To this end, the block producer simply augments each fee-paying transaction with some metadata: the starting location of the message the transaction pays for.

### Caveats

The message placement rules described above conflict with the first rule that shares must be ordered by namespace ID, as shares between two messages that are not placed adjacent to each other do not have a natural namespace they belong to. This is resolved by requiring that such shares have a value of zero and a namespace ID equal to the preceding message's. Since their value is known, they can be omitted from NMT proofs of all shares of a given namespace ID.
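
A minimal sketch of how many such padding shares precede a message under the non-interactive default rules. It assumes a hypothetical `cursor` holding the flat index of the first free share and treats the start index as a flat share index; both are simplifications for illustration.

```python
def min_square_size(share_count: int) -> int:
    """Smallest power-of-two width s such that s * s >= share_count."""
    if share_count < 1:
        raise ValueError("a message occupies at least one share")
    size = 1
    while size * size < share_count:
        size *= 2
    return size


def next_aligned_index(cursor: int, share_count: int) -> int:
    """First index >= cursor that is a multiple of the message minimum square size."""
    size = min_square_size(share_count)
    # Round cursor up to the next multiple of size using ceiling division.
    return -(-cursor // size) * size


def padding_before(cursor: int, share_count: int) -> int:
    """Number of zero-valued padding shares inserted before the message."""
    return next_aligned_index(cursor, share_count) - cursor


# An 11-share message placed when the first free share is at index 5.
start = next_aligned_index(5, 11)
padding = padding_before(5, 11)
```

Here the message has a minimum square size of `4`, so it starts at index `8` and the three shares at indices `5` to `7` become padding carrying the preceding message's namespace ID.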
25 changes: 25 additions & 0 deletions specs/src/specs/block_proposer.md
@@ -0,0 +1,25 @@
# Honest Block Proposer

<!-- toc -->

This document describes the tasks of an honest block proposer to assemble a new block. Performing these actions is not enforced by the [consensus rules](./consensus.md), so long as a valid block is produced.

## Deciding on a Block Size

Before [arranging available data into shares](./data_structures.md#arranging-available-data-into-shares), the size of the original data's square must be determined.

There are two restrictions on the original data's square size:

1. It must be at most [`AVAILABLE_DATA_ORIGINAL_SQUARE_MAX`](./consensus.md#constants).
1. It must be a power of 2.

With these restrictions in mind, the block proposer performs the following actions:

1. Collect as many transactions and messages from the mempool as possible, such that the total number of shares is at most [`AVAILABLE_DATA_ORIGINAL_SQUARE_MAX`](./consensus.md#constants).
1. Compute the smallest square size that is a power of 2 that can fit the number of shares.
1. Attempt to [lay out the collected transactions and messages](#laying-out-transactions-and-messages) in the current square.
1. If the square is too small to fit all transactions and messages (which may happen [due to needing to insert padding between messages](../rationale/message_block_layout.md)) and the square size is smaller than [`AVAILABLE_DATA_ORIGINAL_SQUARE_MAX`](./consensus.md#constants), double the size of the square and repeat the above step.

Note: the number of padding shares between messages should be at most twice the number of message shares. Doubling the square size (i.e. quadrupling the number of shares in the square) should thus only have to happen at most once.
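
The procedure above can be sketched in Python as follows. The sketch assumes the maximum is expressed as a square width passed in as `max_square_size`, and it stands in for the actual layout step with a hypothetical `fits` callback that reports whether everything fits in a square of the given width; neither name comes from the specification.

```python
from typing import Callable, Optional


def smallest_square_size(share_count: int) -> int:
    """Smallest power-of-two width whose square holds share_count shares."""
    size = 1
    while size * size < share_count:
        size *= 2
    return size


def choose_square_size(
    share_count: int,
    max_square_size: int,
    fits: Callable[[int], bool],
) -> Optional[int]:
    """Start from the smallest candidate and double until the layout fits."""
    size = smallest_square_size(share_count)
    while size <= max_square_size:
        if fits(size):
            return size
        # Padding between messages did not fit: double the width and retry.
        size *= 2
    # No square up to the maximum can hold the collected data.
    return None


# Hypothetical layout check: padding pushes the data out of a 4 x 4 square.
chosen = choose_square_size(11, 128, lambda size: size >= 8)
```

Returning `None` signals that the proposer would need to drop some transactions or messages before trying again, a policy decision this sketch leaves open.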

## Laying out Transactions and Messages