Commit

Update storage design docs to include the publish step (#391)
AlCutter authored Dec 5, 2024
1 parent 6883819 commit 4b60f59
Showing 3 changed files with 26 additions and 30 deletions.
36 changes: 15 additions & 21 deletions storage/aws/README.md
@@ -29,34 +29,28 @@ A table with a single row which is used to keep track of the next assignable seq
This holds batches of entries keyed by the sequence number assigned to the first entry in the batch.

### `IntCoord`
- TODO: add the new checkpoint updater logic, and update the docstring in aws.go.

- This table is used to coordinate integration of sequenced batches in the `Seq` table.
+ This table is used to coordinate integration of sequenced batches in the `Seq` table, and keep track of the current tree state.

## Life of a leaf

- TODO: add the new checkpoint updater logic.

1. Leaves are submitted by the binary built using Tessera via a call to the storage's `Add` func.
- 2. [Not implemented yet - Dupe squashing: look for existing `<identity_hash>` object, read assigned sequence number if present and return.]
- 3. The storage library batches these entries up, and, after a configurable period of time has elapsed
+ 1. The storage library batches these entries up, and, after a configurable period of time has elapsed
    or the batch reaches a configurable size threshold, the batch is written to the `Seq` table which effectively
    assigns sequence numbers to the entries using the following algorithm:
    In a transaction:
    1. selects next from `SeqCoord` with for update ← this blocks other FE from writing their pools, but only for a short duration.
-   2. Inserts batch of entries into `Seq` with key `SeqCoord.next`
-   3. Update `SeqCoord` with `next+=len(batch)`
- 4. Integrators periodically integrate new sequenced entries into the tree:
+   1. Inserts batch of entries into `Seq` with key `SeqCoord.next`
+   1. Update `SeqCoord` with `next+=len(batch)`
+ 1. Integrators periodically integrate new sequenced entries into the tree:
    In a transaction:
    1. select `seq` from `IntCoord` with for update ← this blocks other integrators from proceeding.
-   2. Select one or more consecutive batches from `Seq` for update, starting at `IntCoord.seq`
-   3. Write leaf bundles to S3 using batched entries
-   4. Integrate in Merkle tree and write tiles to S3
-   5. Update checkpoint in S3
-   6. Delete consumed batches from `Seq`
-   7. Update `IntCoord` with `seq+=num_entries_integrated`
-   8. [Not implemented yet - Dupe detection:
-      1. Writes out `<identity_hash>` containing the leaf's sequence number]
+   1. Select one or more consecutive batches from `Seq` for update, starting at `IntCoord.seq`
+   1. Write leaf bundles to S3 using batched entries
+   1. Integrate in Merkle tree and write tiles to S3
+   1. Update checkpoint in S3
+   1. Delete consumed batches from `Seq`
+   1. Update `IntCoord` with `seq+=num_entries_integrated` and the latest `rootHash`
+ 1. Checkpoints representing the latest state of the tree are published at the configured interval.
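
To make the sequencing step in the hunk above concrete, here is a minimal Go sketch of the transaction that reserves sequence numbers, written against `database/sql`. The table and column names (`SeqCoord.next`, `Seq.seq`, `Seq.batch`) and the single-row `id = 0` convention are illustrative assumptions for this sketch, not the actual Tessera schema.

```go
// Package sequencer sketches the sequence-assignment transaction described in
// the steps above. Table and column names (SeqCoord.next, Seq.seq, Seq.batch)
// and the single-row id = 0 convention are assumptions made for illustration;
// they are not the actual Tessera schema.
package sequencer

import (
	"context"
	"database/sql"
)

// assignSequence durably stores a serialised batch of entries and returns the
// sequence number assigned to the first entry in the batch.
func assignSequence(ctx context.Context, db *sql.DB, batch []byte, numEntries uint64) (uint64, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return 0, err
	}
	defer tx.Rollback() // No-op after a successful Commit.

	// SELECT ... FOR UPDATE briefly blocks other frontends from writing their
	// pools while we reserve a contiguous range of sequence numbers.
	var next uint64
	if err := tx.QueryRowContext(ctx,
		"SELECT next FROM SeqCoord WHERE id = 0 FOR UPDATE").Scan(&next); err != nil {
		return 0, err
	}

	// Store the whole batch keyed by the first sequence number it contains.
	if _, err := tx.ExecContext(ctx,
		"INSERT INTO Seq (seq, batch) VALUES (?, ?)", next, batch); err != nil {
		return 0, err
	}

	// Advance the next assignable sequence number past this batch.
	if _, err := tx.ExecContext(ctx,
		"UPDATE SeqCoord SET next = ? WHERE id = 0", next+numEntries); err != nil {
		return 0, err
	}

	return next, tx.Commit()
}
```

And a matching sketch of the integrator transaction from the same list, under the same hypothetical schema. It skips the object-storage writes (bundles, tiles, checkpoint), representing them with a `writeBundlesAndTiles` hook that is purely a placeholder here:

```go
// Package integrator sketches the integrator transaction from the steps above.
// The schema (IntCoord.seq, IntCoord.rootHash, Seq.seq, Seq.batch) and the
// writeBundlesAndTiles hook are illustrative assumptions, not the actual
// Tessera implementation.
package integrator

import (
	"context"
	"database/sql"
)

// integrate folds the next run of sequenced batches into the tree.
// writeBundlesAndTiles stands in for writing leaf bundles and Merkle tiles to
// S3; it returns the new tree size and root hash after integration.
func integrate(ctx context.Context, db *sql.DB,
	writeBundlesAndTiles func(fromSeq uint64, batches [][]byte) (newSize uint64, rootHash []byte, err error)) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // No-op after a successful Commit.

	// Lock the coordination row so only one integrator makes progress at a time.
	var fromSeq uint64
	var curRoot []byte
	if err := tx.QueryRowContext(ctx,
		"SELECT seq, rootHash FROM IntCoord WHERE id = 0 FOR UPDATE").Scan(&fromSeq, &curRoot); err != nil {
		return err
	}

	// Select one or more consecutive batches, starting at IntCoord.seq.
	rows, err := tx.QueryContext(ctx,
		"SELECT seq, batch FROM Seq WHERE seq >= ? ORDER BY seq LIMIT 16 FOR UPDATE", fromSeq)
	if err != nil {
		return err
	}
	defer rows.Close()
	var keys []uint64
	var batches [][]byte
	for rows.Next() {
		var k uint64
		var b []byte
		if err := rows.Scan(&k, &b); err != nil {
			return err
		}
		keys = append(keys, k)
		batches = append(batches, b)
	}
	if err := rows.Err(); err != nil {
		return err
	}
	if len(batches) == 0 {
		return tx.Commit() // Nothing new to integrate.
	}

	// Write leaf bundles, fold the entries into the Merkle tree, and write the
	// affected tiles.
	newSize, rootHash, err := writeBundlesAndTiles(fromSeq, batches)
	if err != nil {
		return err
	}

	// Consume the batches and record how far integration has progressed, along
	// with the new root hash. The published checkpoint is refreshed separately,
	// at the configured interval.
	for _, k := range keys {
		if _, err := tx.ExecContext(ctx, "DELETE FROM Seq WHERE seq = ?", k); err != nil {
			return err
		}
	}
	if _, err := tx.ExecContext(ctx,
		"UPDATE IntCoord SET seq = ?, rootHash = ? WHERE id = 0", newSize, rootHash); err != nil {
		return err
	}
	return tx.Commit()
}
```

Both transactions hold their row locks only for as long as the work inside them takes; the published checkpoint that clients consume is refreshed on its own cadence, as the final step in the list above describes.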

## Dedup

@@ -75,12 +69,12 @@ operational overhead, code complexity, and so was selected.

The alpha implementation was tested with entries of size 1KB each, at a write
rate of 1500/s. This was done using the smallest possible Aurora instance
- availalbe, `db.r5.large`, running `8.0.mysql_aurora.3.05.2`.
+ available, `db.r5.large`, running `8.0.mysql_aurora.3.05.2`.

Aurora (Serverless v2) worked out well, but seems less cost effective than
- provisionned Aurora for sustained traffic. For now, we decided not to explore this option further.
+ provisioned Aurora for sustained traffic. For now, we decided not to explore this option further.

- RDS (MySQL) worked out well, but requires more admistrative overhead than
+ RDS (MySQL) worked out well, but requires more administrative overhead than
Aurora. For now, we decided not to explore this option further.

DynamoDB worked out to be less cost efficient than Aurora and RDS. It also has
7 changes: 2 additions & 5 deletions storage/gcp/README.md
@@ -34,7 +34,6 @@ This table is used to coordinate integration of sequenced batches in the `Seq` t
## Life of a leaf

1. Leaves are submitted by the binary built using Tessera via a call to the storage's `Add` func.
- 1. Dupe squashing (TODO): look for existing `<identity_hash>` object, read assigned sequence number if present and return.
1. The storage library batches these entries up, and, after a configurable period of time has elapsed
    or the batch reaches a configurable size threshold, the batch is written to the `Seq` table which effectively
    assigns sequence numbers to the entries using the following algorithm:
@@ -48,11 +47,9 @@ This table is used to coordinate integration of sequenced batches in the `Seq` t
    1. Select one or more consecutive batches from `Seq` for update, starting at `IntCoord.seq`
    1. Write leaf bundles to GCS using batched entries
    1. Integrate in Merkle tree and write tiles to GCS
-   1. Update checkpoint in GCS
    1. Delete consumed batches from `Seq`
-   1. Update `IntCoord` with `seq+=num_entries_integrated`
-   1. Dupe detection (TODO):
-      1. Writes out `<identity_hash>` containing the leaf's sequence number
+   1. Update `IntCoord` with `seq+=num_entries_integrated` and the latest `rootHash`
+ 1. Checkpoints representing the latest state of the tree are published at the configured interval.
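
The final step in both lists above, publishing checkpoints at a configured interval, might look roughly like the following loop. The `readTreeState` and `publish` hooks are hypothetical stand-ins for the storage- and signer-specific pieces; they are not part of the Tessera API.

```go
// Package publisher sketches the separate publish step: on a timer, read the
// latest integrated tree state and publish a freshly signed checkpoint. The
// readTreeState and publish hooks are placeholders for the storage- and
// signer-specific pieces; they are not part of the Tessera API.
package publisher

import (
	"context"
	"time"
)

// publishLoop runs until ctx is cancelled, publishing a checkpoint for the
// current tree state once per interval.
func publishLoop(ctx context.Context, interval time.Duration,
	readTreeState func(ctx context.Context) (size uint64, rootHash []byte, err error),
	publish func(ctx context.Context, size uint64, rootHash []byte) error) error {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-t.C:
		}
		size, root, err := readTreeState(ctx)
		if err != nil {
			return err
		}
		// Entries integrated since the previous tick only become visible to
		// log clients once this checkpoint is written.
		if err := publish(ctx, size, root); err != nil {
			return err
		}
	}
}
```

Because the checkpoint is only written here, the publish interval roughly bounds how long a freshly integrated entry can remain invisible to clients.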

## Dedup

13 changes: 9 additions & 4 deletions storage/mysql/DESIGN.md
@@ -17,7 +17,11 @@ The DB layout has been designed such that serving any read request is a point lo

#### `Checkpoint`

- A single row that records the current state of the log. Updated after every sequence + integration.
+ A single row that records the current published checkpoint.
+
+ #### `TreeState`
+
+ A single row that records the current state of the tree. Updated after every integration.

#### `Subtree`

@@ -51,12 +55,13 @@ Sequence pool:
Sequence & integrate (DB integration starts here):

1. Takes a batch of entries to sequence and integrate
- 1. Starts a transaction, which first takes a write lock on the checkpoint row to ensure that:
+ 1. Starts a transaction, which first takes a write lock on the `TreeState` row to ensure that:
    1. No other processes will be competing with this work.
-   1. That the next index to sequence is known (this is the same as the current checkpoint size)
+   1. That the next index to sequence is known (this is the same as the current tree size)
1. Update the required TiledLeaves rows
- 1. Perform an integration operation to update the Merkle tree, updating/adding Subtree rows as needed, and eventually updating the Checkpoint row
+ 1. Perform an integration operation to update the Merkle tree, updating/adding Subtree rows as needed, and eventually updating the `TreeState` row
1. Commit the transaction
+ 1. Checkpoints representing the latest state of the tree are published at the configured interval.
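
As a rough illustration of the `Checkpoint`/`TreeState` split described above, the two single-row tables might be declared along these lines. The column names and types here are guesses made for the sketch, not the schema that actually ships with Tessera.

```go
// Package mysqlschema shows illustrative DDL for the single-row Checkpoint and
// TreeState tables described above. Column names and types are assumptions
// for the sketch, not the actual Tessera MySQL schema.
package mysqlschema

import (
	"context"
	"database/sql"
)

const (
	// The checkpoint most recently published to clients.
	checkpointTable = `
CREATE TABLE IF NOT EXISTS Checkpoint (
  id   INT UNSIGNED NOT NULL PRIMARY KEY, -- always 0: a single-row table
  note MEDIUMBLOB NOT NULL                -- the signed checkpoint body
)`

	// The current state of the tree, updated after every integration; the
	// publish step copies this state into Checkpoint at the configured interval.
	treeStateTable = `
CREATE TABLE IF NOT EXISTS TreeState (
  id   INT UNSIGNED NOT NULL PRIMARY KEY, -- always 0: a single-row table
  size BIGINT UNSIGNED NOT NULL,          -- number of entries integrated so far
  root TINYBLOB NOT NULL                  -- root hash of the tree at that size
)`
)

// initSchema creates the two tables if they do not already exist.
func initSchema(ctx context.Context, db *sql.DB) error {
	for _, stmt := range []string{checkpointTable, treeStateTable} {
		if _, err := db.ExecContext(ctx, stmt); err != nil {
			return err
		}
	}
	return nil
}
```

With a split like this, integration only ever touches `TreeState`, and the publish step serialises the latest state into the signed checkpoint held in `Checkpoint`.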

## Costs

