Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2/2] proof: allow proof courier to short cut with local archive #730

Merged
merged 19 commits into from
Feb 6, 2024

Conversation

guggero
Copy link
Member

@guggero guggero commented Dec 8, 2023

Depends on #726, only the last commit is new.

Fixes #575.
Fixes #362.

With this PR we allow the universe RPC courier to short cut the
iterative proof retrieval process if we arrive at a proof in the chain
that we already have in the local archive. And since the local archive
only stores full chains of proofs, once we get one from it, the full
retrieval process is complete.

@guggero guggero requested review from Roasbeef and ffranr December 8, 2023 16:02
proof/file.go Outdated Show resolved Hide resolved
tapdb/multiverse.go Outdated Show resolved Hide resolved
tapdb/multiverse.go Outdated Show resolved Hide resolved
tapdb/multiverse.go Outdated Show resolved Hide resolved
tapdb/multiverse.go Outdated Show resolved Hide resolved
tapdb/multiverse.go Outdated Show resolved Hide resolved
tapdb/multiverse.go Outdated Show resolved Hide resolved
tapdb/multiverse.go Outdated Show resolved Hide resolved
tapgarden/custodian.go Outdated Show resolved Hide resolved
proof/archive.go Outdated Show resolved Hide resolved
@guggero guggero force-pushed the proof-courier-shortcut branch 2 times, most recently from 34d139a to 7299272 Compare December 13, 2023 18:10
@guggero
Copy link
Member Author

guggero commented Dec 13, 2023

Thanks a lot for the reviews, @ffranr! Given this PR only has a single commit on top of #726, I assume you'll approve that one indirectly once you approve this one?

@ffranr
Copy link
Contributor

ffranr commented Dec 13, 2023

Thanks a lot for the reviews, @ffranr! Given this PR only has a single commit on top of #726, I assume you'll approve that one indirectly once you approve this one?

I didn't realise that was the case. I think this is a good argument for basing PR's on top of one another rather than main. You mentioned in the PR body, I see now ("Depends on #726, only the last commit is new."). But right now my review isn't in the same place as Laolu's review of #726.

@jharveyb jharveyb self-requested a review December 14, 2023 00:45
Copy link
Contributor

@jharveyb jharveyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, would be neat to have a test with the receiver where part of the proofs (issuance?) are already local.

proof/courier.go Show resolved Hide resolved
@guggero guggero changed the base branch from main to custodian-local-universe December 14, 2023 08:39
@guggero guggero changed the title proof: allow proof courier to short cut with local archive [2/2] proof: allow proof courier to short cut with local archive Dec 14, 2023
@guggero
Copy link
Member Author

guggero commented Dec 14, 2023

I think this is a good argument for basing PR's on top of one another rather than main.

Okay, I did that now (and also added the [2/2] title prefix). But I think that using this model will lead to more accidental premature merges because most of the team is used to the other model where the dependencies are outlined in the PR body instead of the base branch (which IMO is very easy to miss).

@jharveyb
Copy link
Contributor

Was thinking that someone could use 'request changes' to block the PR, but the author can't do that, only reviewers.

@guggero guggero force-pushed the custodian-local-universe branch 2 times, most recently from 32fe75c to dc08080 Compare December 20, 2023 15:31
@guggero guggero force-pushed the proof-courier-shortcut branch from 7299272 to cf3dfd5 Compare December 20, 2023 15:35
@guggero guggero requested a review from jharveyb December 20, 2023 15:36
Copy link
Contributor

@jharveyb jharveyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, excited for tap-over-avian-carrier 👍🏽

tapdb/multiverse.go Show resolved Hide resolved
proof/courier.go Show resolved Hide resolved
tapcfg/server.go Outdated Show resolved Hide resolved
@guggero guggero force-pushed the custodian-local-universe branch from dc08080 to 6cc7b2a Compare January 3, 2024 12:39
@guggero guggero force-pushed the proof-courier-shortcut branch from cf3dfd5 to 915c091 Compare January 3, 2024 12:46
@guggero guggero force-pushed the custodian-local-universe branch from 6cc7b2a to 864b554 Compare January 9, 2024 07:44
@guggero guggero force-pushed the proof-courier-shortcut branch from 915c091 to 45a8a65 Compare January 9, 2024 08:02
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only thing this is missing is a unit test to exercise the short circuit behavior. I think we could do an itest as well, where that would assert than N queries were made to the remote universe server, instead of N+M or w/e, where N is the set of new state transitions, and M is that common prefix.

proof/courier.go Show resolved Hide resolved
Base automatically changed from custodian-local-universe to main January 12, 2024 08:35
@guggero guggero force-pushed the proof-courier-shortcut branch from 45a8a65 to ebef082 Compare January 16, 2024 18:24
@guggero
Copy link
Member Author

guggero commented Jan 16, 2024

I found the issue with the integration test failing. Turns out this is the first time it actually bit us that the way we store and query proofs (with script key only) fails if we receive to the same address multiple times. Because then we overwrite an existing proof (at least on the disk where the file name and path only consists of the asset ID and script key).
And when querying a proof, we only look at the script key as well.

So we need to re-model our disk and DB proof storage and query to also consider the outpoint.
I'll work on that on the side.

@lightninglabs-deploy
Copy link

@guggero, remember to re-request review from reviewers when ready

@guggero guggero force-pushed the proof-courier-shortcut branch from 88b0017 to 5b291ba Compare February 2, 2024 16:24
@guggero
Copy link
Member Author

guggero commented Feb 2, 2024

I finally addressed all TODOs 🎉

I also tested the file migration. Though I forgot that in our testnet/prod tapd installation we only run a universe server, which doesn't store proof files (everything is in the multiverse database only).

So I instead went ahead and used the load test binary locally, which created 200 assets and sent a couple of them to a second node. After doing that on main I restarted the nodes on this branch and observed the log:

First node:

2024-02-02 17:18:50.339 [INF] PROF: Found 263 proof files in /home/guggero/.tapd/data/regtest/proofs with old naming scheme, renaming now (will take a while)
2024-02-02 17:18:50.427 [INF] PROF: Done renaming  263 proof files, took 88.173316ms

Second node:

2024-02-02 17:20:28.219 [INF] PROF: Found 63 proof files in /home/guggero/.tapd-bob/data/regtest/proofs with old naming scheme, renaming now (will take a while)
2024-02-02 17:20:28.239 [INF] PROF: Done renaming  63 proof files, took 20.181639ms

I then made sure further assets could be sent successfully, which worked as expected.

@guggero guggero requested review from jharveyb and Roasbeef February 2, 2024 16:24
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ⛷️

tapdb/assets_store.go Outdated Show resolved Hide resolved
tapdb/sqlc/queries/assets.sql Show resolved Hide resolved
proof/archive.go Show resolved Hide resolved
proof/archive_test.go Show resolved Hide resolved
proof/archive.go Show resolved Hide resolved
@@ -1275,10 +1275,26 @@ func (a *AssetStore) FetchProofs(ctx context.Context,
"script key: %w", err)
}

f := proof.File{}
err = f.Decode(bytes.NewReader(dbRow.ProofFile))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we could do to optimize stuff like this would be to have the outpoint he in a predictable location at the end of a file. Not crucial now, just another idea re how we can make the proof files more usable.

@@ -744,14 +744,13 @@ func (p *ChainPorter) transferReceiverProof(pkg *sendPackage) error {
"proof file: %w", err)
}

// Hash proof locator.
hash, err := proofLocator.Hash()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So before for proofs with the same asset ID + script key, we would end up clobbering them in this map, causing us to not write all relevant data to disk?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though since these were just the passive assets within the same commitment, the script keys were forced to be unique, so no clobbering should have happened.

The actual "hack" this commit is cleaning up is that we use the hash of the Locator struct that isn't fully populated (only 2 of the 4 members). And then when accessing the map again further down in the call stack, we need to make sure we only fill the same 2 members to arrive at the same hash, otherwise we get a mismatch.
It was good for performance, but bad for understanding what's going on. When I looked for all instances of the Locator struct and set the OutPoint everywhere, this just broke.
IMO the new way of structuring the proofs has a slight performance hit but has way less chance of us screwing things up by accident.

Also, I'm planning on eliminating the "passive assets" concept soon anyway (in a future refactor PR spun out from the learnings of the TAP channel prototype), so this should be gone eventually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm planning on eliminating the "passive assets" concept soon anyway (in a future refactor PR spun out from the learnings of the TAP channel prototype), so this should be gone eventually.

Interesting, perhaps make a tracking issue to get more cross-pollination of the idea? Is this related to the SIGHASH_NOINPUT ideas, or other?

proof/archive.go Show resolved Hide resolved
AssetID: assetID,
PrevOutPoint: prevOutpoint,
PrevAssetID: prevID.ID[:],
PrevScriptKey: prevID.ScriptKey.CopyBytes(),
WitnessStack: witnessStack,
SplitCommitmentProof: splitCommitmentProof,
WitnessIndex: int32(idx),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so we were missing this all along? Or we were relying on implicit read order here to keep the expected serialization the same?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or seems the assumption is that no multi-input spends exist today in the DB?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we were lucky and the implicit read order was correct so far. And we only have one itest that actually does use multiple inputs.

-- The witness index indicates the order of the witness in the list of witnesses
-- for a given asset. We didn't really support more than one witness before, so
-- the default value of 0 should be fine for all existing assets.
ALTER TABLE asset_witnesses ADD COLUMN witness_index INTEGER NOT NULL DEFAULT 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about existing entries that don't have the witness_index field?

Copy link
Member Author

@guggero guggero Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I was getting at with the comment. Unless a user manually used the vPSBT APIs to create a virtual transaction with multiple inputs, this shouldn't really be used in the wild.

The alternative would be to add a Golang based migration that selects the witnesses using the natural order and then re-persists that as the witness_index.

With this commit we allow the universe RPC courier to short cut the
iterative proof retrieval process if we arrive at a proof in the chain
that we already have in the local archive. And since the local archive
only stores full chains of proofs, once we get one from it, the full
retrieval process is complete.
With this commit we make sure that proofs in the database don't collide
if they use the same script key but different anchor transactions.
This situation can occur if a TAP address receives multiple transfers.
We also make sure we can query for the correct proof if we also specify
the outpoint in the query.
Since we now have a JOIN with the managed_utxos in the UpsertAssetProof
we cannot use that to update proofs that haven't been re-anchored yet.
But because we re-anchor passive assets only _after_ updating the
proofs, this would previously result in no rows being updated.
We could change the order of operations instead, but having WHERE clause
for a specific database ID mixed in with optional value based queries
wasn't super beautiful in the first place. So we opt for having a more
explicit upsert for database ID based proof identification.
This is a simple code move commit that removes a parsing function from the
integration test and uses a commonly available one instead.
To make sure we don't accidentally overwrite a proof file if we receive
to a TAP address multiple times, we also use the first couple of bytes
of the outpoint TXID and index in the file name.
We don't use the full outpoint as in some operating systems the full
path for a file is not allowed to exceed 256 characters (the path and
file name combined). And since we already use 130 characters for the hex
encoded asset ID and script key, we need to shorten the outpoint
somewhat.
We will add a migration that renames existing files on disk in the next
commit.
With this commit we make sure that whenever we start the file archive we
migrate any proof files that use the old name to the new naming scheme.
Since we now require the proof outpoint to be specified in order to
fetch the correct proof, we make sure we supply that part of the proof
locator in all situations. We also make sure we specify the outpoint
when storing proofs.
To get a unique lookup key we mis-used a proof locator's hash as the key
to look up passive asset proofs quickly without needing to scan a slice.
Because we don't have the same data available when creating the map as
we do when accessing it further down in the asset database, we got a
mismatch in the hash and proofs couldn't be found (mainly due to not
having access to the previous outpoint of the passive assets being
re-anchored).
As a compromise we now map the proofs by asset ID and have a slice of
proofs in case there are multiple passive assets with the same asset ID.
This is better than having a completely flat slice of proofs (as we
don't have to scan through all of them) but still requires us to do
_some_ iterating.
This commit allows the ExportProof and ProveAssetOwnership RPC methods
to be called with an outpoint as well to disambiguate in case of
multiple proofs with the same script key (e.g. multiple receives to the
same TAP address).
Because we can now end up importing proofs that we already have (for
example when sending to our own TAP address using a universe courier, we
will pull the proof from the local universe and import it into our store
again).
Before turning this into an upsert, we would end up with an asset that
had two identical entries in the previous witness list.
To make sure the order of multiple witnesses is kept, we also need to
add a witness_index field that we can use for sorting.
Unfortunately that breaks the data migration demo test as the
FetchAllAssets query also loads the witness and the query for that uses
the witness_index field that doesn't exist at that point.
Since the current test is only a demo we just change it to fetch the
(unchanged) managed UTXOs instead.
When starting the custodian and we get an error, we actually want to
inspect the error. Otherwise we just get a timeout when listening for
the subscription signals which doesn't really tell us what's wrong.
This helped us debug an issue with a unit test in the previous commit.
This fixes a couple of instances where the porter used the wrong
outpoint for proof locators when creating new proofs for a send package.
This was previously not noticed because the outpoint in the locator was
ignored by both the file based and database archive. With the outpoint
now being mandatory, this lead to failures in the integration tests.
This fixes a bug reported by a user running v0.3.3-rc1. Although the
situation can only happen if the daemon is shut down in exactly the
wrong moment (or, more likely, due to an otherwise inconsistent database
state), it can happen that the multiverse reports a proof is available
but it wasn't yet imported into the local archive.
To make sure we can definitely rely on the proof being in the asset DB
when trying to complete a receive event, we double check and re-import
the proof if necessary.
@guggero guggero force-pushed the proof-courier-shortcut branch from 5b291ba to e532f2c Compare February 5, 2024 11:57
Copy link
Contributor

@jharveyb jharveyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! New tests are solid, great to see all the fixes 🚀

ctxt, headerVerifier, c.cfg.GroupVerifier, false, p,
)
if err != nil {
log.Errorf("ERROOOORRR: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we leaving this in as-is? 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can push a commit to resolve on our timezone before merging.

@Roasbeef Roasbeef merged commit 99b8f86 into main Feb 6, 2024
14 checks passed
@guggero guggero deleted the proof-courier-shortcut branch February 6, 2024 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
5 participants