Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make script decoding non-recursive #66

Merged
merged 1 commit into from
Sep 22, 2021

Conversation

meshcollider
Copy link
Contributor

Implements a non-recursive Decode algorithm.

Obviously this is not sipa's ideal generic solution but I just thought I'd open this up as a start (and potentially for comparison).

Similar to the rust implementation here:
https://github.com/rust-bitcoin/rust-miniscript/blob/master/src/miniscript/decode.rs

@meshcollider meshcollider force-pushed the 202108_nonrecursive_decode_alone branch from 2951ea9 to 98f7100 Compare August 26, 2021 12:59
@meshcollider
Copy link
Contributor Author

As a follow-up I have also made a similar change to Parse() here: https://github.com/meshcollider/miniscript/tree/202108_nonrecursive_parse

But will wait for feedback on the approach with Decode before opening a PR for Parse too.

@sipa
Copy link
Owner

sipa commented Aug 30, 2021

Concept ACK.

I'd want to see tests that verify this actually works correctly for ~all inputs.

I may still have a gramtropy grammar for scripts lying around; with that it shouldn't be too hard to construct a test set and verify the old and new code parse everything identically.

@meshcollider
Copy link
Contributor Author

meshcollider commented Aug 31, 2021

@sipa
I have been doing some testing with the random miniscript generator and there are a few edge-cases where the decoding is not exactly identical to the input.
For example:

  • A number of wrappers will move inside an and_v expression. For example, c:and_v(X, Y) becomes [X] [Y] CHECKSIG which is then decoded as and_v(X, c:Y) (turning pk_k into c:pk). The same happens with the n: wrapper.
  • vtv:pk(key) becomes KEY CHECKSIG VERIFY 1 VERIFY which is then decoded as and_v(v:pk(key), v:1) (essentially a special case of the above due to the moving v: wrapper)
  • There is inherent ambiguity in [X] [Y] [Z] of course, whether it is and_v(X, and_v(Y, Z)) or and_v(and_v(X, Y), Z)

I think these cases are fine because obviously there is inherent, unavoidable ambiguity in certain script decoding, but not sure how to deal with them in the random tests generation.
However, some cases that don't work arise because of invalid types. For example:

  • and_v(X, and_b(Y, st:Z)) gives [X] [Y] SWAP [Z] 1 BOOLAND which decodes as and_v(and_v(X, Y), s:and_b(Z, 1))

Obviously 1 is not W-type so this is not a valid thing to do. I haven't tested but the rust implementation looks like it will have the same issue (or simply error).
I'll work on a fix for this.

@meshcollider
Copy link
Contributor Author

Second commit fixes the badly typed miniscript issues, so all tests in miniscript_tests.cpp pass

@sanket1729
Copy link
Contributor

sanket1729 commented Sep 5, 2021

I have tested that ~18000 test vectors (originally from Dmitri) rust-miniscript give the same output in all three cases.

  1. The recursive c++ implementation in master
  2. The non-recursive one in this PR
  3. rust-miniscript implementation

Test branch for reference: https://github.com/sanket1729/bitcoin/tree/2021_09_decode_tests_rust
This takes time to compile because I directly pasted the tests into a file instead of reading from a file. I did not bother fixing it because I only wanted to check it as a PoC.

I just did a code review by comparison with rust-miniscript (without looking if the code itself). Will do another review round tomorrow.

@meshcollider
Copy link
Contributor Author

Thanks Sanket! Appreciate your review

Copy link
Owner

@sipa sipa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just started reviewing in detail. Feel free to address once I've gone through more.

bitcoin/script/miniscript.h Show resolved Hide resolved
bitcoin/script/miniscript.h Outdated Show resolved Hide resolved
bitcoin/script/miniscript.h Outdated Show resolved Hide resolved
@sanket1729
Copy link
Contributor

@sipa @meshcollider, while implementing non-recursive parsing for rust-miniscript I realized that this would require changing all of the library functions to non-recursive ones.

For c++ implementation, this means(right now) we only need to change the satisfy/ProduceInputHelper function to a non-recursive one. But eventually, if we want to support any sort of semantic analysis on miniscript, that would also need to be non-recursive.

I think the effort for supporting these is not worth the time.

  • The earlier recursive code is much cleaner to review, easy to maintain and build upon.
  • It's unclear if we are ruling any potential use-cases. Ruling out miniscripts over depth X(or 402 as it is currently implemented in the codebase) should not cause any problems.

Initially, I was assuming that we only need to change parsing/decoding to support it. But now that this would be something which we would have to do for Satisfy and all future things that we want to support makes me reconsider if this is really worth the effort.

I would also suggest updating the spec with a new rule that miniscripts with depth above X are not valid miniscripts. Realistically, I don't think we are losing anything and we also have similar (albeit consensus) rules for TAPROOT_MAX_TREE_DEPTH.

What do you think?

@sipa
Copy link
Owner

sipa commented Sep 14, 2021

I'm not opposed to adding a strict depth limit on the tree, but I'm not sure that's necessary.

Aren't pretty much all the recursive algorithms that operate on actual Miniscript effectively some computation that's done for every leaf, and then applying a combine function to aggregate child computation result into their parent computation result?

If so, that's perhaps even more elegantly written with an explicit stack?

@sanket1729
Copy link
Contributor

Aren't pretty much all the recursive algorithms that operate on actual Miniscript effectively some computation that's done for every leaf, and then applying a combine function to aggregate child computation result into their parent computation result?

Yes, this is correct. Maybe the code won't look that bad, it would be a loop and a bunch of push operations on the explicit stack. Even for the satisfy function where there is mutual recursion (calls to satisfy and dissatisfy), it does not seem that super unclean (as this PR). The only unclean part is decoding which is already implemented in this PR.

Maybe I just need to get over my affection for code elegancy :P.

@sipa
Copy link
Owner

sipa commented Sep 14, 2021

Well I think we also want a non-recursive parser for the textual descriptor form, but IIRC @meshcollider had that too?

@sanket1729
Copy link
Contributor

Yes, but the parsing code would be much more elegant than the decoding one.

bitcoin/script/miniscript.h Outdated Show resolved Hide resolved
bitcoin/script/miniscript.h Outdated Show resolved Hide resolved
bitcoin/script/miniscript.h Outdated Show resolved Hide resolved
bitcoin/script/miniscript.h Outdated Show resolved Hide resolved
@meshcollider
Copy link
Contributor Author

meshcollider commented Sep 16, 2021

@sipa Well I think we also want a non-recursive parser for the textual descriptor form, but IIRC @meshcollider had that too?

Yep! Draft is on this branch but I will wait until this PR is merged before opening a PR for that, so I can use the feedback and review comments from this PR on it.

Thanks for the feedback so far, I'll update the PR when you've finished going through it 😃

Copy link
Contributor

@sanket1729 sanket1729 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second commit does indeed make the code more understandable. I added some comments that were helpful for me during review. I think adding some of those to the code might be helpful for future reviewers.

++in;
to_parse.emplace_back(DecodeContext::ZeroNotEqual, -1, -1);
to_parse.emplace_back(DecodeContext::Expression, -1, -1);
}
Copy link
Contributor

@sanket1729 sanket1729 Sep 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth highlighting why sometimes we push MaybeAndV and sometimes we don't.

I think we should add a comment that for all fragments that start with a fragment code(like [X] CHECKSIG) and not a token opcode (OP_IF [X]) only have a push for DecodeContext::Expression whereas all fragments that are not at the start have both pushes for DecodeContext::MaybeAndV and DecodeContext::Expression.

I think this observation would aid faster review for future reviewers.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming suggestion:

Rename Expression to SingleBKVExpression

Introduce a BKVExpression, which just pushes MaybeAndV + SingleBKVExpression.

WExpression can stay.

to_parse.emplace_back(DecodeContext::OrB, -1, -1);
to_parse.emplace_back(DecodeContext::Expression, -1, -1);
to_parse.emplace_back(DecodeContext::MaybeSwap, -1, -1);
to_parse.emplace_back(DecodeContext::Expression, -1, -1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the above observation, note that we don't do a push for DecodeContext::MaybeAndV for the first argument of and_b, or_b. But we do a DecodeContext::MaybeAndV for the second arg of and_b, or_b and fragments in OP_ENDIF (because they are guaranteed to not be the first argument).

return MakeNodeRef<Key>(NodeType::ANDOR, Vector(std::move(sub3), std::move(sub1), std::move(sub2)));
case DecodeContext::MaybeAndV: {
// If we reach a potential AND_V top-level, check if the next part of the script could be another AND_V child
if (in < last && in[0].first != OP_IF && in[0].first != OP_ELSE && in[0].first != OP_NOTIF && in[0].first != OP_TOALTSTACK && in[0].first != OP_SWAP) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above-listed opcodes cannot end any miniscript. Therefore they cannot be used in and_v because [X] in and_v(X, Y) is a well-formed miniscript.

bitcoin/script/miniscript.h Outdated Show resolved Hide resolved
@meshcollider meshcollider force-pushed the 202108_nonrecursive_decode_alone branch from 8b694cc to 5846360 Compare September 21, 2021 00:49
@sipa
Copy link
Owner

sipa commented Sep 21, 2021

Code review ACK 5846360. I've written a gramtropy grammar to generate random miniscripts, and made this code and the old code parse them and produce the corresponding scripts. Output is identical (on a large set of inputs, containing small ones exhaustively, and longer ones randomly).

Feel free to squash the fixups.

I had to make this reasoning to remind myself the validity of only looking for single BKV expressions in some places. The point is that whenever a BKV subexpression is parsed at the very beginning of a fragment's script code (e.g. and_b is [X] [Y] OP_BOOLAND, so the [X] is at the beginning), the and_v "commutes" with that fragment (e.g. and_b(and_v(A,B),C) and and_v(A,and_b(B,C)) correspond to the same script [A] [B] [C] OP_BOOLAND).

In some cases this commuting is meaningless. For example, or_b(and_v(A,B),C) and and_v(A,or_b(B,C)) clearly have distinct semantics, but still map to the same script. The explanation is that the first is not valid miniscript; or_b requires a first argument with the "d" property, and and_v never has the "d" property. All cases where the "d" is not required have this commuting property.

Perhaps it's useful to include this as rationale for the distinction somewhere?

@meshcollider meshcollider force-pushed the 202108_nonrecursive_decode_alone branch from 5846360 to 4085ffc Compare September 22, 2021 04:05
@meshcollider
Copy link
Contributor Author

Added some more comments and squashed! Thanks for reviewing :)

Copy link
Contributor

@sanket1729 sanket1729 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
crACK 4085ffc

if (in == last || in[0].first != OP_SWAP) return {};
++in;
return MakeNodeRef<Key>(NodeType::WRAP_S, Vector(std::move(sub)));
if (constructed.size() != 1) return {};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change anything in the code here, but just a question: I think that constructed.size() != 1 at this point would indicate a bug in the code. DecodeContext::BKV_EXPR should translate (if successful) to exactly 1 stack element.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this can be replaced with assert.

@sipa
Copy link
Owner

sipa commented Sep 22, 2021

ACK 4085ffc

case DecodeContext::SWAP: {
if (in >= last || in[0].first != OP_SWAP) return {};
++in;
constructed.back() = MakeNodeRef<Key>(NodeType::WRAP_S, Vector(std::move(constructed.back())));
Copy link
Contributor

@darosior darosior Sep 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below, i think you should check this is not nullptr? This is wrong, it can't be.

Comment on lines +1340 to +1346
case DecodeContext::THRESH_E: {
std::vector<NodeRef<Key>> subs;
for (int i = 0; i < n; ++i) {
NodeRef<Key> sub = std::move(constructed.back());
constructed.pop_back();
subs.push_back(std::move(sub));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You drop the check on 0 < k <= n so we'd crash asserting it later on

sipa added a commit that referenced this pull request Sep 27, 2021
c1917c3 miniscript: check thresh bounds when parsing a script (Antoine Poinsot)

Pull request description:

  We could eg parse a thresh with a `k` of `0` and later crash when asserting a sane `thresh` fragment. See #66 (comment)

ACKs for top commit:
  sipa:
    ACK c1917c3

Tree-SHA512: f8cbbd162dcdbc42ef6225224298737744d52b642f5434f0b9621d2648765547b9224031df0001aabfb5b12911300add198f66a827b610fa0fb1ca8503923e65
sipa added a commit that referenced this pull request Dec 6, 2021
60879af Use subspans rather than iterator when parsing (Samuel Dobson)
dcaf9ac Sanity checks on final miniscript produced by Parse (Samuel Dobson)
3fc78b9 Clean up Decode function (Samuel Dobson)
9f1d279 Make Parse function non-recursive (Samuel Dobson)

Pull request description:

  Follow-on from #66

  This makes Parse non-recursive too.

  disclaimer: I haven't thoroughly tested this since rebasing so using the grammar again to test this thoroughly would be great.

ACKs for top commit:
  sipa:
    utACK 60879af

Tree-SHA512: 83df7aae13498cbd49fa0c7f8613a84c9a9f373fc0dc7562a0fa9c8434ddc07f88324c3e25ead3c91ea14a8895544de1d063eab2ded7bda47d51ce56bd386e99
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants