Make script decoding non-recursive #66

meshcollider · 2021-08-26T12:57:14Z

Implements a non-recursive Decode algorithm.

Obviously this is not sipa's ideal generic solution but I just thought I'd open this up as a start (and potentially for comparison).

Similar to the rust implementation here:
https://github.com/rust-bitcoin/rust-miniscript/blob/master/src/miniscript/decode.rs

meshcollider · 2021-08-29T14:17:37Z

As a follow-up I have also made a similar change to Parse() here: https://github.com/meshcollider/miniscript/tree/202108_nonrecursive_parse

But will wait for feedback on the approach with Decode before opening a PR for Parse too.

sipa · 2021-08-30T18:41:54Z

Concept ACK.

I'd want to see tests that verify this actually works correctly for ~all inputs.

I may still have a gramtropy grammar for scripts lying around; with that it shouldn't be too hard to construct a test set and verify the old and new code parse everything identically.

meshcollider · 2021-08-31T06:19:01Z

@sipa
I have been doing some testing with the random miniscript generator and there are a few edge-cases where the decoding is not exactly identical to the input.
For example:

A number of wrappers will move inside an and_v expression. For example, c:and_v(X, Y) becomes [X] [Y] CHECKSIG which is then decoded as and_v(X, c:Y) (turning pk_k into c:pk). The same happens with the n: wrapper.
vtv:pk(key) becomes KEY CHECKSIG VERIFY 1 VERIFY which is then decoded as and_v(v:pk(key), v:1) (essentially a special case of the above due to the moving v: wrapper)
There is inherent ambiguity in [X] [Y] [Z] of course, whether it is and_v(X, and_v(Y, Z)) or and_v(and_v(X, Y), Z)

I think these cases are fine because obviously there is inherent, unavoidable ambiguity in certain script decoding, but not sure how to deal with them in the random tests generation.
However, some cases that don't work arise because of invalid types. For example:

and_v(X, and_b(Y, st:Z)) gives [X] [Y] SWAP [Z] 1 BOOLAND which decodes as and_v(and_v(X, Y), s:and_b(Z, 1))

Obviously 1 is not W-type so this is not a valid thing to do. I haven't tested but the rust implementation looks like it will have the same issue (or simply error).
I'll work on a fix for this.

meshcollider · 2021-08-31T08:06:23Z

Second commit fixes the badly typed miniscript issues, so all tests in miniscript_tests.cpp pass

practicalswift · 2021-09-04T18:40:47Z

Concept ACK

Policies with too high nesting depth are not rejected: It is possible to create small inputs that cause extreme memory usage (in practice: OOM kill or std::bad_alloc) #7
Calling FromScript with a script with 2000 consecutive OP_CHECKSIGVERIFY opcodes causes execution to halt due to running out of stack space #14
Calling node->ToString on a node constructed from 2000 consecutive OP_FALSE opcodes causes execution to halt due to running out of stack space #15

sanket1729 · 2021-09-05T01:50:30Z

I have tested that ~18000 test vectors (originally from Dmitri) rust-miniscript give the same output in all three cases.

The recursive c++ implementation in master
The non-recursive one in this PR
rust-miniscript implementation

Test branch for reference: https://github.com/sanket1729/bitcoin/tree/2021_09_decode_tests_rust
This takes time to compile because I directly pasted the tests into a file instead of reading from a file. I did not bother fixing it because I only wanted to check it as a PoC.

I just did a code review by comparison with rust-miniscript (without looking if the code itself). Will do another review round tomorrow.

meshcollider · 2021-09-05T05:33:13Z

Thanks Sanket! Appreciate your review

sipa

Just started reviewing in detail. Feel free to address once I've gone through more.

bitcoin/script/miniscript.h

sanket1729 · 2021-09-14T09:33:42Z

@sipa @meshcollider, while implementing non-recursive parsing for rust-miniscript I realized that this would require changing all of the library functions to non-recursive ones.

For c++ implementation, this means(right now) we only need to change the satisfy/ProduceInputHelper function to a non-recursive one. But eventually, if we want to support any sort of semantic analysis on miniscript, that would also need to be non-recursive.

I think the effort for supporting these is not worth the time.

The earlier recursive code is much cleaner to review, easy to maintain and build upon.
It's unclear if we are ruling any potential use-cases. Ruling out miniscripts over depth X(or 402 as it is currently implemented in the codebase) should not cause any problems.

Initially, I was assuming that we only need to change parsing/decoding to support it. But now that this would be something which we would have to do for Satisfy and all future things that we want to support makes me reconsider if this is really worth the effort.

I would also suggest updating the spec with a new rule that miniscripts with depth above X are not valid miniscripts. Realistically, I don't think we are losing anything and we also have similar (albeit consensus) rules for TAPROOT_MAX_TREE_DEPTH.

What do you think?

sipa · 2021-09-14T12:11:52Z

I'm not opposed to adding a strict depth limit on the tree, but I'm not sure that's necessary.

Aren't pretty much all the recursive algorithms that operate on actual Miniscript effectively some computation that's done for every leaf, and then applying a combine function to aggregate child computation result into their parent computation result?

If so, that's perhaps even more elegantly written with an explicit stack?

sanket1729 · 2021-09-14T12:50:33Z

Aren't pretty much all the recursive algorithms that operate on actual Miniscript effectively some computation that's done for every leaf, and then applying a combine function to aggregate child computation result into their parent computation result?

Yes, this is correct. Maybe the code won't look that bad, it would be a loop and a bunch of push operations on the explicit stack. Even for the satisfy function where there is mutual recursion (calls to satisfy and dissatisfy), it does not seem that super unclean (as this PR). The only unclean part is decoding which is already implemented in this PR.

Maybe I just need to get over my affection for code elegancy :P.

sipa · 2021-09-14T13:17:21Z

Well I think we also want a non-recursive parser for the textual descriptor form, but IIRC @meshcollider had that too?

sanket1729 · 2021-09-14T13:29:09Z

Yes, but the parsing code would be much more elegant than the decoding one.

bitcoin/script/miniscript.h

meshcollider · 2021-09-16T00:08:03Z

@sipa Well I think we also want a non-recursive parser for the textual descriptor form, but IIRC @meshcollider had that too?

Yep! Draft is on this branch but I will wait until this PR is merged before opening a PR for that, so I can use the feedback and review comments from this PR on it.

Thanks for the feedback so far, I'll update the PR when you've finished going through it 😃

sanket1729

The second commit does indeed make the code more understandable. I added some comments that were helpful for me during review. I think adding some of those to the code might be helpful for future reviewers.

sanket1729 · 2021-09-16T19:59:03Z

bitcoin/script/miniscript.h

+                ++in;
+                to_parse.emplace_back(DecodeContext::ZeroNotEqual, -1, -1);
+                to_parse.emplace_back(DecodeContext::Expression, -1, -1);
+            }


It might be worth highlighting why sometimes we push MaybeAndV and sometimes we don't.

I think we should add a comment that for all fragments that start with a fragment code(like [X] CHECKSIG) and not a token opcode (OP_IF [X]) only have a push for DecodeContext::Expression whereas all fragments that are not at the start have both pushes for DecodeContext::MaybeAndV and DecodeContext::Expression.

I think this observation would aid faster review for future reviewers.

Naming suggestion:

Rename Expression to SingleBKVExpression

Introduce a BKVExpression, which just pushes MaybeAndV + SingleBKVExpression.

WExpression can stay.

sanket1729 · 2021-09-16T20:02:27Z

bitcoin/script/miniscript.h

+                to_parse.emplace_back(DecodeContext::OrB, -1, -1);
+                to_parse.emplace_back(DecodeContext::Expression, -1, -1);
+                to_parse.emplace_back(DecodeContext::MaybeSwap, -1, -1);
+                to_parse.emplace_back(DecodeContext::Expression, -1, -1);


Because of the above observation, note that we don't do a push for DecodeContext::MaybeAndV for the first argument of and_b, or_b. But we do a DecodeContext::MaybeAndV for the second arg of and_b, or_b and fragments in OP_ENDIF (because they are guaranteed to not be the first argument).

sanket1729 · 2021-09-16T20:10:14Z

bitcoin/script/miniscript.h

-            return MakeNodeRef<Key>(NodeType::ANDOR, Vector(std::move(sub3), std::move(sub1), std::move(sub2)));
+        case DecodeContext::MaybeAndV: {
+            // If we reach a potential AND_V top-level, check if the next part of the script could be another AND_V child
+            if (in < last && in[0].first != OP_IF && in[0].first != OP_ELSE && in[0].first != OP_NOTIF && in[0].first != OP_TOALTSTACK && in[0].first != OP_SWAP) {


The above-listed opcodes cannot end any miniscript. Therefore they cannot be used in and_v because [X] in and_v(X, Y) is a well-formed miniscript.

bitcoin/script/miniscript.h

sipa · 2021-09-21T23:12:39Z

Code review ACK 5846360. I've written a gramtropy grammar to generate random miniscripts, and made this code and the old code parse them and produce the corresponding scripts. Output is identical (on a large set of inputs, containing small ones exhaustively, and longer ones randomly).

Feel free to squash the fixups.

I had to make this reasoning to remind myself the validity of only looking for single BKV expressions in some places. The point is that whenever a BKV subexpression is parsed at the very beginning of a fragment's script code (e.g. and_b is [X] [Y] OP_BOOLAND, so the [X] is at the beginning), the and_v "commutes" with that fragment (e.g. and_b(and_v(A,B),C) and and_v(A,and_b(B,C)) correspond to the same script [A] [B] [C] OP_BOOLAND).

In some cases this commuting is meaningless. For example, or_b(and_v(A,B),C) and and_v(A,or_b(B,C)) clearly have distinct semantics, but still map to the same script. The explanation is that the first is not valid miniscript; or_b requires a first argument with the "d" property, and and_v never has the "d" property. All cases where the "d" is not required have this commuting property.

Perhaps it's useful to include this as rationale for the distinction somewhere?

meshcollider · 2021-09-22T04:05:53Z

Added some more comments and squashed! Thanks for reviewing :)

sanket1729

Looks good to me.
crACK 4085ffc

sipa · 2021-09-22T16:41:36Z

bitcoin/script/miniscript.h

-    if (in == last || in[0].first != OP_SWAP) return {};
-    ++in;
-    return MakeNodeRef<Key>(NodeType::WRAP_S, Vector(std::move(sub)));
+    if (constructed.size() != 1) return {};


No need to change anything in the code here, but just a question: I think that constructed.size() != 1 at this point would indicate a bug in the code. DecodeContext::BKV_EXPR should translate (if successful) to exactly 1 stack element.

Yes, I think this can be replaced with assert.

sipa · 2021-09-22T17:03:38Z

ACK 4085ffc

darosior · 2021-09-26T10:15:42Z

bitcoin/script/miniscript.h

+        case DecodeContext::SWAP: {
+            if (in >= last || in[0].first != OP_SWAP) return {};
+            ++in;
+            constructed.back() = MakeNodeRef<Key>(NodeType::WRAP_S, Vector(std::move(constructed.back())));


~~Here and below, i think you should check this is not nullptr?~~ This is wrong, it can't be.

darosior · 2021-09-26T10:17:07Z

bitcoin/script/miniscript.h

+        case DecodeContext::THRESH_E: {
+            std::vector<NodeRef<Key>> subs;
+            for (int i = 0; i < n; ++i) {
+                NodeRef<Key> sub = std::move(constructed.back());
+                constructed.pop_back();
+                subs.push_back(std::move(sub));
+            }


You drop the check on 0 < k <= n so we'd crash asserting it later on

c1917c3 miniscript: check thresh bounds when parsing a script (Antoine Poinsot) Pull request description: We could eg parse a thresh with a `k` of `0` and later crash when asserting a sane `thresh` fragment. See #66 (comment) ACKs for top commit: sipa: ACK c1917c3 Tree-SHA512: f8cbbd162dcdbc42ef6225224298737744d52b642f5434f0b9621d2648765547b9224031df0001aabfb5b12911300add198f66a827b610fa0fb1ca8503923e65

60879af Use subspans rather than iterator when parsing (Samuel Dobson) dcaf9ac Sanity checks on final miniscript produced by Parse (Samuel Dobson) 3fc78b9 Clean up Decode function (Samuel Dobson) 9f1d279 Make Parse function non-recursive (Samuel Dobson) Pull request description: Follow-on from #66 This makes Parse non-recursive too. disclaimer: I haven't thoroughly tested this since rebasing so using the grammar again to test this thoroughly would be great. ACKs for top commit: sipa: utACK 60879af Tree-SHA512: 83df7aae13498cbd49fa0c7f8613a84c9a9f373fc0dc7562a0fa9c8434ddc07f88324c3e25ead3c91ea14a8895544de1d063eab2ded7bda47d51ce56bd386e99

meshcollider force-pushed the 202108_nonrecursive_decode_alone branch from 2951ea9 to 98f7100 Compare August 26, 2021 12:59

sipa reviewed Sep 13, 2021

View reviewed changes

bitcoin/script/miniscript.h Show resolved Hide resolved

bitcoin/script/miniscript.h Outdated Show resolved Hide resolved

bitcoin/script/miniscript.h Outdated Show resolved Hide resolved

sipa reviewed Sep 14, 2021

View reviewed changes

bitcoin/script/miniscript.h Outdated Show resolved Hide resolved

bitcoin/script/miniscript.h Outdated Show resolved Hide resolved

bitcoin/script/miniscript.h Outdated Show resolved Hide resolved

bitcoin/script/miniscript.h Outdated Show resolved Hide resolved

sanket1729 reviewed Sep 16, 2021

View reviewed changes

meshcollider force-pushed the 202108_nonrecursive_decode_alone branch from 8b694cc to 5846360 Compare September 21, 2021 00:49

Make script decoding non-recursive

4085ffc

meshcollider force-pushed the 202108_nonrecursive_decode_alone branch from 5846360 to 4085ffc Compare September 22, 2021 04:05

sanket1729 approved these changes Sep 22, 2021

View reviewed changes

sipa reviewed Sep 22, 2021

View reviewed changes

sipa merged commit 200f5ef into sipa:master Sep 22, 2021

sipa mentioned this pull request Sep 22, 2021

Limit the number of opcodes in FromScript #62

Closed

meshcollider mentioned this pull request Sep 23, 2021

Make Parse function non-recursive #68

Merged

sipa mentioned this pull request Sep 23, 2021

Calling FromScript with a script with 2000 consecutive OP_CHECKSIGVERIFY opcodes causes execution to halt due to running out of stack space #14

Closed

darosior reviewed Sep 26, 2021

View reviewed changes

darosior mentioned this pull request Sep 26, 2021

miniscript: check thresh bounds when parsing a script #71

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make script decoding non-recursive #66

Make script decoding non-recursive #66

meshcollider commented Aug 26, 2021

meshcollider commented Aug 29, 2021

sipa commented Aug 30, 2021

meshcollider commented Aug 31, 2021 •

edited

Loading

meshcollider commented Aug 31, 2021

practicalswift commented Sep 4, 2021

sanket1729 commented Sep 5, 2021 •

edited

Loading

meshcollider commented Sep 5, 2021

sipa left a comment

sanket1729 commented Sep 14, 2021

sipa commented Sep 14, 2021

sanket1729 commented Sep 14, 2021

sipa commented Sep 14, 2021

sanket1729 commented Sep 14, 2021

meshcollider commented Sep 16, 2021 •

edited

Loading

sanket1729 left a comment

sanket1729 Sep 16, 2021 •

edited

Loading

sipa Sep 16, 2021

sanket1729 Sep 16, 2021

sanket1729 Sep 16, 2021

sipa commented Sep 21, 2021 •

edited

Loading

meshcollider commented Sep 22, 2021

sanket1729 left a comment

sipa Sep 22, 2021

sanket1729 Sep 22, 2021

sipa commented Sep 22, 2021

darosior Sep 26, 2021 •

edited

Loading

darosior Sep 26, 2021

Make script decoding non-recursive #66

Make script decoding non-recursive #66

Conversation

meshcollider commented Aug 26, 2021

meshcollider commented Aug 29, 2021

sipa commented Aug 30, 2021

meshcollider commented Aug 31, 2021 • edited Loading

meshcollider commented Aug 31, 2021

practicalswift commented Sep 4, 2021

sanket1729 commented Sep 5, 2021 • edited Loading

meshcollider commented Sep 5, 2021

sipa left a comment

Choose a reason for hiding this comment

sanket1729 commented Sep 14, 2021

sipa commented Sep 14, 2021

sanket1729 commented Sep 14, 2021

sipa commented Sep 14, 2021

sanket1729 commented Sep 14, 2021

meshcollider commented Sep 16, 2021 • edited Loading

sanket1729 left a comment

Choose a reason for hiding this comment

sanket1729 Sep 16, 2021 • edited Loading

Choose a reason for hiding this comment

sipa Sep 16, 2021

Choose a reason for hiding this comment

sanket1729 Sep 16, 2021

Choose a reason for hiding this comment

sanket1729 Sep 16, 2021

Choose a reason for hiding this comment

sipa commented Sep 21, 2021 • edited Loading

meshcollider commented Sep 22, 2021

sanket1729 left a comment

Choose a reason for hiding this comment

sipa Sep 22, 2021

Choose a reason for hiding this comment

sanket1729 Sep 22, 2021

Choose a reason for hiding this comment

sipa commented Sep 22, 2021

darosior Sep 26, 2021 • edited Loading

Choose a reason for hiding this comment

darosior Sep 26, 2021

Choose a reason for hiding this comment

meshcollider commented Aug 31, 2021 •

edited

Loading

sanket1729 commented Sep 5, 2021 •

edited

Loading

meshcollider commented Sep 16, 2021 •

edited

Loading

sanket1729 Sep 16, 2021 •

edited

Loading

sipa commented Sep 21, 2021 •

edited

Loading

darosior Sep 26, 2021 •

edited

Loading