-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make script decoding non-recursive #66
Make script decoding non-recursive #66
Conversation
2951ea9
to
98f7100
Compare
As a follow-up I have also made a similar change to Parse() here: https://github.com/meshcollider/miniscript/tree/202108_nonrecursive_parse But will wait for feedback on the approach with Decode before opening a PR for Parse too. |
Concept ACK. I'd want to see tests that verify this actually works correctly for ~all inputs. I may still have a gramtropy grammar for scripts lying around; with that it shouldn't be too hard to construct a test set and verify the old and new code parse everything identically. |
@sipa
I think these cases are fine because obviously there is inherent, unavoidable ambiguity in certain script decoding, but not sure how to deal with them in the random tests generation.
Obviously |
Second commit fixes the badly typed miniscript issues, so all tests in |
I have tested that ~18000 test vectors (originally from Dmitri)
Test branch for reference: https://github.com/sanket1729/bitcoin/tree/2021_09_decode_tests_rust I just did a code review by comparison with rust-miniscript (without looking if the code itself). Will do another review round tomorrow. |
Thanks Sanket! Appreciate your review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just started reviewing in detail. Feel free to address once I've gone through more.
@sipa @meshcollider, while implementing non-recursive parsing for rust-miniscript I realized that this would require changing all of the library functions to non-recursive ones. For c++ implementation, this means(right now) we only need to change the satisfy/ProduceInputHelper function to a non-recursive one. But eventually, if we want to support any sort of semantic analysis on miniscript, that would also need to be non-recursive. I think the effort for supporting these is not worth the time.
Initially, I was assuming that we only need to change parsing/decoding to support it. But now that this would be something which we would have to do for I would also suggest updating the spec with a new rule that miniscripts with depth above What do you think? |
I'm not opposed to adding a strict depth limit on the tree, but I'm not sure that's necessary. Aren't pretty much all the recursive algorithms that operate on actual Miniscript effectively some computation that's done for every leaf, and then applying a combine function to aggregate child computation result into their parent computation result? If so, that's perhaps even more elegantly written with an explicit stack? |
Yes, this is correct. Maybe the code won't look that bad, it would be a loop and a bunch of push operations on the explicit stack. Even for the satisfy function where there is mutual recursion (calls to satisfy and dissatisfy), it does not seem that super unclean (as this PR). The only unclean part is decoding which is already implemented in this PR. Maybe I just need to get over my affection for code elegancy :P. |
Well I think we also want a non-recursive parser for the textual descriptor form, but IIRC @meshcollider had that too? |
Yes, but the parsing code would be much more elegant than the decoding one. |
Yep! Draft is on this branch but I will wait until this PR is merged before opening a PR for that, so I can use the feedback and review comments from this PR on it. Thanks for the feedback so far, I'll update the PR when you've finished going through it 😃 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second commit does indeed make the code more understandable. I added some comments that were helpful for me during review. I think adding some of those to the code might be helpful for future reviewers.
++in; | ||
to_parse.emplace_back(DecodeContext::ZeroNotEqual, -1, -1); | ||
to_parse.emplace_back(DecodeContext::Expression, -1, -1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth highlighting why sometimes we push MaybeAndV
and sometimes we don't.
I think we should add a comment that for all fragments that start with a fragment code(like [X] CHECKSIG
) and not a token opcode (OP_IF [X]
) only have a push for DecodeContext::Expression
whereas all fragments that are not at the start have both pushes for DecodeContext::MaybeAndV
and DecodeContext::Expression
.
I think this observation would aid faster review for future reviewers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming suggestion:
Rename Expression to SingleBKVExpression
Introduce a BKVExpression, which just pushes MaybeAndV + SingleBKVExpression.
WExpression can stay.
bitcoin/script/miniscript.h
Outdated
to_parse.emplace_back(DecodeContext::OrB, -1, -1); | ||
to_parse.emplace_back(DecodeContext::Expression, -1, -1); | ||
to_parse.emplace_back(DecodeContext::MaybeSwap, -1, -1); | ||
to_parse.emplace_back(DecodeContext::Expression, -1, -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the above observation, note that we don't do a push for DecodeContext::MaybeAndV
for the first argument of and_b, or_b. But we do a DecodeContext::MaybeAndV
for the second arg of and_b, or_b and fragments in OP_ENDIF (because they are guaranteed to not be the first argument).
return MakeNodeRef<Key>(NodeType::ANDOR, Vector(std::move(sub3), std::move(sub1), std::move(sub2))); | ||
case DecodeContext::MaybeAndV: { | ||
// If we reach a potential AND_V top-level, check if the next part of the script could be another AND_V child | ||
if (in < last && in[0].first != OP_IF && in[0].first != OP_ELSE && in[0].first != OP_NOTIF && in[0].first != OP_TOALTSTACK && in[0].first != OP_SWAP) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above-listed opcodes cannot end any miniscript. Therefore they cannot be used in and_v because [X] in and_v(X, Y) is a well-formed miniscript.
8b694cc
to
5846360
Compare
Code review ACK 5846360. I've written a gramtropy grammar to generate random miniscripts, and made this code and the old code parse them and produce the corresponding scripts. Output is identical (on a large set of inputs, containing small ones exhaustively, and longer ones randomly). Feel free to squash the fixups. I had to make this reasoning to remind myself the validity of only looking for single BKV expressions in some places. The point is that whenever a BKV subexpression is parsed at the very beginning of a fragment's script code (e.g. In some cases this commuting is meaningless. For example, Perhaps it's useful to include this as rationale for the distinction somewhere? |
5846360
to
4085ffc
Compare
Added some more comments and squashed! Thanks for reviewing :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
crACK 4085ffc
if (in == last || in[0].first != OP_SWAP) return {}; | ||
++in; | ||
return MakeNodeRef<Key>(NodeType::WRAP_S, Vector(std::move(sub))); | ||
if (constructed.size() != 1) return {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to change anything in the code here, but just a question: I think that constructed.size() != 1
at this point would indicate a bug in the code. DecodeContext::BKV_EXPR
should translate (if successful) to exactly 1 stack element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this can be replaced with assert.
ACK 4085ffc |
case DecodeContext::SWAP: { | ||
if (in >= last || in[0].first != OP_SWAP) return {}; | ||
++in; | ||
constructed.back() = MakeNodeRef<Key>(NodeType::WRAP_S, Vector(std::move(constructed.back()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below, i think you should check this is not This is wrong, it can't be.nullptr
?
case DecodeContext::THRESH_E: { | ||
std::vector<NodeRef<Key>> subs; | ||
for (int i = 0; i < n; ++i) { | ||
NodeRef<Key> sub = std::move(constructed.back()); | ||
constructed.pop_back(); | ||
subs.push_back(std::move(sub)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You drop the check on 0 < k <= n
so we'd crash asserting it later on
c1917c3 miniscript: check thresh bounds when parsing a script (Antoine Poinsot) Pull request description: We could eg parse a thresh with a `k` of `0` and later crash when asserting a sane `thresh` fragment. See #66 (comment) ACKs for top commit: sipa: ACK c1917c3 Tree-SHA512: f8cbbd162dcdbc42ef6225224298737744d52b642f5434f0b9621d2648765547b9224031df0001aabfb5b12911300add198f66a827b610fa0fb1ca8503923e65
60879af Use subspans rather than iterator when parsing (Samuel Dobson) dcaf9ac Sanity checks on final miniscript produced by Parse (Samuel Dobson) 3fc78b9 Clean up Decode function (Samuel Dobson) 9f1d279 Make Parse function non-recursive (Samuel Dobson) Pull request description: Follow-on from #66 This makes Parse non-recursive too. disclaimer: I haven't thoroughly tested this since rebasing so using the grammar again to test this thoroughly would be great. ACKs for top commit: sipa: utACK 60879af Tree-SHA512: 83df7aae13498cbd49fa0c7f8613a84c9a9f373fc0dc7562a0fa9c8434ddc07f88324c3e25ead3c91ea14a8895544de1d063eab2ded7bda47d51ce56bd386e99
Implements a non-recursive Decode algorithm.
Obviously this is not sipa's ideal generic solution but I just thought I'd open this up as a start (and potentially for comparison).
Similar to the rust implementation here:
https://github.com/rust-bitcoin/rust-miniscript/blob/master/src/miniscript/decode.rs