Skip to content

Files

Latest commit

 

History

History
423 lines (303 loc) · 20.2 KB

ch08.asciidoc

File metadata and controls

423 lines (303 loc) · 20.2 KB

Pay To Script Hash

Up to this point in the book, we’ve been doing single-key transactions, or transactions where only a single private key per input. What if we wanted something a little more complicated? A company that has $100 million in Bitcoin might not want funds locked to a single private key as that key can be stolen. Loss of a single key would mean all funds would then be lost. What can we do to reduce the single point of failure?

The solution is multisig, or multiple signatures. This was built into Bitcoin from the beginning, but was clunky at first and so wasn’t used. As we’ll discover later in this chapter, Satoshi probably didn’t test multisig as it has an off-by-one error (see `OP_CHECKMULTISIG` Off-by-one Bug later in the chapter). The bug has had to stay in the protocol as fixing it would require a hard fork.

Note
Multiple Private Keys to a Single Aggregated Public Key

It is possible to "split" a single private key into multiple private keys and use an interactive method to aggregate signatures without ever reconstructing the private key, but this is not a common practice. Schnorr signatures will make aggregating signatures easier and perhaps more common in the future.

Bare Multisig

Bare Multisig was the first attempt at creating transaction outputs that require signatures from multiple parties. The idea is to change from a single point of failure to something a little more resilient to hacks. To understand Bare Multisig, one must first understand the OP_CHECKMULTISIG opcode. As discussed in [chapter_script], Script has a lot of different opcodes. OP_CHECKMULTISIG is one of them at 0xae. The opcode consumes a lot of elements from the stack and returns whether or not the required number of signatures are valid for a transaction input.

The transaction output is called "bare" multisig because it’s a long ScriptPubKey. Bare Multisig ScriptPubKey shows what a ScriptPubKey for a 1-of-2 multisig looks like.

Bare Multisig ScriptPubKey
Figure 1. Bare Multisig ScriptPubKey

Among Bare Multisig ScriptPubKeys, this one is on the small end, and we can already see that it’s long. The ScriptPubKey for p2pkh is only 25 bytes whereas this Bare Multisig is 101 bytes (though obviously, compressed SEC format would reduce it some), and this is a 1-of-2! Bare Multisig ScriptSig shows what the ScriptSig looks like:

Bare Multisig ScriptSig
Figure 2. Bare Multisig ScriptSig

We only need 1 signature for this 1-of-2 multisig, so this is relatively short, though something like a 5-of-7 would require 5 DER signatures and would be a lot longer (360 bytes or so). Bare Multisig Combined Script. shows how the two combine.

Bare Multisig Combined Script
Figure 3. Bare Multisig Combined Script.

Note m and n are something between 1 and 16 inclusive. We’ve generalized here so we can see what a Bare m-of-n Multisig would look like (m and n can be anything from 1 to 20, though there’s numerical opcodes only go up to OP_16, 17 to 20 would require 0112 to push a number like 18 to the stack). The starting state looks like Bare Multisig Start:

Bare Multisig Start
Figure 4. Bare Multisig Start

OP_0 will push the number 0 to the stack (Bare Multisig Step 1).

Bare Multisig Step 1
Figure 5. Bare Multisig Step 1

The signatures are elements so they’ll be pushed directly to the stack (Bare Multisig Step 2).

Bare Multisig Step 2
Figure 6. Bare Multisig Step 2

OP_m will push the number m on the stack, the public keys will be pushed to the stack and OP_n will push the number n to the stack (Bare Multisig Step 3).

Bare Multisig Step 3
Figure 7. Bare Multisig Step 3

At this point, OP_CHECKMULTISIG will consume m+n+3 elements (see `OP_CHECKMULTISIG` Off-by-one Bug) and push a 1 to the stack if m of the signatures are valid for m distinct public keys from the list of n public keys, a 0 otherwise. Assuming that the signatures are valid, the stack has a single 1 which validates the combined Script (Bare Multisig End):

Bare Multisig End
Figure 8. Bare Multisig End
Note
OP_CHECKMULTISIG Off-by-one Bug

The stack elements consumed by OP_CHECKMULTISIG are supposed to be:

m, m different signatures, n, n different pubkeys.

The number of elements consumed should be 2 (m and n themselves) + m (signatures) + n (pubkeys). Unfortunately, the opcode consumes 1 more element than the m+n+2 elements that it’s supposed to. OP_CHECKMULTISIG consumes m+n+3, so an extra element is added (OP_0 in our example) in order to not cause a failure. The opcode does nothing with that extra element and that extra element can be anything.

As a way to combat malleability, however, most nodes on the Bitcoin network will not relay the transaction unless that extra element is OP_0. Note that if we had m+n+2 elements, that OP_CHECKMULTISIG will just fail as there are not enough elements to be consumed and the combined Script will fail causing the transaction to be invalid.

Coding OP_CHECKMULTISIG

In an m-of-n Bare Multisig, the stack contains n as the top element, then n pubkeys, then m, then m signatures and finally, a filler item due to the off-by-one bug. The code for OP_CHECKMULTISIG in op.py is mostly written here:

link:code-ch08/op.py[role=include]
  1. Each DER signature is assumed to be signed with SIGHASH_ALL.

  2. We take care of the off-by-one error by consuming the top element of the stack and not doing anything with the element.

  3. This is the part that you will need to code for the next exercise.

Problems with Bare Multisig

Bare multisig is a bit ugly, but it is functional. This reduces the single point of failure by requiring m of n signatures to unlock a UTXO. There is plenty of utility in making outputs multisig, especially if you’re a business. However, Bare Multisig suffers from a few problems:

  1. First problem: the length of the ScriptPubKey. A Bare Multisig ScriptPubKey has many different public keys and that makes the ScriptPubKey long. Unlike p2pkh or even p2pk, these are not easily communicated using voice or even text message.

  2. Second problem: because the output is so long, it requires more resources for node software. Nodes keep track of the UTXO set, so a big ScriptPubKey is more expensive to keep track of. A large output is more expensive to keep in fast-access storage (like RAM), being 5-20x larger than a normal p2pkh output.

  3. Third problem: because the ScriptPubKey can be so much bigger, bare multisig can and has been abused. The entire PDF of the Satoshi’s original whitepaper is encoded in this transaction in block 230009:

54e48e5f5c656b26c3bca14a8c95aa583d07ebe84dde3b7dd4a78f4e4186e713

The creator of this transaction split up the whitepaper PDF into 64 byte chunks which were then made into invalid uncompressed public keys. The whitepaper was encoded into 947 outputs as 1-of-3 Bare Multisig outputs. These outputs are not spendable but have to be indexed in the UTXO sets of full nodes. This is a tax every full node has to pay and is in that sense abusive.

In order to mitigate these problems, pay-to-script-hash (p2sh) was born.

Pay-to-Script-Hash (p2sh)

Pay-to-script-hash (p2sh) is a general solution to the long address/ScriptPubKey problem. More complicated ScriptPubKeys than Bare Multisig can easily be made and they have the same problems as Bare Multisig.

The solution that p2sh implements is to take the hash of some Script instructions and then reveal the pre-image Script instructions later. Pay-to-script-hash was introduced in 2011 to a lot of controversy. There were multiple proposals, but as we’ll see, p2sh is kludgy, but works.

In p2sh, a special rule gets executed only when the pattern shown in Pay-to-script-hash Pattern that executes the special rule is encountered:

p2sh Pattern
Figure 9. Pay-to-script-hash Pattern that executes the special rule

If this exact instruction set ends with a 1 on the stack, then the RedeemScript (top item in Pay-to-script-hash Pattern that executes the special rule) is parsed and then added to the Script instruction set. This special pattern was introduced in BIP0016 and Bitcoin software that implements BIP0016 (anything post 2011) checks for the pattern. The RedeemScript does not add new Script instructions for processing unless this exact sequence is encountered and ends with a 1.

If this sounds hacky, it is. But before we get to that, let’s look a little closer at exactly how this plays out.

Let’s say we have a 2-of-2 multisig ScriptPubKey (Pay-to-script-hash (p2sh) RedeemScript):

p2sh RedeemScript
Figure 10. Pay-to-script-hash (p2sh) RedeemScript

This is a ScriptPubKey for a Bare Multisig. What we need to do to convert this to p2sh is to take a hash of this Script and keep this Script handy for when we want to redeem it. We call this the RedeemScript, because the Script is only revealed during redemption. We put the hash of the RedeemScript as the ScriptPubKey like Pay-to-script-hash (p2sh) ScriptPubKey:

p2sh ScriptPubKey
Figure 11. Pay-to-script-hash (p2sh) ScriptPubKey

The hash digest here is the hash160 of the RedeemScript, or what was previously the ScriptPubKey. We’re locked the funds to the hash of the RedeemScript which needs to be revealed at unlock time.

Creating the ScriptSig for a p2sh script involves not only revealing the RedeemScript, but also unlocking the RedeemScript. At this point, you might wonder, where is the RedeemScript stored? The RedeemScript is not on the blockchain until actual redemption, so it must be stored by the creator of the p2sh address. If the RedeemScript is lost and cannot be reconstructed, the funds are lost, so it’s very important to keep track of it!

Warning
Importance of keeping the RedeemScript

If you are receiving to a p2sh address, be sure to store and backup the RedeemScript! Better yet, make it easy to reconstruct!

The ScriptSig for the 2-of-2 multisig looks like Pay-to-script-hash (p2sh) ScriptSig:

p2sh ScriptSig
Figure 12. Pay-to-script-hash (p2sh) ScriptSig

This produces the combined Script (p2sh Combined):

p2sh Combined Script
Figure 13. p2sh Combined

As before, OP_0 is there because of the OP_CHECKMULTISIG bug. The key to understanding p2sh is the execution of the exact sequence shown in p2sh pattern that executes the special rule:

p2sh Pattern
Figure 14. p2sh pattern that executes the special rule

Upon execution of this sequence, if the stack is left with a 1, the RedeemScript is inserted into the Script instruction set. In other words, if we reveal a RedeemScript whose hash160 is the same hash160 in the ScriptPubKey, that RedeemScript acts like the ScriptPubKey instead. We hash the Script that locks the funds and put that into the blockchain instead of the Script itself. This is why we call this ScriptPubKey pay-to-script-hash.

Let’s go through exactly how this works. We start with the Script instructions (p2sh Start):

p2sh Start
Figure 15. p2sh Start

OP_0 will push a 0 to the stack, the two signatures and the RedeemScript will be pushed to the stack directly, leading to p2sh Step 1:

p2sh Step 1
Figure 16. p2sh Step 1

OP_HASH160 will hash the RedeemScript, which will make the stack look like p2sh Step 2:

p2sh Step 2
Figure 17. p2sh Step 2

The 20-byte hash will be pushed on the stack (p2sh Step 3):

p2sh Step 3
Figure 18. p2sh Step 3

And finally, OP_EQUAL will compare the top two elements. If the software checking this transaction is pre-BIP0016, we would end up with p2sh End if evaluating with pre-BIP0016 software:

p2sh pre-BIP0016 End
Figure 19. p2sh End if evaluating with pre-BIP0016 software

This would end evaluation for pre-BIP0016 nodes and the result would be valid, assuming the hashes are equal.

On the other hand, BIP0016 nodes, which as of this writing are the vast majority, will parse the RedeemScript as Script instructions (p2sh RedeemScript):

p2sh RedeemScript
Figure 20. p2sh RedeemScript

These go into the Script column as instructions (p2sh Step 4):

p2sh Step 4
Figure 21. p2sh Step 4

OP_2 pushes a 2 to the stack, the pubkeys are also pushed and a final OP_2 pushes another 2 to the stack (p2sh Step 5):

p2sh Step 5
Figure 22. p2sh Step 5

OP_CHECKMULTISIG consumes m+n+3 elements, which is the entire stack, and we end the same way we did Bare Multisig (p2sh End for post-BIP0016 software).

p2sh End
Figure 23. p2sh End for post-BIP0016 software

The RedeemScript substitution is a bit hacky and there’s special-cased code in Bitcoin software to handle this. Why wasn’t something a lot less hacky and more intuitive chosen? BIP0012 was a competing proposal at the time which used OP_EVAL and was considered more elegant. A ScriptPubKey like OP_EVAL would have been an instruction which adds additional instructions based on the top element. would have worked with BIP0012:

`OP_EVAL`
Figure 24. OP_EVAL would have been an instruction which adds additional instructions based on the top element.

OP_EVAL would have consumed the top element of the stack and interpreted that as Script instructions to be put into the Script column.

Unfortunately, this more elegant solution comes with an unwanted side-effect, namely Turing-Completeness. Turing-Completeness is undesirable as it makes the security of a smart contract much harder to guarantee (see [chapter_script]). Thus, the more hacky, but more secure option of special-casing was chosen in BIP0016. BIP0016 or p2sh was implemented in 2011 and continues to be a part of the network today.

Coding p2sh

The special pattern of RedeemScript, OP_HASH160, hash160 and OP_EQUAL needs handling. The evaluate method in script.py is where we handle the special case:

class Script:
...
    def evaluate(self, z):
...
        while len(instructions) > 0:
            instruction = instructions.pop(0)
            if type(instruction) == int:
...
link:code-ch08/script.py[role=include]
  1. 0xa9 is OP_HASH160, 0x87 is OP_EQUAL. We’re checking that the next 3 instructions conform to the BIP0016 special pattern.

  2. We know that this is OP_HASH160, so we just pop it off. Similarly, we know the next instruction is the 20-byte hash value and the third instruction is OP_EQUAL, which is what we tested for in the if statement above it.

  3. We run the OP_HASH160, 20-byte hash push on the stack and OP_EQUAL as normal.

  4. There should be a 1 remaining, which is what op_verify checks for (OP_VERIFY consumes 1 element and does not put anything back).

  5. Because we want to parse the RedeemScript, we need to prepend the length.

  6. We extend the instruction set with the parsed instructions from the RedeemScript.

More complicated scripts

The nice thing about p2sh is that the RedeemScript can be as long as the largest single element from OP_PUSHDATA2, which is 520 bytes. Multisig is just one possibility. You can have Scripts that define more complicated logic like "2 of 3 of these keys or 5 of 7 of these other keys". The main feature of p2sh is that it’s flexible and at the same time reduces the UTXO set size by pushing the burden of storing part of the Script back to the user.

In [chapter_segwit], p2sh is also used to make Segwit backwards compatible.

Addresses

To compute p2sh addresses, we use a process similar to how we compute p2pkh addresses. The hash160 is prepended with a prefix byte and appended with a checksum.

Mainnet p2sh uses the 0x05 byte which causes addresses to start with a 3 in Base58 while testnet p2sh uses the 0xc4 byte to cause addresses to start with a 2. We can calculate the address using the encode_base58_checksum function from helper.py.

link:code-ch08/examples.py[role=include]

p2sh Signature Verification

As with p2pkh, one of the tricky aspects of p2sh is verifying the signatures. The p2sh signature verification is different than the p2pkh process covered in [chapter_tx].

Unlike p2pkh where there’s only 1 signature and 1 public key, we have some number of pubkeys (in SEC format in the RedeemScript) and some equal or smaller number of signatures (in DER format in the ScriptSig). Thankfully, signatures have to be in the same order as the pubkeys or the signatures are not considered valid.

Once we have a particular signature and public key, we only need the signature hash, or z to figure out whether the signature is valid (Validation of p2sh Inputs).

Validation Start
Figure 25. Validation of p2sh Inputs

As with p2pkh, finding the signature hash is the most difficult part of the p2sh signature validation process and we’ll now proceed to cover this in detail.

Step 1: Empty all the ScriptSigs

The first step is to empty all the ScriptSigs when checking the signature (Empty each input’s ScriptSig). The same procedure is used for creating the signature.

Validation Step 1
Figure 26. Empty each input’s ScriptSig

Step 2: Replace the ScriptSig of the p2sh input being signed with the RedeemScript

Each p2sh input has a RedeemScript. We take the RedeemScript and put that in place of the empty ScriptSig (Replace the ScriptSig of the input we’re checking with the RedeemScript). This is different from p2pkh in that it’s not the ScriptPubKey.

Validation Step 2
Figure 27. Replace the ScriptSig of the input we’re checking with the RedeemScript

Step 3: Append the hash type

Lastly, we add a 4-byte hash type to the end (Append the hash type SIGHASH_ALL, or the blue part at the end.). This is the same as in p2pkh.

The integer corresponding to SIGHASH_ALL is 1 and this has to be encoded in Little-Endian over 4 bytes, which makes the transaction look like this:

Validation Step 3
Figure 28. Append the hash type SIGHASH_ALL, or the blue part at the end.

The hash256 of this interpreted as a Big-Endian integer is our z. The code for getting our signature hash, or z, looks like this:

link:code-ch08/examples.py[role=include]

Now that we have our z, we can grab the SEC public key and DER signature from the ScriptSig and RedeemScript (DER and SEC within the p2sh ScriptSig and RedeemScript):

DER and SEC
Figure 29. DER and SEC within the p2sh ScriptSig and RedeemScript
link:code-ch08/examples.py[role=include]
  1. z is from the code above

We’ve verified 1 of the 2 signatures that are required to unlock this p2sh multisig.

Conclusion

We learned how p2sh ScriptPubKeys are created and how they’re redeemed. We’ve covered Transactions for the last 4 chapters, we now turn to how they are grouped in Blocks.