WIP: WIP-0011 Layer: Consensus (hard fork) Title: Improve consistency and availability of superblock voting protocol Authors: Adán SDPC Discussions-To: `#dev-general` channel on Witnet Community's Discord server Status: Final Type: Standards Track Created: 2021-04-06 License: BSD-2-Clause
This proposal follows-up on the countermeasures of WIP-0010 by implementing a minimum size for the superblock voting committee, so as to prevent nodes from irreversibly forking when falsely consolidating minoritarian superblocks during episodes of severe network disruption.
This also proposes a new algorithm for composing randomized committees in case of lack of consensus, so as to maximize the chances for sampling a committee that is in consensus, and reducing as much as possible the time needed for the chain to resume normal operation.
A superblock voting protocol exists in Withet that samples identities from the Active Reputation Set (ARS), and gives those identities the right to vote on what is the current chain tip.
The messages being signed when voting are called superblocks, contain a cryptographic commitment to the tip of the block chain and confirm the addition of blocks to the chain just like blocks in turn do with individual transactions.
Every 10 Witnet epochs (7.5 minutes, a.k.a. superepoch), all nodes in the network compute whether they are eligible for voting. If they are not, they simply collect votes cast by others, and check whether their local chain tip matches the chain tip with the most votes.
In case of a match, they consolidate their chain state and consider as final every block prior to that checkpoint. Conversely, in case of a mismatch, they roll their chain state back to the one they had as of the previous consolidated superblock.
A special lack of consensus case exists for the situation in which no chain tip gathers a supermajority of 2/3rds of the committee, either because the number of votes that were collected is less than 2/3rds of the size of the committee, or because there exist multiple chain tips out of which none received 2/3rds or more of the votes. In those cases, all nodes roll their chain state back to the previous consolidated superblock.
May that lack of consensus situation happen 5 superepochs in a row, the protocol will reduce by 5 the size of the committee of identities to be sampled from the ARS. That is, if the initial committee size was 100, it will become 95.
May the lack of consensus situation continue after the reduction, further committee size reductions in steps of 5 will be applied for every 5 superblocks in a row that lack consensus. This can be repeated as many times as needed, unless the committee size hits a minimum committee size.
Even if the committee size gets reduced, the same 2/3rds supermajority requirement applies. E.g. in the case of a committee of 95 identities, the amount of coincident votes needs to be equal or greater than 64; and for a committee of, say, 70 identities, 47 votes are needed.
Conversely, this mechanism also provides for a way to a grow the committee size back to its original size. Every time that 1 superblock is successfully consolidated, the committee size is increased by 5.
As per the current protocol, the initial and maximum committee size is 100, and the minimum committee size is 1.
Selection of the identities to be part of a committee is performed deterministically over a list of all the identities in the ARS, conveniently ordered by decreasing reputation score.
The aim is for each subsequent superblock to be voted by a different committee that ideally has the lesser overlap as possible with the one before, and picks identities across all the different reputation percentiles.
For any given superepoch index (the integer division by ten of the epoch of the first block in the superblock), a committee is sampled from the ARS according to this business logic:
- Take the first byte of the previous superblock SHA256 hash,
sbh[0]
. - Take the first byte of the SHA256 hash of the superepoch index,
sei[0]
. - Add those bytes together and wrap over the ARS size to compute the position of the first selected identity,
first_position = (sbh[0] + sei[0]) % ars_size
. - Calculate how many evenly spaced committees can fit in the ARS,
gap = ars_size \ committee_size
(\
standing for integer division). - Pick every identity whose position in the ARS matches the expression
(identity_position - first_position) % gap == 0
. In other words, pick the identities that occupy positions(first_position + n * gap) % ars_size
forn = [0, committee_size)
.
Recent network disruption episodes as the one described in WIP-0010 proved that having a minimum committee size of 1 makes it too easy for the network to irreversibly fork into multiple, incompatible chains that are impossible to reconcile without significant community coordination or resorting to some recovery mechanism that trades off decentralization for the sake of availability.
Incidentally, as the current minimum committe size (1) is not a multiple of the committee reducion and increasal step size (5), may the committee size ever hit the minimum and then grow back to higher sizes, the way up will follow a different progression than when descending (1, 6, 11, 16, ... vs. ..., 15, 10, 5, 1), because it will only descend by 4 when going from 5 to 1, thus causing the offset. The same happens when the committee size first comes near to 100 after ever hitting 1, as the last step will go from 96 to 100.
While it is true that the current committee selection algorithm outputs different committees for each different superepoch, and ensures that the committee varies also in case of lack of consensus, there are multiple concerns about its fairness and randomness.
Issue 1467 in witnet-rust tracks a known design flaw of this mechanism whereby the probability for any two given positions in the ARS to be selected in a committee are not necessarily equal because the addition of two random bytes being performed does not output a uniform but normal distribution. This effect is aggravated by modulo bias when wrapping the identities selection over the ARS size.
Additionaly, the way in which the modulo operator is currently applied onto the addition of sbh[0]
and sbi[0]
can also have a negative effect on the likelihood of coming up with a valid signing committee. In the context of a long lack of consensus episode in which many superepochs in a row achieve no consensus, sbh[0]
will remain unchanged, and subsequent values of sbi[0]
will simply cause the selected positions from the ARS to be shifted by 1, but never shuffled. After gap
superepochs without consensus, the committees will start to repeat, and with all likelihood, they will be unable to achieve consensus just as they were at first try.
- A minimum superblock voting committee size of 50 identities is proposed.
- The maximum superblock voting committee size remains at 100 identities.
- The reduction and increasal step size remains at 5 identities.
This minimum should vastly reduce the chances for multiple superblocks to be consolidated at once in case of network partition, while still giving the protocol adaptability to situations in which a significant part of the ARS members may accidentaly or purposefully fail to vote.
These are the expected probabilities and average times to consensus in the scenario in which a certain percentage of the ARS members fails to vote, before and after reducing the committee from 100 to 50:
Failure rate | Prob. @ 100 | Avg. time @ 100 | Prob. @ 50 | Avg. time @ 50 |
---|---|---|---|---|
<20% | ~1 | 15m | ~1 | 15m |
25% | 0.97241 | 15m 24 | 0.90169 | 16m 38s |
30% | 0.77926 | 19m 14s | 0.68387 | 21m 56s |
35% | 0.38029 | 39m 26s | 0.38886 | 38m 34s |
40% | 0.09125 | 2h 24m | 0.15609 | 1h 36m |
45% | 0.00976 | 25h 38m | 0.04265 | 5h 52m |
50% | 0.00044 | 23d 20h | 0.00767 | 32h 34m |
55% | 0.00001 | 3y 281d | 0.00087 | 12d |
As seen above, this change makes a huge difference when it comes to average time to consensus under severe disruption situations in which more than 1/3rd of the ARS members stopped voting at once. E.g. in the fatal event of 50% of the ARS members failing to vote, the time that it takes for the chain to be resumed is reduced from nearly 24 days down to 32 hours.
As the proposed superblock voting committee size (50) is a multiple of the reduction and increasal step size (5) the unintended and inconvenient asymmetry explained above simply disappears.
As a result, the possible committee sizes will exclusively be 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, and 50.
A new algorithm is proposed for selecting the identities that will conform each superblock voting committee.
The aim of this improvement is to get rid of the poor randomness and fairness of the committees as explained above and to ensure that in the event of a long lack of consensus episode, new randomly sampled committees can be tried all the time, with the only limit of how many different samples of a certain size you can form out of the existing ARS.
The proposed algorithm goes as follows:
- Take the previous superblock SHA256 hash as an array of 32 bytes,
sbh
. - Serialize the superepoch index in big endian (MSB first) into an array of 32 bytes,
sei
. - Compute 32 pseudorandom bytes as
prb = SHA256(sbh | sei)
, (|
standing for concatenation). - Wrap the 256-bit big endian unsigned integer value of the pseudorandom bytes over the ARS size to find the position of the first selected identity,
first_position = prb % ars
. - Calculate how many evenly spaced committees can fit in the ARS,
gap = ars_size \ committee_size
(\
standing for integer division). - Pick the identities that occupy positions
(first_position + n * gap + (prb[n % 32] % gap)) % ars_size
forn = [0, committee_size)
.
- Implementer: a client that implements or adopts the protocol improvements proposed in this document.
- Non-implementer: a client that fails to or refuses to implement or adopt the protocol improvements proposed in this document.
Upon entry into force of the proposed improvements:
- Blocks and transactions that were considered valid by former versions of the protocol MUST continue to be considered valid.
- Blocks and transactions produced by non-implementers MUST be validated by implementers using the same rules as with those coming from implementers.
- Implementers MUST apply the proposed logic when evaluating superblock consensus.
As a result:
- Non-implementers MUST get their valid blocks accepted and included into superblocks by implementers.
- Implementers MUST get their valid blocks accepted by non-implementers, but they MAY NOT included into superblocks by non-implementers.
- Non-implementers MAY consolidate different superblocks and blocks than implementers.
Due to this last point, this MUST be considered a consensus-critical protocol improvement. An adoption plan MUST be proposed, setting an activation date that gives miners enough time in advance to upgrade their existing clients.
The protocol improvements proposed herein will affect any library or client implementing the superblock voting protocol.
Libraries, clients, interfaces, APIs, or any other software or hardware that do not implement or validate the superblock voting protocol will remain unaffected.
A reference implementation for the proposed protocol improvements can be found as an integration branch in the witnet-rust repository.
An activation date is proposed for April 28 2021 at 9am UTC, in expectation that the protocol improvements described herein jointly enter into force with the related WIP-0009 and WIP-0012.
This proposal has been cooperatively discussed and devised by many individuals from the Witnet development community.
Special thanks to:
- Luis Rubio and Tomasz Polaczyk for the reference implementation in Rust.
- Tomasz Polaczyk for running the numbers on Jupiter Notebook.