6.824 2012 Lecture 17: Security: Byzantine Fault Tolerance
reminder: start thinking about projects, groups of 2 or 3
we've considered many fault-tolerance protocols
have always assumed "fail-stop" failures -- like power failure
i.e. servers follow the protocol
hard enough: crash vs network down; network partition
can one handle a larger class of failures?
buggy servers, that compute incorrectly rather than stopping?
servers that *don't* follow the protocol?
servers that have been modified by an attacker?
often called "Byzantine" faults
the paper's approach:
replicated state machine
assumes 2f+1 of 3f+1 are non-faulty
use voting to select the right results
not as easy as it might sound
let's assume the worst case:
a single attacker controls the f faulty replicas
and is actively trying to break the system
if we can handle this, we can handle bugs in f replicas too
what are the attacker's powers?
supplies the code that faulty replicas run
knows the code the non-faulty replicas are running
knows the faulty replicas' crypto keys
can read network messages
can temporarily force messages to be delayed via DoS
what faults *can't* happen?
no more than f out of 3f+1 replicas can be faulty
no client failure -- clients never do anything bad
no guessing of crypto keys or breaking of cryptography
example use scenario:
RM:
echo A > grade
echo B > grade
tell YM "the grade file is ready"
YM:
cat grade
a faulty system could:
totally make up the file contents
execute write("A") but ignore write("B")
show "B" to RM and "A" to YM
execute write("B") only only some of the replicas
let's try to design our own byzantine-fault-tolerant RSM
start simple (and broken), work towards paper's design
design 1:
[client, n servers]
n servers
client sends request to all of them
waits for all n to reply
only proceeds if all n agree
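a minimal Go sketch of design 1's client (the names and the
function-call stand-in for an RPC are ours, not from any real system):

    // doOp sends op to every server and accepts the result only if
    // all n replies agree -- design 1's rule. each entry in servers
    // stands in for an RPC to one replica.
    func doOp(servers []func(op string) string, op string) (string, bool) {
        first := servers[0](op)
        for _, call := range servers[1:] {
            if call(op) != first {
                return "", false // any disagreement blocks progress
            }
        }
        return first, true
    }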
what's wrong with design 1?
one faulty replica can stop progress by disagreeing
design 2:
let's have replicas vote
2f+1 servers, assume no more than f are faulty
client waits for f+1 matching replies
if only f are faulty, and network works eventually, must get them!
what's wrong with design 2's 2f+1?
f+1 matching replies might be f bad nodes and just 1 good
so maybe only one good node got the operation!
*next* operation also waits for f+1
might *not* include that one good node that saw op1
example:
S1 S2 S3 (S1 is bad)
everyone hears and replies to write("A")
S1 and S2 reply to write("B"), but S3 misses it
client can't wait for S3 since it may be the one faulty server
S1 and S3 reply to read(), but S2 misses it
so read() yields "A"
result: client tricked into accepting a reply based on out-of-date state
e.g. TA reads A instead of B from grades file
design 3:
3f+1 servers, of which at most f are faulty
client waits for 2f+1 matching replies
== f bad nodes plus a majority of the good nodes
so all sets of 2f+1 overlap in at least one good node
example:
S1 S2 S3 S4 (S1 is bad)
everyone hears write("A")
S1, S2, S3 process write("B"), S4 misses it
now the read()
client will wait for 2f+1=3 matching replies
S1 and S4 will reply "A"
S2 and S3 will reply "B"
client doesn't know what to believe (neither is 2f+1)
but it is guaranteed to see there's a problem
so client can *detect* that some good nodes missed an operation
we'll see how to repair in a bit
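the quorum arithmetic behind designs 2 and 3 can be checked
mechanically; a small Go helper of our own (not from the paper):

    // worstGoodOverlap: the minimum number of *good* servers shared
    // by any two quorums of size q, out of n servers of which f are
    // faulty. two quorums overlap in at least 2q-n servers; in the
    // worst case f of those shared servers are the faulty ones.
    func worstGoodOverlap(n, q, f int) int {
        good := (2*q - n) - f
        if good < 0 {
            good = 0
        }
        return good
    }

    // design 2: worstGoodOverlap(2f+1, f+1, f)  == 0  -- broken
    // design 3: worstGoodOverlap(3f+1, 2f+1, f) == 1  -- safe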
what about handling multiple clients?
non-faulty replicas must process operations in the same order!
let's have a primary to pick order for concurrent client requests
but we have to worry about a faulty primary
what can a faulty primary do?
1. send wrong result to client
2. send different ops to different replicas
3. ignore a client op
general approach to handling faulty primary
1. replicas send results direct to client
2. replicas exchange info about ops sent by primary
3. clients send each operation to all replicas, not just the primary
each replica watches progress of each operation
if no progress, force change of primary
can a replica execute an operation when it first receives it from primary?
no: maybe primary gave different ops to different replicas
if we execute before we're sure, we've wrecked the replica's state
need 2nd round of messages to make sure all good replicas got the same op
design 4:
3f+1 servers, one is primary, f faulty, primary might be faulty
client sends request to primary AND to each replica
primary chooses next op and op #
primary sends PRE-PREPARE(op, n) to replicas
each replica sends PREPARE(op, n) to all replicas
if replica gets matching PREPARE(op, n) from 2f+1 replicas (incl itself)
and n is the next operation #
execute the operation, possibly modifying state
send reply to client
else:
keep waiting
client is happy when it gets f+1 matching replies
message flow for one op (C = client, 0 = primary, 1-3 = replicas):
  REQ:     C sends the op to the primary and all replicas
  PRE-P:   primary sends PRE-PREPARE(op, n) to replicas 1-3
  PREPARE: each replica sends PREPARE(op, n) to every replica
  REPLY:   each replica sends its reply directly to C
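a sketch of the replica's execute condition in Go (the Prepare
struct and the one-vote-per-replica assumption are ours; signatures
and retransmission elided):

    // Prepare is one replica's vote that the primary assigned
    // the op with digest OpDigest to sequence number Seq.
    type Prepare struct {
        Seq      int
        OpDigest string
        Replica  int
    }

    // preparedAt reports whether we hold 2f+1 matching PREPAREs for
    // (seq, digest) -- design 4's condition for executing the op,
    // assuming msgs holds at most one PREPARE per replica.
    func preparedAt(msgs []Prepare, seq int, digest string, f int) bool {
        count := 0
        for _, m := range msgs {
            if m.Seq == seq && m.OpDigest == digest {
                count++
            }
        }
        return count >= 2*f+1
    }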
remember our strategy:
primary follows protocol => progress
no progress => replicas detect and force change of primary
if the primary is non-faulty, can faulty replicas prevent correct progress?
they can't forge primary msgs
they can delay msgs, but not forever
they can do nothing: but they aren't needed for 2f+1 matching PREPAREs
they can send correct PREPAREs
and DoS f good replicas to prevent them from hearing ops
but those replicas will eventually hear the ops from the primary
worst outcome: delays
if the primary is faulty, will replicas detect any problem?
or can primary cause undetectable problem?
primary can't forge client ops -- signed
it can't ignore client ops -- client sends to all replicas
it can try to send in different order to different replicas,
or try to trick replicas into thinking an op has been
processed even though it hasn't
will replicas detect such an attack?
results of the primary sending diff ops to diff replicas?
case 1: all good nodes get 2f+1 matching PREPAREs
did they all get the same op?
yes: everyone who got 2f+1 matching PREPAREs must have gotten same op
since any two sets of 2f+1 share at least one good server
result: all good nodes will execute the op, client happy
case 2: >= f+1 good nodes get 2f+1 matching PREPAREs
again, no disagreement possible
result: f+1 good nodes will execute op, client happy
BUT up to f good nodes don't execute
can they be used to effectively roll back the op?
i.e. send the write("B") to f+1, send read() to remaining f
no: won't be able to find 2f+1 replicas with old state
so not enough PREPAREs
case 3: < f+1 good nodes get 2f+1 matching PREPAREs
result: client never gets a reply
result: system will stop, since >= f+1 good replicas are stuck waiting for this op
how to resume operation after faulty primary?
need a view change to choose new primary
(this view change only chooses primary; no notion of set of live servers)
when does a replica ask for a view change?
if it sees a client op but doesn't see 2f+1 matching PREPAREs
after some timeout period
is it OK to trigger a view change if just one replica asks?
no: faulty replicas might cause constant view changes
let's defer the question of how many replicas must ask for
a view change
who is the next primary?
need to make sure faulty replicas can't always make themselves next primary
view number v
primary is v mod n
so primary rotates among servers
at most f faulty primaries in a row
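a sketch of the rotation and the timeout trigger in Go (the names
and channel plumbing are ours):

    import "time"

    // primaryFor: the primary for view v among n replicas; rotating
    // by view number means at most f consecutive faulty primaries.
    func primaryFor(v, n int) int { return v % n }

    // watchOp arms a timer when a replica first sees a client op;
    // if the op doesn't gather 2f+1 matching PREPAREs in time, the
    // replica asks to move past the (possibly faulty) primary.
    func watchOp(prepared <-chan struct{}, timeout time.Duration, v int, askViewChange func(newView int)) {
        select {
        case <-prepared:
            // op made progress in this view; keep the primary
        case <-time.After(timeout):
            askViewChange(v + 1)
        }
    }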
view change design 1 (not correct)
replicas send VIEW-CHANGE requests to *new* primary
new primary waits for enough view-change requests
new primary announces view change w/ NEW-VIEW
includes the VIEW-CHANGE requests
as proof that enough replicas wanted to change views
new primary starts numbering operations at last n it saw + 1
will all non-faulty replicas agree about operation numbering across view change?
problem:
I saw 2f+1 PREPAREs for operation n, so I executed it
new primary did not, so it did not execute it
thus new primary may start numbering at n, yielding two different op #n
can new primary ask all replicas for set of operations they have executed?
doesn't work: new primary can only wait for 2f+1 replies
faulty replicas may reply, so new primary may not wait for me
solution:
don't execute operation until sure a new primary will hear about it
add a third phase: PRE-PREPARE, PREPARE, then COMMIT
only execute after commit
operation protocol:
client sends op to primary
primary sends PRE-PREPARE(op, n) to all
all send PREPARE(op, n) to all
after replica receives 2f+1 matching PREPARE(op, n)
send COMMIT(op, n) to all
after receiving 2f+1 matching COMMIT(op, n)
execute op
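a bare Go sketch of the per-op phases on one replica (our state
names; view changes, signatures, and the log elided):

    type Phase int

    const (
        PrePrepared Phase = iota // saw PRE-PREPARE(op, n) from the primary
        Prepared                 // saw 2f+1 matching PREPAREs: order fixed
        Committed                // saw 2f+1 matching COMMITs: safe to execute
    )

    // onCommit executes op n only after 2f+1 matching COMMITs: at
    // that point f+1 good replicas are known to have prepared the
    // op, so a new primary must hear about it at a view change.
    func onCommit(phase *Phase, commits, f int, execute func()) {
        if *phase == Prepared && commits >= 2*f+1 {
            *phase = Committed
            execute()
        }
    }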
view change:
each replica sends new primary 2f+1 PREPAREs for recent ops
new primary waits for 2f+1 VIEW-CHANGE requests
new primary sends NEW-VIEW msg to all replicas with
complete set of VIEW-CHANGE msgs
list of every op for which some VIEW-CHANGE contained 2f+1 PREPAREs
i.e. list of final ops from last view
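a sketch of what the new primary assembles, in Go (the structs are
our rendering of the VIEW-CHANGE/NEW-VIEW contents):

    // ViewChange is one replica's report: for each recent op it
    // prepared, the sequence number and op digest it collected
    // 2f+1 PREPAREs for (the PREPAREs themselves elided here).
    type ViewChange struct {
        Replica  int
        Prepared map[int]string // seq -> op digest
    }

    // newView lists every op that some VIEW-CHANGE proved prepared;
    // these ops keep their old numbers in the new view, so no
    // sequence number can be reused for a different op.
    func newView(vcs []ViewChange) map[int]string {
        carried := map[int]string{}
        for _, vc := range vcs {
            for seq, digest := range vc.Prepared {
                carried[seq] = digest
            }
        }
        return carried
    }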
if a replica executes an op, will the new primary know of that op?
replica only executed after receiving 2f+1 COMMITS
maybe f of those were lies, from faulty replicas, who won't tell new primary
but f+1 COMMITs were from replicas that got 2f+1 matching PREPAREs
new primary waits for VIEW-CHANGE requests from 2f+1 replicas
ignoring the f faulty nodes, there are 2f+1 good replicas
f+1 good replicas sent COMMITs, and at least f+1 good replicas sent VIEW-CHANGE
(f+1) + (f+1) > 2f+1, so some good replica is in both sets
its VIEW-CHANGE carries 2f+1 PREPAREs for the op, so the new primary learns of it
can the new primary omit some of the reported recent operations?
no, NEW-VIEW must include signed VIEW-CHANGE messages
paper also discusses
checkpoints and logs to help good nodes recover
various cryptographic optimizations
optimizations to reduce # of msgs in common case
fast read-only operations
what are the consequences of more than f corrupt servers?
can the system recover?
what if the client is corrupt?
suppose an attacker can corrupt one of the servers
exploits a bug, or steals a password, or has physical access, &c
why can't the attacker corrupt them all?