-
Notifications
You must be signed in to change notification settings - Fork 997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raft pre-vote extension implementation #530
Changes from 79 commits
987cfd2
f1ed619
c57bed8
3082935
2c44c94
baf623c
22b88e4
0882d3b
14dfa68
a617c6d
60f3232
16eebcd
3e9efaf
8b56ef3
2e8a26a
01b88ad
a7aeffa
8ade0de
164e0c9
6b0d70d
89da642
7cde48c
9b03730
0c3c507
b3dec1b
02e042f
a359f5c
e0c539d
cfa7e87
2dad2f3
2b09f8c
785b127
63a9333
0b1811d
1b18a66
ea82333
8786e2a
16a8c76
ae369f3
4c678d8
f83d298
fafbf06
80e46f0
bb62335
9ed608a
4dcdf1d
475d620
2bad938
b3970e0
0a2e3b5
e926022
74e7f53
afca282
34b71eb
2988d07
e9fcf37
07bd0e2
95e17a0
104ceaa
faa005e
d15db63
e7e4476
6065328
257ba38
12a2bf1
3c6af46
989ca74
eba1435
348e418
f4e3442
6744aa7
89a5e17
b27f8dd
1d84a92
e8a78ba
62bcfe7
97f6bd4
d4cb80f
4e3bb65
9ce972d
36690e5
43974d2
7543a3d
de78bf8
921be2d
4f6fc13
1912201
27a2e7d
ddcbc50
083b811
d962c24
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -92,6 +92,7 @@ type RequestVoteRequest struct { | |||||||
// transfer. It is required for leadership transfer to work, because servers | ||||||||
// wouldn't vote otherwise if they are aware of an existing leader. | ||||||||
LeadershipTransfer bool | ||||||||
PreVote bool | ||||||||
} | ||||||||
|
||||||||
// GetRPCHeader - See WithRPCHeader. | ||||||||
|
@@ -113,13 +114,63 @@ type RequestVoteResponse struct { | |||||||
|
||||||||
// Is the vote granted. | ||||||||
Granted bool | ||||||||
|
||||||||
// Is it a preVote response | ||||||||
PreVote bool | ||||||||
} | ||||||||
|
||||||||
// GetRPCHeader - See WithRPCHeader. | ||||||||
func (r *RequestVoteResponse) GetRPCHeader() RPCHeader { | ||||||||
return r.RPCHeader | ||||||||
} | ||||||||
|
||||||||
// RequestPreVoteRequest is the command used by a candidate to ask a Raft peer | ||||||||
// for a vote in an election. | ||||||||
type RequestPreVoteRequest struct { | ||||||||
RPCHeader | ||||||||
|
||||||||
// Provide the term and our id | ||||||||
Term uint64 | ||||||||
|
||||||||
// Deprecated: use RPCHeader.Addr instead | ||||||||
Candidate []byte | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Could we remove this field. Seeing as its been deprecated for the regular RequestVoteRequest it seems reasonable for pre-vote to just never support this way of specifying the candidate server. Also in processing pre vote requests we never use this field. |
||||||||
|
||||||||
// Used to ensure safety | ||||||||
LastLogIndex uint64 | ||||||||
LastLogTerm uint64 | ||||||||
|
||||||||
// Used to indicate to peers if this vote was triggered by a leadership | ||||||||
// transfer. It is required for leadership transfer to work, because servers | ||||||||
// wouldn't vote otherwise if they are aware of an existing leader. | ||||||||
LeadershipTransfer bool | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do leadership transfers utilize pre-vote? I think they should probably avoid pre-voting. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes it does. I traced leadership transfer and it work as follow:
From this point normal leadership election will happen (including pre-vote) the only exception is that we send the LeadershipTransfer as part of the vote (and pre-vote) to avoid not granting the vote/pre-vote when the voting node think it still have a leader. I'm mitigated about if we should keep it as is or completely remove pre-vote from the equation when performing leadership transfer I think it won't change anything about the behaviour in leadership transfer as in all cases we will end up in leader-election. I would be interested in hearing your thoughts and @banks about this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🤔 My initial thought here is that we should probably skip pre-vote for a leadership transfer to keep it easier to reason about:
I've not looked to see how hard that is in the code though. But I'd be tempted rather than adding this flag here to instead just avoid sending prevote round at all when responding to a TimeoutNow. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is not necessarily true as we allow providing the server we would like to transfer leadership to and it's a best effort transfer, so if the server we specify is not able to become a leader the transfer won't happen. That said, I agree on the fact that leadership transfer is probably not something that we will benefit from optimizing the election for and removing it from the pre-vote flow remove some of the complexity. |
||||||||
} | ||||||||
|
||||||||
// GetRPCHeader - See WithRPCHeader. | ||||||||
func (r *RequestPreVoteRequest) GetRPCHeader() RPCHeader { | ||||||||
return r.RPCHeader | ||||||||
} | ||||||||
|
||||||||
// RequestPreVoteResponse is the response returned from a RequestPreVoteRequest. | ||||||||
type RequestPreVoteResponse struct { | ||||||||
RPCHeader | ||||||||
|
||||||||
// Newer term if leader is out of date. | ||||||||
Term uint64 | ||||||||
|
||||||||
// Peers is deprecated, but required by servers that only understand | ||||||||
// protocol version 0. This is not populated in protocol version 2 | ||||||||
// and later. | ||||||||
Peers []byte | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd maybe lean towards taking this out. I can understand that it might be simpler initially to have identical code in Vote and PreVote so I might be persuaded, but it seems odd that we'd add in legacy stuff that will never be used and will need to be cleaned up again in the future. |
||||||||
|
||||||||
// Is the vote granted. | ||||||||
Granted bool | ||||||||
} | ||||||||
|
||||||||
// GetRPCHeader - See WithRPCHeader. | ||||||||
func (r *RequestPreVoteResponse) GetRPCHeader() RPCHeader { | ||||||||
return r.RPCHeader | ||||||||
} | ||||||||
|
||||||||
// InstallSnapshotRequest is the command sent to a Raft peer to bootstrap its | ||||||||
// log (and state machine) from a snapshot on another peer. | ||||||||
type InstallSnapshotRequest struct { | ||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -234,6 +234,9 @@ type Config struct { | |
|
||
// skipStartup allows NewRaft() to bypass all background work goroutines | ||
skipStartup bool | ||
|
||
// PreVote activate the pre-vote feature | ||
PreVote bool | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thinking aloud: I wonder if it would be useful to make this Reloadable too 🤔 . It would be easy enough with an atomic flag I think, just thinking through that might make it possible say for Autopilot or something else to flip it on automatically once we've verified all voters are at the right version? I know this PR should mean that's not necessary but just thinking through why we'd ever disable it if it wasn't for compatibility and would we then want to be able to flip that at runtime? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh and a minor thing, but if we keep this, In my mind it should probably go above the private There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I also wonder if this should be opt-out rather than opt-in? It's a pretty major defect of this implementation that it can be disrupted by network partitions so provided we're confident it will be correct through upgrades (which I think we need to be to merge it at all) then it would be better if everyone gets it by default. We should probably cut a SemVer minor release and note the change in behavior and that you can opt out. Also see my comment on api.go about auto-disabling it if the transport doesn't actually support it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree about making this an opt-out flag, I'm not sure about how useful will it be to be a dynamic flag. Especially considering the impact related to toggling it in the middle of operations or while the cluster is unstable 🤔 |
||
} | ||
|
||
func (conf *Config) getOrCreateLogger() hclog.Logger { | ||
|
@@ -317,7 +320,7 @@ func DefaultConfig() *Config { | |
SnapshotInterval: 120 * time.Second, | ||
SnapshotThreshold: 8192, | ||
LeaderLeaseTimeout: 500 * time.Millisecond, | ||
LogLevel: "DEBUG", | ||
LogLevel: "TRACE", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did you mean to commit this? I'm not sure that changing default log level to TRACE makes sense for most users? |
||
} | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -73,7 +73,8 @@ func (r *RaftEnv) Restart(t *testing.T) { | |
func MakeRaft(tb testing.TB, conf *Config, bootstrap bool) *RaftEnv { | ||
// Set the config | ||
if conf == nil { | ||
conf = inmemConfig(tb) | ||
t := tb.(*testing.T) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would it be better to modify the inmemConfig method to consume a testing.TB instead of a *testing.T? |
||
conf = inmemConfig(t) | ||
} | ||
|
||
dir, err := os.MkdirTemp("", "raft") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should check that
transport
actually supports prevote here and auto-disable it if not.For example if a product like Consul has a custom transport that wraps NetTransport and they don't realise they need to implement the optional interface, then when they upgrade they will end up silently trying to prevote but not actually sending messages and then timing out pointlessly.