-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configurable cluster blacklist TTL #738
Configurable cluster blacklist TTL #738
Conversation
Also you would need to signoff https://github.com/valkey-io/valkey/runs/26959011274 |
@zuiderkwast We discussed this briefly during the summit. PTAL. |
c3d8a9e
to
a4ce40b
Compare
Hi @BCathcart! Have you experienced this problem after we added blacklist propagation in the cluster bus? |
Hey @zuiderkwast. Not exactly. We piggy-backed some additional functionality internally on top of the blacklist, which was the real motivation for making it configurable. It sounds like it would be a nice-to-have in OSS as well but maybe the blacklist propagation addresses this enough already. |
3a68fe9
to
0a613f0
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #738 +/- ##
============================================
- Coverage 70.23% 70.20% -0.04%
============================================
Files 112 112
Lines 60602 60602
============================================
- Hits 42566 42544 -22
- Misses 18036 18058 +22
|
Interesting. What was that? If it's not a problem anyone is experiencing, I don't think we need to solve it. :) In general, I think we should avoid adding configs. If this is actually a problem in a large cluster, could we let the value depend on parameters such as the size of the cluster and the node timeout instead? It's possible to estimate how long it takes for the blacklist to be propagated to all nodes in a cluster. |
It was a quick fix to help address a cluster message formatting mismatch between versions. It's very irrelevant to OSS so I won't get into the details. I'm not feeling a strong need to make this configurable either now that I know about the blacklist propagation. Scaling the ttl based on cluster size and existing parameters seems reasonable to me, but I have no evidence the current 60 seconds is ever not enough time for the blacklist propagation in practice, so I don't think we need to jump into anything for now. |
@zuiderkwast In case of partition, the propagation might not arrive within a hard coded time period. I think making the parameter flexible for operator makes life easier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, we had the same configuration item in ours fork, so i think it is ok in my head.
In some cases, the node may not forget the other node in time (like partition or admin tools bug), and we don't it to add it back (at a configurable time).
I noticed in #738 that we don't properly check ULong config boundaries and made the change there. I'm pulling out that particular commit into this PR since we don't know if we want to merge the configurable cluster blacklist TTL yet. --------- Signed-off-by: Brennan Cathcart <[email protected]> Signed-off-by: Madelyn Olson <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
Oh, we've had this problem long before we were doing the piggy-backing :). I think it's especially bad in large clusters. Our internal control plane folks have asked for this feature for awhile.
This would be more ideal, but I'm not sure we'll ever be able to get it right. I suppose my opinion about configs is a bit more nuanced, which is that I am fine with lots of configs that most people will never need to touch, but are nice to have in case they need to touch them. I would also prefer to limit the number of configs on the datapath, but I really don't mind administration configs as much. |
I'm fine with adding the config. I just wanted to know that it's useful before we add it. Just to be clear: If a user changes the config to one hour, does CLUSTER FORGET and afterwards changes the config back to one minute, then the timestamp in the blacklist will be one hour. The user can't add the node again during this time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just added some wording updates.
@valkey-io/core-team Not for valkey 8.0, but please vote if we want to add this config to future versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @BCathcart 👍
@madolson we won't even add config changes to 8.0 further? |
2be463b
to
84d4b95
Compare
The commits from my quick edits on github weren't automatically signed off so I had to do a little rebase. Hopefully it's good to go now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code LGTM.
Signed-off-by: Brennan Cathcart <[email protected]>
Signed-off-by: Brennan Cathcart <[email protected]>
Signed-off-by: Brennan Cathcart <[email protected]>
Signed-off-by: Brennan Cathcart <[email protected]>
Update valkey.conf description Signed-off-by: Brennan Cathcart <[email protected]>
84d4b95
to
2d1bbda
Compare
It seems ready to merge. I can see 5 core team members approving or upvoting. If we merge it before we cut the release branch, I see no reason why it can't be included in 8.0. |
Yeah, I guess it's before the cutoff so we can add it. |
I noticed in valkey-io#738 that we don't properly check ULong config boundaries and made the change there. I'm pulling out that particular commit into this PR since we don't know if we want to merge the configurable cluster blacklist TTL yet. --------- Signed-off-by: Brennan Cathcart <[email protected]> Signed-off-by: Madelyn Olson <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
Allows cluster admins to configure the blacklist TTL as needed to allow sufficient time for `CLUSTER FORGET` to be executed on every node in the cluster. Config name `cluster-blacklist-ttl`; unit seconds; deault 60. --------- Signed-off-by: Brennan Cathcart <[email protected]>
Allows cluster admins to configure the blacklist TTL as needed to allow sufficient time for
CLUSTER FORGET
to be executed on every node in the cluster.Config name
cluster-blacklist-ttl
; unit seconds; deault 60.