-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit tracking custom errors (e.g. from LUA) while allowing non custom errors to be tracked normally #500
Conversation
(Force pushed because this was recommended by the bot here since I did not include the commit sign off originally. Also because it is not reviewed yet) |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #500 +/- ##
============================================
+ Coverage 70.20% 70.25% +0.04%
============================================
Files 111 112 +1
Lines 60242 60587 +345
============================================
+ Hits 42295 42567 +272
- Misses 17947 18020 +73
|
@KarthikSubbarao Overall LGTM. I do think we need to add more tests so that future changes will not introduce degradation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. My comments are minor.
@ranshid - When functions are used and contain LUA code with server.error_reply with custom error messages, they will still be caught by this new logic when we are past the 128 limit. Did you mean add test cases where functions (with lua) are tracked under Example:
|
@KarthikSubbarao I think this is somewhat problematic. Functions are more like modules IMO and I think we should allow function errors to overflow. I think maybe we can flag the client (or check if we are in the context of eval/evalsha) in order to enforce the overflow? |
Functions are still using LUA and are using the same APIs such as To handle this, when a server exceeds the limit, when functions are used with additional custom errors, they are tracked under IMO, this behavior makes sense - but I am also curious to hear from others |
…es while allowing non LUA errors to function as usual Signed-off-by: Karthik Subbarao <[email protected]>
Signed-off-by: Karthik Subbarao <[email protected]>
Signed-off-by: Karthik Subbarao <[email protected]>
Signed-off-by: Karthik Subbarao <[email protected]>
Sorry for the force push. I did not include the sign off on the previous commit and the DCO check required this |
Signed-off-by: KarthikSubbarao <[email protected]>
Signed-off-by: KarthikSubbarao <[email protected]>
Signed-off-by: KarthikSubbarao <[email protected]>
@valkey-io/core-team Since @PingXie mentioned in the other thread about this, I thought it would be good to try to get consensus on this change. The tl;dr is that in Redis it was possible to spam the error_stat log with custom errors generated from lua scripts, because you can return arbitrary errors from lua scripts. @enjoy-binbin implemented a work around that would clear the errostat radix tree if it's size got above a certain threshold. In order for a user to resume seeing errors, they would need to to call Ping suggested:
I think this is important to get right for 8, so let's settle this here if possible. |
I got convinced by you on |
Signed-off-by: KarthikSubbarao <[email protected]>
From our meeting this morning, we are directionally approved to update the behavior here but we'll decide locally in the PR for whatever makes sense. |
Signed-off-by: KarthikSubbarao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. A few minor suggestions/questions.
Sorry for the late consensus binding, binbin mentioned in the other thread that he was good with this approach and I just had a small comment ontop of Viktors. Once they are addressed this should be good to merge. |
This reverts commit 05e946f. Signed-off-by: KarthikSubbarao <[email protected]>
… + update overflow error prefix name Signed-off-by: KarthikSubbarao <[email protected]>
Signed-off-by: KarthikSubbarao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one nit, then I'm happy.
Signed-off-by: KarthikSubbarao <[email protected]>
This run might have failed on a flaky test: https://github.com/valkey-io/valkey/actions/runs/9868728468/job/27251202927?pr=500 I re-ran it locally multiple times and the tests all passed |
Signed-off-by: KarthikSubbarao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks.
Signed-off-by: Madelyn Olson <[email protected]>
Implementing the change proposed here: #487
In this PR, we prevent tracking new custom error messages (e.g. LUA) if the number of error messages (in the errors RAX) is greater than 128. Instead, we will track any additional custom error prefix in a new counter:
errorstat_ERRORSTATS_OVERFLOW
and if any non-custom flagged errors (e.g. MOVED / CLUSTERDOWN) occur, they will continue to be tracked as usual.This will address the issue of spammed error messages / memory usage of the errors RAX. Additionally, we will not have to execute
CONFIG RESETSTAT
to restore error stats functionality because normal error messages continue to be tracked.Example: