-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NEW] Handle spamming of custom LUA error messages in the INFO ERRORSTATS section with continued tracking of non LUA errors stats #487
Comments
does this redis/redis#13141 can somehow solve your problem? |
I really don't understand why Oran marked that as not a breaking change, it's possible someone had more than 128 errors. I feel like he was just trying to merge stuff before the license change. I think this should solve the issue @KarthikSubbarao brought up though. |
yes, that is indeed a potential breaking change. I have thought about add a new configuration item and defaulting to infinity to avoid the breaking change, but Oran does't seem to mention it and probable doesn't want to add new a configuration item for this.
Looking at it now, it may indeed be. |
@enjoy-binbin - (Just read the top comment) Yes, this will solve the issue of spamming For completeness - with redis/redis#13141, Whereas, the idea proposed in this issue was to have a config to disable custom LUA error messages while continuing to allow regular Error stats to function normally. But the main issue we wanted to address is misuse (spamming), so this should work. Thank you
If we think this is useful, we can consider making this improvement - by adding onto the existing logic |
Resetting all the error count will also lose visibility into other errors like MOVED etc that can help with debugging issues in the cluster. Is it possible to record all custom errors from lua in a separate RAX (based on |
Yes, looks like with the current implementation (from redis/redis#13141):
These issues aside, the PR addresses the concerns we have perfectly.
The proposal in this issue is to prevent tracking new error messages from LUA if the number of LUA error messages is greater than 128. Instead, we will track any additional LUA error type in a new counter: This will address the issue of spammed error messages / memory usage of the |
I like the idea of tracking lua errors in a separate RAX, however it can't just be based off of |
@KarthikSubbarao Thank you for driving this! I am reading through the PR and will provide technical comments there (if I have any) but wanted to ask here: |
@ranshid Thank you. Modules are similar in that they can result in custom error messages - but I would place Modules closer to the Valkey server in that it is the responsibility of those running the server to ensure that the modules they load do not lead to spamming the But (if required) we can indeed add support for preventing misuse by Modules by passing a flag in the |
I guess that depends on the definition of "lua errors". If they are interpreted as "custom errors" then agreed but do we see value in interpreting them as the origin of the errors? Meaning any errors, both custom and normal, generated by any scripts go to its own RAX. If we go down this path, we would have to create a dedicated errorstat section just for scripts ( |
I don't like this approach. I don't think we should have a special section just for scripts, since we try to treat the scripts as more or less normal clients. It also makes this definitely a break changing for users that are using this feature, as opposed to just a breaking change for users that are misusing the feature. |
Good point.
Looks like https://github.com/valkey-io/valkey/blob/unstable/src/script_lua.c#L620. We could pass a flag to it to differentiate custom lua errors? |
Yes. They could be emulating a normal error though. |
I wouldn't consider these "emulated" errors "normal" anymore ... I think it is fine to lump them all into "custom" errors, as far as storage is concerned so we don't lose other more important errors. |
Yes, I didn't realize that config resetstat is an admin command and may not be easy to call. I think the difficulty here is how many error codes we can allow the script to customize. The scenes I have seen so far are all misuse, maybe there is a case that user do need to use a lot error code. One way is that we set ERROR_STATS_NUMBER as a configuration item? btw, what PR #500 does is that when it encounters a miuse, it ensures that other normal errors can be counted instead of being disabled. I guess i am ok with that, if there is no miuse, everyone will be happy, if there is a miuse, at least you will want to check the errorstats. People still need to call CONFIG RESETSTATS to clear the miuse errorstats. |
I've only seen on the order of like 10-15 custom error codes, I think you're right that this is likely only a misuse. I suppose my concern is that I would like the "normal" errors to continue working for debug purposes. I'm kind of okay with the tradeoff that #500 proposes, we continue to report normal errors but we stop reporting LUA errors. @enjoy-binbin Just to understand your PoV, are you okay with #500, I'll try to followup and try to get it ready to merge this week if that is the case. |
yean, i am ok with that. |
…m errors to be tracked normally (#500) Implementing the change proposed here: #487 In this PR, we prevent tracking new custom error messages (e.g. LUA) if the number of error messages (in the errors RAX) is greater than 128. Instead, we will track any additional custom error prefix in a new counter: `errorstat_ERRORSTATS_OVERFLOW ` and if any non-custom flagged errors (e.g. MOVED / CLUSTERDOWN) occur, they will continue to be tracked as usual. This will address the issue of spammed error messages / memory usage of the errors RAX. Additionally, we will not have to execute `CONFIG RESETSTAT` to restore error stats functionality because normal error messages continue to be tracked. Example: ``` # Errorstats . . . errorstat_127:count=2 errorstat_128:count=2 errorstat_ERR:count=1 errorstat_ERRORSTATS_OVERFLOW:count=2 ``` --------- Signed-off-by: Karthik Subbarao <[email protected]> Signed-off-by: Madelyn Olson <[email protected]>
The problem/use-case that the feature addresses
When using LUA to reply with custom error messages, the prefix of the messages be included in the
INFO ERRORSTATS
section as a unique entry. This can spam theINFO ALL
command or anyINFO
command variant that includes theERRORSTATS
section.redis/redis#13141 address this spamming of the
INFO
command'sERRORSTATS
section and the memory usage of theerrors
RAX. However, it has the following drawbacks:CONFIG RESETSTAT
which will clear command stat counters and error stat counters. So, we will lose visibility in both of these.Description of the feature
The proposal in this issue is to prevent tracking new error messages from LUA if the number of LUA error messages is greater than 128. Instead, we will track any additional LUA error type in a new counter:
errorstat_LUA
and if a non-LUA error (e.g. MOVED / CLUSTERDOWN) occurs, they will continue to be tracked as usual.This will address the issue of spammed error messages / memory usage of the errors RAX. Additionally, we will not have to execute
CONFIG RESETSTAT
to restore functionality - as normal error messages continue to be tracked.Based on feedback and thoughts from this discussion, I can propose a PR to support this feature.
Alternatives you've considered
NA
The text was updated successfully, but these errors were encountered: