Opensearch 2.16.0 breaks alerts #20119
Comments
Greetings! Can you expand a bit on this?
Can you clarify what "high message count" means? Does this mean the search query for the event definition returned an unusually high number of messages? Or is this specifically the event notification? Can you provide a screenshot to help clarify?
Can you provide a screenshot? To clarify, this is when you click the replay search URL in the notification or via the alerts screen in Graylog?
This is expected, unfortunately. New OpenSearch versions typically introduce a new Lucene version. OpenSearch 2.16 updated Lucene from 9.10 to 9.11.1, and it is impossible to downgrade the Lucene version of an OpenSearch cluster. Can you share your Thanks! |
Can you confirm the stream IDs match between the id in
Good to hear. Are you confident OpenSearch 2.16.0 is working as expected and Graylog is working as expected? |
I think we're seeing the same issue after an upgrade. I tried a quick grep for "Removing non-existing" in server.log, but didn't see anything. For example, one alert has a condition of "count > 600", and a replay of the search shows a count of 14. But the alert is triggered due to the claimed count of "950013". Some filter that's not set correctly anymore? We seem to have upgraded both OpenSearch from 2.14.0 to 2.16.0 and Graylog from 6.0.2 to 6.0.5 today. |
I think you misunderstood. The lines in the server.log were related to old streams I was no longer interested in. So I deleted those streams, and those specific lines in server.log have stopped. But those had no relation to the problem that I was seeing. I posted those lines because you asked for things in server.log, not because they had a connection to the issue. Those "removing non-existing" lines were caused by an alert that referred to a stream that had already been deleted. What @dhedberg is saying sounds the same as the problem we're having. What I didn't mention before is that I also had this problem on Graylog 6.0.4. This version was running on the day I found the issue. I upgraded to 6.0.5 to see if it would resolve the problem, but it didn't. |
@fjl82 thank you for clarifying. Do you feel comfortable sharing the event definition that is causing this? If possible, by exporting it to a content pack? My goal is to try to understand how to reproduce the issue. If I may, a summary of the issue: your event criteria are not behaving as expected, and the resulting query returns much more data than you expect (you expect 0). The replay search (or running the search query directly) shows different results than the event. Is this correct? |
Without having made any effort to understand the code and queries involved, I took a quick look at the OpenSearch issue tracker. Might opensearch-project/OpenSearch#15169 be related? Just based on the fact that it apparently broke in 2.16.0 and involves a query being ignored. |
@dhedberg potentially, yes. We were looking into that very issue to see if it could be the root cause here, but we were unable to recreate the issue in our test environments running OpenSearch 2.16.0. We were hoping to get more information about any event definitions that were causing the problem so we can reliably reproduce the issue and figure out if it is on our end or due to that (or another) OpenSearch issue. |
I can confirm that the OS queries we are generating in alerting do not return proper results against 2.16.0. All filters are ignored due to the usage of the |
There seems to be a workaround: opensearch-project/OpenSearch#15169 (comment) |
Thanks @bernd, I applied this setting and alerts seem to work ok again now. |
I can confirm that the workaround mentioned by @bernd is working for me, thanks! |
We have published an advisory regarding this issue that includes the work-around here https://graylog.org/post/alert-notice-opensearch-v2-16/ |
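The comments above describe the workaround as a setting, but this thread does not name it. As a rough sketch only, assuming it is a cluster-level setting applied through OpenSearch's `_cluster/settings` API, the request could be built as below. The helper `build_settings_request` and the name `example.setting.name` are hypothetical placeholders; take the actual setting name and value from the linked OpenSearch comment or the advisory.

```python
import json
from urllib import request


def build_settings_request(base_url, setting_name, value, persistent=True):
    """Build (but do not send) a PUT request for the cluster settings API.

    setting_name and value are supplied by the caller; nothing here knows
    which setting the workaround actually uses.
    """
    # "persistent" settings survive a full cluster restart, "transient" do not.
    scope = "persistent" if persistent else "transient"
    body = json.dumps({scope: {setting_name: value}}).encode("utf-8")
    return request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder host and setting name, for illustration only.
    req = build_settings_request("http://localhost:9200", "example.setting.name", 0)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To apply it against a reachable cluster: request.urlopen(req)
```

The sketch only constructs the request so the body can be reviewed before anything is sent. A secured cluster would also need authentication and TLS options, which are left out here.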
Last night, OpenSearch was upgraded from 2.15.0 to 2.16.0. Nothing else was changed. After this, alerts started coming in with a high message count but no messages listed in the email (normally a maximum of 3 are included). Using the search replay shows no messages either. It seems to apply to all configured alerts. Normal message searches work fine.
Trying to downgrade OpenSearch fails. On startup it stops with an error:
java.lang.IllegalStateException: cannot downgrade a node from version [2.16.0] to version [2.15.0]
If you need me to check anything, or need more info, let me know.
Expected Behavior
Alerts should behave as before on 2.15.0.
Current Behavior
Alerts keep triggering with an ever-rising message count.
Possible Solution
Support OpenSearch 2.16.0.
Steps to Reproduce (for bugs)
Context
Alerts are currently unusable. This is the feature in Graylog we use most (alerting us to application issues).
Your Environment