[Elasticsearch]: Ingest pipeline created to process Elasticserver logs truncates log messages #12501
Comments
I was able to trace the ingest pipeline back to the addition of the parsing of index names to this commit... in 2018!! The pipeline evolved over the years from handling raw text logs to structured JSON logs and was ported from the Whether this qualifies as a bug or not is up for debate. Obviously, we're now parsing content in square brackets as index names, which it is not always. We're currently not sure how to handle this without breaking all the known (and unknown, read "customer") downstream processes that might build upon the
This seems like a bug to me. Whether or not users are "relying" on the bracket extraction, it's most likely wrong, so the values they are getting are not index names. For example, consider the following message:
Will the first or second set of brackets be used? In either case, this is not an index name, so anything a user is doing with it is wrong, or not what they really want. Additionally, if the second set of brackets is used, the message will be effectively truncated to an empty string. That isn't useful to users who rely on our logs for debugging Elasticsearch, nor to ES engineers trying to understand what is going on.
Thanks @rjernst
In non-JSON logs it is a thing. Even today, when we log something specific to an index, the index name gets placed in brackets (similar to how we have the node name in brackets). But it is context dependent, and I'm not sure it is done consistently (it requires getting a logger specific to that index). It also is not output, afaict, in JSON logs (it's not part of the message, so unless a dev explicitly puts the index name in the message, it won't be seen there). Does ECS have a field for the index name?
Indeed, that grok pattern was created back when JSON logs didn't exist, so that might explain it.
Yes, through a
In standard ECS fields, there isn't as far as I can tell. The Are you trying to see if for JSON logs |
Just want to clarify one thing: the specific issue we are currently seeing is that it replaces the things in the [] only if the message field begins with them. So, for example, if the message is the following
then For the message that @rjernst provided
there will be no truncation and the message will be displayed as it is. This is the behavior I saw in my testing. This is still an issue that we want fixed.
So ideally, we could just remove the |
Integration Name
Elasticsearch [elasticsearch]
Dataset Name
No response
Integration Version
1.16.0
Agent Version
8.17.0
Agent Output Type
elasticsearch
Elasticsearch Version
8.17.0
OS Version and Architecture
cloud
Software/API Version
No response
Error Message
No response
Event Original
What did you do?
What did you see?
We are seeing the message field get truncated if it begins with a []. The text in the [] is being moved to a field called elasticsearch.index.name
This seems to be because of the grok pattern defined here: https://github.com/elastic/integrations/blob/main/packages/elasticsearch/data_stream/server/elasticsearch/ingest_pipeline/pipeline-json.yml
That moves things inside the [] as an index name
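To make the reported behavior easier to reason about, here is a minimal Python sketch that mimics it. The regular expression and the function name split_leading_brackets are hypothetical stand-ins written for illustration; they are not the grok pattern from pipeline-json.yml, which may differ in its details. The sketch only assumes what is described in this issue: bracketed text at the very start of the message is moved into elasticsearch.index.name and removed from the message.

```python
import re

# Hypothetical stand-in for the grok pattern; NOT copied from pipeline-json.yml.
# It matches only when the message starts with "[...]".
LEADING_BRACKETS = re.compile(
    r"^\[(?P<index_name>[^\]]*)\]\s*(?P<rest>.*)$", re.DOTALL
)


def split_leading_brackets(message: str) -> dict:
    """Mimic the behavior reported in this issue (assumed, not the real pipeline).

    If the message begins with bracketed text, that text is moved into
    "elasticsearch.index.name" and dropped from "message".
    Otherwise the message is returned unchanged.
    """
    match = LEADING_BRACKETS.match(message)
    if match is None:
        return {"message": message}
    return {
        "elasticsearch.index.name": match.group("index_name"),
        "message": match.group("rest"),
    }


if __name__ == "__main__":
    # Made-up sample messages, for illustration only.
    for sample in (
        "[my-index] shard started",
        "[not an index name]",
        "shard started for [my-index]",
    ):
        print(split_leading_brackets(sample))
```

With these made-up samples, the second message ends up with an empty message field and a bogus index name, while the third is left untouched because the brackets are not at the start.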
What did you expect to see?
We expected messages that begin with [] to be retained within the message field.
Anything else?
No response