
How should output-elasticsearch deal with a missing @timestamp? #779

Open
webmat opened this issue May 29, 2018 · 27 comments

Comments

@webmat
Contributor

webmat commented May 29, 2018

Events without @timestamp can be difficult to process in certain situations. According to code that dates back to 2013 (and probably before), its presence should be guaranteed. Logstash has never enforced this by preventing the removal of @timestamp per se, however.

So whenever a problem arises from a missing @timestamp, people may simply see a cryptic error like the following and conclude that there's a bug in Logstash:

[2018-02-22T13:49:42,425][FATAL][logstash.runner ] An unexpected error occurred! {:error=>#<LogStash::Error: timestamp field is missing>, :backtrace=>[
  "org/logstash/ext/JrubyEventExtLibrary.java:205:in `sprintf'",
  "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.4.0-java/lib/logstash/outputs/elasticsearch/common.rb:169:in `event_action_params'",
  "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.4.0-java/lib/logstash/outputs/elasticsearch/common.rb:44:in `event_action_tuple'",
  "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.4.0-java/lib/logstash/outputs/elasticsearch/common.rb:38:in `multi_receive'",
  "org/jruby/RubyArray.java:2414:in `map'",
  "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.4.0-java/lib/logstash/outputs/elasticsearch/common.rb:38:in `multi_receive'",
  "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/output_delegator_strategies/shared.rb:13:in `multi_receive'",
  "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/output_delegator.rb:49:in `multi_receive'",
  "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/pipeline.rb:436:in `output_batch'",
  "org/jruby/RubyHash.java:1342:in `each'",
  "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/pipeline.rb:435:in `output_batch'",
  "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/pipeline.rb:381:in `worker_loop'",
  "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/pipeline.rb:342:in `start_workers'"]}

One point of view about this is that the event should actually be considered invalid, and whatever led to this situation is likely to cause data loss.

There are two schools of thought on how to approach this:

  1. We can try to support these events as best as possible
  2. We can take a hard line and not try to support these events, and assume that the customer's configuration has a problem that will lead to data loss. We can adjust the level of "severity" we take with this approach.

In either case, in order to prevent needless confusion and support requests, we should at the very least output a log message that's more verbose than the one above and explains what the problem actually is: the event is missing @timestamp, and the customer is trying to index it to a timestamped index (like logstash-%{+YYYY.MM.dd}).

Are non-timestamped events legitimate?

The current behaviour (unless the above situation happens) is to successfully index the event. I would argue there's a legitimate use case for removing @timestamp in Logstash pipelines. It's counterintuitive, because Logstash was born to process time-based data. But once someone has a Logstash installation, Logstash is a great way to periodically ingest non time-based data into their Elasticsearch cluster.

Examples of non time-based data to reimport regularly:

  • zip codes (or other regional databases)
  • IP geolocation databases
  • other kinds of CSV data dumps

Another supporting argument for the legitimacy of non time-based support in Logstash is the fact that our plugins are gradually getting better at supporting non time-sensitive scenarios:

  • More explicit support in input-file for reading full files, rather than tailing them (CSV imports)
  • We've had requests (#9) and attempts (filter-lrucache) at making filter-elasticsearch able to cache static datasets in memory, to achieve a behaviour similar to filter-jdbc_static. This is a sign that people have static datasets in ES.

The last point demonstrating the legitimacy of processing non-timestamped data with Logstash is the very first Logstash configuration one sees during the Elasticsearch training:
Remove noise from Logstash events
Here we're ingesting a bunch of data, but we're not interested in keeping @timestamp (among other fields).

Approach 1: Support these events as best as possible

The current behaviour is that events with no @timestamp work in most cases. People can do this, and are doing it. There are still some improvements we can make:

  • Output a more explicit message when someone tries to index an event without a @timestamp to a timestamped index.
    • Part of #777 was about putting in place this clearer messaging.
  • Instead of a full Logstash crash (current behaviour), output an ERROR for every event in this situation, and keep Logstash running.
    • #777 implements this behaviour
  • If the customer has DLQ enabled, we could even send the event there. Perhaps their timestamp is just named incorrectly and they may be able to reprocess these events without any other disruption.
    • One could argue that the DLQ may fill up before people notice this & so on, and they may lose data eventually.
    • Idea floated in some form in #718 and tracked as part of a bigger plan in #772
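As a rough illustration of the "log an ERROR and keep running" idea, here's a minimal Ruby sketch. The method names and the plain-hash event are hypothetical stand-ins, not the actual output-elasticsearch code:

```ruby
# Minimal sketch of Approach 1 (hypothetical names; not the real
# output-elasticsearch implementation).
def resolve_index(event, pattern)
  if pattern.include?("%{+") && !event.key?("@timestamp")
    raise ArgumentError, "cannot interpolate #{pattern.inspect}: " \
          "event has no @timestamp (was it removed by a filter?)"
  end
  # Only the common daily pattern is handled in this sketch.
  pattern.gsub("%{+YYYY.MM.dd}") { event["@timestamp"].strftime("%Y.%m.%d") }
end

def index_or_skip(event, pattern)
  [resolve_index(event, pattern), :index]
rescue ArgumentError => e
  # Log per event and keep the pipeline alive instead of crashing Logstash.
  warn("ERROR: #{e.message}")
  [nil, :dropped]
end
```

With this shape, an event carrying a timestamp resolves to a concrete index name, while an event without one is logged and dropped (or could be routed to the DLQ) rather than taking down the worker.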

Approach 2: Interpret this as a problem with the user's configuration and crash

A lot of the feedback around #777 was not about the PR itself, but went in the direction that "not having a @timestamp on the event" is not officially supported. So this definitely struck a chord with a few people :-)

In any case, nobody denies that we can improve the error message when the timestamp error happens.

Now, do we want to take a harder line to the effect that we do not want to support a missing @timestamp at all? If so, the current behaviour is not strict enough. We may want to take the following additional steps:

  • Pre-emptively refuse any event without @timestamp as soon as we see them (whether or not the timestamp is needed).
  • Eventually add a breaking change to Logstash, preventing the removal of @timestamp.
  • Until this is prevented, should we always crash on any such event? (regardless of whether the event is problematic)

1+2=3: Let users decide

Any given user may really want Logstash to behave like 1, or to behave like 2. Presumably, the default could be the friendlier approach 1, and people who take great pains to construct a pipeline that has the least data loss possible (e.g. use only non-lossy transports & plugins, careful use of PQ & DLQ) could optionally decide that such events should indeed cause a hard crash.

Why this discussion?

This issue came out of the discussion here #777, which in turn came from reports such as #739. Requesting feedback on #777 in Slack immediately led to discussions of going towards approach #2.

@jsvd
Member

jsvd commented May 29, 2018

Just for reference, this issue happens because sprintf time interpolation behaves differently from normal interpolation:

2.3.0 :006 > e = LogStash::Event.new; e.sprintf("%{[hello]}")
 => "%{[hello]}" 
2.3.0 :007 > e = LogStash::Event.new; e.remove("@timestamp"); e.sprintf("%{+YYYY}")
LogStash::Error: timestamp field is missing
	from org/logstash/ext/JrubyEventExtLibrary.java:168:in `sprintf'
	from (irb):7:in `<eval>'

@webmat webmat changed the title output-elasticsearch should fail with a clearer error message when missing @timestamp or @version output-elasticsearch should fail with a clearer error message when missing @timestamp May 30, 2018
@webmat
Contributor Author

webmat commented May 30, 2018

In case anyone is watching this issue, I just posted a massive update to the description of the issue. The focus is now solely on @timestamp, to keep the discussion focused on the problem at hand. Mentions of @version have been removed.

Also, I brought back my original approach to the matter on the table (trying to accommodate the missing @timestamp). After initial feedback, I thought this approach was actually not gonna fly with anyone. But after the short discussion in the ingest meeting, I realized that I may not be alone in thinking that's a valid approach, after all :-)

@webmat
Contributor Author

webmat commented May 30, 2018

In my mind, approaches 1 and 3 are tied as my favourites. I'd start with 1 because it's less work (it's essentially the current state + possible improvements), then if we want, we can continue working towards option 3.

My 3rd favourite approach is #2, but I can live with it. However, if we want to go the opinionated route, we'll have to be clear with our users that use cases involving the removal of @timestamp will no longer be accepted, and that this behaviour will get more severe in upcoming versions.

@webmat webmat changed the title output-elasticsearch should fail with a clearer error message when missing @timestamp How should output-elasticsearch deal with a missing @timestamp? May 30, 2018
@andrewvc
Contributor

Thanks for the excellent writeup @webmat .

Whatever solution we come up with here, we should consider consistency with sprintf generically as @jsvd mentioned.

My suspicion is that there is no one right answer here. What are your thoughts WRT creating two separate sprintf implementations for plugins to use, sprintf_strict and sprintf_lax, where sprintf_strict raises exceptions on missing fields (at which point the event could be DLQed or whatever the author decides), while sprintf_lax always succeeds but may leave %{...} in the output? This would mean consistency is determined by the function call.

You'd use sprintf_strict for things like the ES index name interpolation, and sprintf_lax for everything else. We'd retire plain sprintf in a major version.
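A sketch of what the two variants might look like over a plain Hash (illustrative only; the real LogStash::Event is Java-backed and its API differs):

```ruby
# Hypothetical sketch of the two interpolation variants over a plain Hash.
FIELD_REF = /%\{\[?([^}\]]+)\]?\}/

def sprintf_lax(event, template)
  # Missing fields are left as-is, mirroring today's %{field} behaviour.
  template.gsub(FIELD_REF) { |m| event.fetch($1, m) }
end

def sprintf_strict(event, template)
  # Missing fields raise, so the caller (e.g. the ES output building an
  # index name) can decide to DLQ, drop, or fail the event.
  template.gsub(FIELD_REF) do |m|
    event.fetch($1) { raise KeyError, "field #{$1.inspect} is missing" }
  end
end
```

For example, `sprintf_lax({"host" => "web-1"}, "%{host}-%{missing}")` returns `"web-1-%{missing}"`, while `sprintf_strict` on the same template raises a KeyError the plugin can handle.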

@colinsurprenant
Contributor

Opposing concepts:

  • @timestamp is a normal field in all respects (it can be removed), with the exception of having specific construction & coercion behaviour: upon creation of a new Event, a @timestamp is always created, so you can rely on the fact that every new Event has a @timestamp.
  • Some features and plugins expect a @timestamp to be present and consider @timestamp a special field. This is fine because @timestamp is usually central to most configurations.

Field removal

  • We historically never wanted to force permanent or non-removable fields on users. Users should always be able to shape the Event fields the way they want, without exception.
  • In that respect, I think this should also be true for @timestamp, as it is now.
  • The reason for removing a field is essentially for that field not to be present when serializing an Event "as a whole", typically at the outputs.

A proposal

  • I believe we should keep the behaviour of being able to remove the @timestamp field so that it does not show up when either serializing (Event.toJson) or retrieving the event fields/values collection (Event.toMap, Event.getData) as intended when removing a field.

  • I suggest we introduce a behavioural exception for the @timestamp field, since it IS a special field in a way, given its behaviour at Event construction and its coercion behaviour. In fact, internally the @timestamp value has its own class implementation.

    Removing @timestamp would work as intended, BUT querying the field value from sprintf/interpolation and from a field reference would always return its value, regardless of whether it was removed. This would allow users to shape the Event the way they want, but would also never break features or plugins that rely on @timestamp being present.

WDYT?
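A toy model of this proposal, where removal only hides @timestamp from serialization while reads keep working (purely illustrative; not the real Event class):

```ruby
require 'json'
require 'time'

# Toy model: removing @timestamp hides it from serialization,
# but field reads keep returning the value.
class ToyEvent
  def initialize(data = {})
    @timestamp = Time.now.utc     # always created, as today
    @data = data
    @timestamp_hidden = false
  end

  def get(field)
    field == "@timestamp" ? @timestamp : @data[field]
  end

  def remove(field)
    if field == "@timestamp"
      @timestamp_hidden = true    # hide from serialization, keep readable
    else
      @data.delete(field)
    end
  end

  def to_json(*)
    h = @data.dup
    h["@timestamp"] = @timestamp.iso8601 unless @timestamp_hidden
    h.to_json
  end
end
```

After `remove("@timestamp")`, `get("@timestamp")` still returns a valid Time (so sprintf-style index names would keep working), while `to_json` omits the field, which is the usual intent of removing it.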

@andrewvc
Contributor

@colinsurprenant I agree that @timestamp should be removable.

I'm worried about that approach due to the amount of magic there; it seems to violate the principle of least surprise if you can remove something but it still exists in some contexts.

What did you think of my proposal to let plugin authors decide what level of strictness was needed for interpolation on a case by case basis?

@colinsurprenant
Contributor

colinsurprenant commented May 31, 2018

@andrewvc that seems to add more complexity to me. OTOH what I propose basically surfaces a fact that is true today: the timestamp already has some special behaviour (we even have an Event.getTimestamp method). I think this can be documented, and as for the principle of least surprise, I'd argue my proposal does satisfy it: the intention behind removing the field is for it not to appear in its serialized form, which is the majority of cases where a field is removed, and no change would be required in plugins.

It seems to me that conceptually it will be easy to explain to users that there is one special field in the Event, that it has to do with timestamp handling, and that the reason is that some features and plugins expect a timestamp to be present; regardless, the field can still be excluded from the event's serialized form by removing it.

@andrewvc
Contributor

@colinsurprenant That's a good point. I see that it does solve the problems you mention. It does have the cost of being yet another 'special' thing to document.

Here's a thought, what if we deprecate @timestamp and say that the canonical place for it is [@metadata][timestamp], which doesn't serialize by default. @metadata would then be the single place where you put data that doesn't serialize. Thoughts?

We'd also adjust sprintf for timestamps to treat that like an ordinary field.

@andrewvc
Contributor

To clarify what I mean by 'like an ordinary field', I mean that we'd follow the same interpolation rules. I do think we need a strict interpolation variant regardless of the timestamp decision to let plugin authors decide if interpolation issues are fatal.

@andrewvc
Contributor

An alternative approach would be if we had a rule that any field that starts with @ is not serialized. That would be another way to have simple event behavior without multiple rules.

@colinsurprenant
Contributor

colinsurprenant commented May 31, 2018

Having the timestamp in [@metadata] is also worth considering and is basically what I verbally suggested in the last sync. The disadvantage is that it breaks the current behaviour and will require adjustments in various places, and this is why, after giving it some thought, I did not suggest it here.

The problem we face is that over time @timestamp has become a special-case field, and rightfully so: we enforce the fact that an Event is always initialized with a valid timestamp, and timestamp handling has always been considered "a special case in itself" to deal with. Many plugins reference @timestamp in various forms (as a hardcoded reference, as a default configuration option, etc.), and changing the interface to the Event timestamp will certainly require lots of code changes and more explanation to users.

This is why I still think that we should not change the interface to the Event timestamp via the [@timestamp] field ref: it should simply always return a value, while providing the ability to not include it in the serialized form via remove_field. Again, the only exception this introduces is that after doing a remove_field on [@timestamp] you'd still get a valid value, because we'd now consider [@timestamp] the always-working interface to the Event timestamp; introducing this single exception would probably not break anything or require changes. For the vast majority of cases this would be seamless for users, because it would satisfy the intent. It may surprise a few users who actually test that [@timestamp] is correctly removed after a remove_field on it, but frankly I think this will be exceptional, and they'd only need to rework that test to verify the serialized output rather than the field ref for [@timestamp].

@webmat
Contributor Author

webmat commented May 31, 2018

This comment is not a response to the recent comments (that's coming after).

If anyone strongly in favour of Option 2 wants to flesh out that section more in the issue body, you're welcome to do so. That's never how I used Logstash personally, so while I understand the point of view, I'm probably not making all of the arguments that can be made.

@webmat
Contributor Author

webmat commented May 31, 2018

I feel like the current discussion always goes really deep right away.

But I'd like to shed light on one assumption that seems to underlie all of this: that @timestamp is special, and that "things will break" if it's not there (the quotes represent a cloud of mystery :-) ).

I'd like to flesh out clearly what it is that breaks. I can start with the obvious one, which triggered all of this.

Breaks:

  1. Trying to index an event that doesn't have a @timestamp to a timestamped index name will fail.
    1.1. This applies to output-es, and possibly other plugins that need @timestamp to figure out what to do with the event.

If we fix this (assuming we try the lenient route), what else could this break? Is this it, or are there other things I'm not seeing?

@webmat
Contributor Author

webmat commented May 31, 2018

@andrewvc If we were to introduce new sprintf implementations, perhaps we could consider only introducing a strict one.

Right now the behaviour is:

  • lax for %{...} interpolation, aka field interpolation
  • strict+crash for %{+...} interpolation, aka time format interpolation

The new behaviour, if we take that route could be:

  • sprintf becomes lax for both types of interpolation (least breaking change)
  • sprintf_strict is strict, and can be used for field or time format interpolation as well. Plugin authors may or may not decide to start using it.

----------

Reset.

Now that I phrase it this way, I wonder if multiple implementations of sprintf are a solution to this. Part of this discussion is "should these situations crash Logstash or not". Presumably, Logstash users are the ones who should decide this, rather than us.

Introducing multiple versions of sprintf would mean that potentially many plugins need to be updated the right way to actually give Logstash users this choice. Each plugin would need to 1) start using both versions of sprintf and 2) give users an option to decide severity.

Perhaps we should be considering adding a Logstash-level option to decide behaviour on interpolation failures, instead?

Options:

  • crash: crash logstash
  • error: log an error, raise a non-fatal exception
  • dlq: log an error, send the event to DLQ
  • drop: drop the event
  • none: leave the field template in the result string (a missing field foo would leave %{foo} in the output)

The behaviour we have in current versions of Logstash (at least with respect to output-es) is:

# logstash.yml
interpolation.field: none
interpolation.time: crash

Given the different role of different plugins, interpolation_field and interpolation_time could also become Common Options, where users can selectively override the behaviour. This way someone building a really strict pipeline can still get leniency around some parts of their pipelines.

I haven't fully thought through the implications here. But I think this more directly addresses "should we crash" or not. With this approach, the answer becomes the beloved "It Depends ™" (but you decide).
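Acting on such a policy option could be dispatched roughly like this (a sketch; the policy and symbol names are hypothetical, mirroring the option list above):

```ruby
# Sketch of dispatching on a configurable interpolation-failure policy.
def handle_interpolation_failure(policy, template)
  case policy
  when :crash then raise "interpolation failed for #{template.inspect}"
  when :error then warn("ERROR: interpolation failed for #{template.inspect}"); :kept
  when :dlq   then warn("ERROR: interpolation failed, sending event to DLQ"); :dlq
  when :drop  then :dropped
  when :none  then template # leave the %{foo} template text in the output
  else raise ArgumentError, "unknown policy #{policy.inspect}"
  end
end
```

The interesting property is that the same failure site serves every severity level, so the "should we crash" decision moves entirely into configuration.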

@colinsurprenant
Contributor

We have definitely broadened the discussion away from only the output-es problem, but that's a good thing. Maybe we should move the discussion into a logstash issue.

Generally speaking when I referenced things that will break, in the context of trying to fix the larger issue by getting rid of the @timestamp field I meant that:

  • users' configurations that reference @timestamp will break and will need refactoring/updating.
  • many plugins that reference @timestamp, either as a hardcoded field reference or as a default config option to get or set the event timestamp, would need to be updated.

This is why I believe that the @timestamp field ref as the interface to read/write the Event timestamp should be kept as-is and always work, regardless of whether remove_field was applied to it.

@webmat
Contributor Author

webmat commented May 31, 2018

One thing's for sure, moving @timestamp to @metadata.timestamp would be a breaking change that affects a lot of plugins. The sum of plugins that would suffer from this would be a big hindrance for people considering upgrading. We may be diligent in fixing all supported plugins, but a portion of the unsupported plugins would be forever left behind...

@webmat
Contributor Author

webmat commented May 31, 2018

@colinsurprenant If we made @timestamp resilient to deletion by hiding it, what would the behaviour be, here? ;-)

  if [@timestamp] {
    # ...
  } else {
    # ...
  }

@andrewvc
Contributor

OK, so new thought having chewed through all the excellent points both @colinsurprenant and @webmat have made.

I particularly like your proposal for per-field support @webmat, but I think doing an extra file is awkward. What if we implemented it at the LS config language level? What if, say, %{} acted as it does today, but #{} was the strict variant? That puts all the power into the hands of the user. For timestamps, you'd write either %{+...} or #{+...} to get the lax or strict variant. To solve the serialization issue, we could extend the syntax to support timestamp interpolation of arbitrary fields, so you could do #{[@metadata][timestamp]+yy.mm.dd} to get strict interpolation of a non-serialized field, for instance.

So, the guidance for a user not wanting to serialize the timestamp, but to use it, would be to use that sprintf format, and to move the timestamp from @timestamp to @metadata, which is already the place where things are not serialized.
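A toy interpolator along those lines, where %{field} stays lax and #{field} is strict (the syntax is hypothetical; nothing like this is shipped):

```ruby
# Toy interpolator: %{field} is lax, #{field} is strict, per the
# proposal above. Illustrative only.
def interpolate(event, template)
  template.gsub(/([%#])\{\[?([^}\]]+)\]?\}/) do |m|
    sigil, field = $1, $2
    if event.key?(field)
      event[field].to_s
    elsif sigil == "#"
      raise KeyError, "strict interpolation: missing field #{field.inspect}"
    else
      m   # lax: leave the template text in place
    end
  end
end
```

Both sigils can even appear in one template, so a single index-name pattern could mix fields that may be absent with fields that must be present.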

@webmat
Contributor Author

webmat commented May 31, 2018

Yes! I like the idea of having two ways of interpolation. A strict one and a lax one.

Tongue in cheek caveat: Now Rubyists will get confused by two things, rather than one. The hash syntax without the commas, and now the look-alike string interpolation delimiters that actually have no relation with Ruby's 😆

Kidding aside, I like this. It gives maximum granularity (both can even be used in one string template, at the extreme), without introducing anything scattered all over the place (logstash.yml, plugin args). It's just two different delimiters. Love it!

@andrewvc
Contributor

@webmat that's a good point, we already have %{} and ${}, so adding a third character is inherently tricky. I suppose it could be any of @^&#*~ as a sigil. I don't have a particular preference. It's too bad $ is taken, because you could remember it as $trict. WDYT?

The strict behavior, BTW, should be to either log and warn, or go to the DLQ IMHO.

@colinsurprenant
Contributor

@andrewvc @webmat hypothetical question: let's say that @timestamp is always readable, would the strict interpolation still be necessary/useful?

@colinsurprenant
Contributor

@webmat the behavior with

  if [@timestamp] {
    # ...
  } else {
    # ...
  }

would be to never take the else branch, obviously. I'd be extremely surprised to see anything like that in the wild though, because first, [@timestamp] always exists at Event creation, and second, someone removing @timestamp does so for it not to be part of the serialized output, so there is no real point in doing such a check in a config.

@andrewvc
Contributor

@colinsurprenant I think it's a useful feature on its own yes. It'd actually still be useful for @timestamp fields.

Let's imagine a situation where the real timestamp is in [custom][foo]. The user might have logic to overwrite @timestamp with the value in [custom][foo]. If that value is null, however, and we always guarantee a timestamp, what then? Do we just not overwrite it? Log a warning?

The exciting thing about the approach of having the user pick the syntax is that it puts the choice of behavior in the hands of the user. It also is a generally useful enhancement, which is great because it reduces the total complexity of the system. Most importantly, however, it keeps the usage of field removal/writing consistent. I'm not against having exceptional cases, but exceptional cases often indicate an impedance mismatch in the modeling of a problem. If we can solve a problem without that and put the user in control, that feels like a win.

@colinsurprenant
Contributor

@andrewvc let's keep in mind that we already have different behaviour for @timestamp handling:

  • it is always initialized with a valid Timestamp object at Event creation.
  • valid coercion is enforced when setting the @timestamp field.
  • @timestamp IS the interface to an "always valid" Timestamp object. Using any other field does not guarantee that its value will be a valid Timestamp.

@andrewvc
Contributor

@colinsurprenant it makes sense to reduce the number of special cases, and the amount of complexity. We should probably remove all those special guarantees. IMHO we should just make @timestamp a normal field, and make the auto-population of it configurable.

Let's imagine we're teaching a course on Logstash, we could either explain:

  1. By default, Logstash will create a Timestamp value called @timestamp for you.
  2. Logstash will always create a Timestamp value called @timestamp. This field is not removable and can only contain a Timestamp type, setting it to anything else is an error. If you invoke a remove on the field it will still be present, for reads, but will not be serialized.

Moving to it being a normal field would actually clean up a lot of weirdness in how examples work. In some examples we read a string version of a timestamp into [timestamp], and then use the date filter to move it into [@timestamp], then finally delete [timestamp]. It'd make more sense to read the string straight into [@timestamp] and then immediately convert the type in place with the date filter IMHO.
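For reference, the current idiom described above looks roughly like this in a pipeline config (field names illustrative):

```
filter {
  # Today: parse the string timestamp from a scratch field into
  # @timestamp via the date filter, then drop the scratch field.
  date {
    match  => ["timestamp", "ISO8601"]
    target => "@timestamp"
  }
  mutate {
    remove_field => ["timestamp"]
  }
}
```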

One other thought: have you considered that the 'hide-able' approach makes the logical nullity of timestamps weird to express? If you want to express "I don't have a timestamp", then you need to make some other field for that, called [no_timestamp]. Imagine a user who expects the origin of requests to always provide the timestamp. Our 'default' behaviour then gets in the way, and they need to hack around it.

@colinsurprenant
Contributor

I guess it is a question of how much we need to change, to solve which scope of problem, and in what timeframe, taking into account the BWC issues. The point of my proposed solution is to be simple, applicable today, and backward compatible, requiring no modifications in plugins or user configurations. The tradeoff is an exceptional behaviour when removing the @timestamp field (which I tried to show is a minor tradeoff IMO, because the intent of removing that field is preserved; but that's fine if opinions diverge here).

It's all good if we agree to go wide and completely redefine timestamp handling and behaviour in Logstash and the plugins too.

@jsvd
Member

jsvd commented Jun 1, 2018

Considering logstash 7.0 is far away but not too much, this is a great time to be considering breaking changes and the upgrade+deprecation path for them in the 6.4/6.5 minors.

IMO, moving the timestamp field to @metadata (or another meta namespace) and making all sprintf calls lax would remove surprises in sprintf behaviour, solve the default serialization issue, and keep the current timestamp-related APIs in the Event class working as before.

BWC-wise, this shouldn't break many plugins, as they should be using event.timestamp = or event.set_timestamp anyway. Grepping for @timestamp in our default plugins shows a very small list of hits.
One thing we'd have to consider is the default schema we push to Elasticsearch, which currently contains @timestamp, but that could be solved at the es output plugin level.

Tbh I don't see an urgency in solving this in any immediate time frame; this situation pops up once in a while, and we could just improve the wording in the exception we output. Otherwise, we have the breathing room to make "the right decision" without pressure to deliver a fix.
