
Caching of search results in memory #9

Open
robbles opened this issue Mar 11, 2015 · 13 comments

Comments

@robbles

robbles commented Mar 11, 2015

Is caching the search results in memory within the scope of this plugin? I'm thinking about implementing something simple and making a pull request, but I'll just make a custom plugin if it's not something that would likely be accepted.

The use case is as follows:

  • I have a large volume of event data going through logstash into ES
  • I have a much smaller set of immutable configuration-like records stored in ES that the events reference by ID
  • I would like to augment the events with the referenced records without needlessly overloading ES with searches, when most of the data would fit in memory

I think this could be accomplished by adding a simple LRU cache and two optional configuration values: the size of the cache in entries, and an identifier that uniquely represents the search. Without these parameters, the plugin would behave as usual and just hit ES every time.
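The proposal above could be sketched in a few lines of Ruby (a hypothetical illustration, not the plugin's actual code), relying on the fact that Ruby Hashes preserve insertion order:

```ruby
# Minimal LRU cache sketch (hypothetical names, not part of the plugin).
# Re-inserting a key on access keeps the least-recently-used entry at the
# front of the Hash, so eviction just removes the first key.
class LruCache
  def initialize(max_entries)
    @max_entries = max_entries
    @store = {}
  end

  # Look up `key`; on a miss, compute the value with the given block
  # (e.g. an Elasticsearch query) and cache it.
  def fetch(key)
    if @store.key?(key)
      @store[key] = @store.delete(key)  # mark as most recently used
    else
      @store[key] = yield(key)
      @store.delete(@store.first[0]) if @store.size > @max_entries
    end
    @store[key]
  end
end
```

With the two proposed settings, the filter would build one such cache of the configured size and key it on the search identifier, falling back to a plain ES query when either setting is absent.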

@roji

roji commented Oct 21, 2015

Big +1 on this.

My use case is similar: I need to augment logging events with immutable ES data. This would be a huge performance boost.

@pemontto

pemontto commented Aug 7, 2017

Absolutely +1 on this. I had actually assumed this was part of the plugin 😢

@sw-jung

sw-jung commented Oct 27, 2017

I have the same need, but this issue has gone unresolved for a long time.

So I created a new filter plugin to solve this and similar problems. Please see logstash-filter-memoize.

@acchen97

Hello all, apologies for the delay here. We are thinking through this caching feature and would love feedback from the broader community.

For each of your use cases, is configurable LRU caching sufficient for most workloads? It would require initial cache warmup, and if the ES lookup dataset changes often, it could result in more misses which would impact throughput.

For our DB lookups, we offer two caching options. The jdbc_streaming filter is used with an LRU caching strategy, while the jdbc_static filter allows for full local caching of the lookup dataset at startup, along with a periodic cache refresh option. Would a similar full local caching strategy be useful for you? Any other strategies you'd like to see?

@pemontto

@acchen97 LRU would be suitable for our use cases, though separate caches for hits and misses would be useful. A full local cache could also be very useful for us.

@acchen97

@pemontto thanks for your input. Do you mind sharing details on your lookup dataset? e.g., what kind of data, how big it is, and how often it changes.

@guyboertje

@pemontto

  • What is the cardinality of the lookup values?
  • Why separate hit and miss caches? Different eviction times?

@guyboertje

@acchen97
I can't see much more feedback on the horizon. IMO we can go ahead with porting the LRU cache code from JDBC Streaming into this filter. We can have separate hit and miss cache instances and a default_data setting that is used as the miss cache value.
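The separate hit/miss caches with a default value for misses could look roughly like this in Ruby (a hypothetical sketch; names and structure are illustrative, not ported from the JDBC Streaming code):

```ruby
# Sketch of separate hit and miss caches (hypothetical, for illustration).
# Misses are cached too, so repeated lookups of absent keys don't hit ES,
# and cached misses return a configurable default instead of nil.
class HitMissCache
  def initialize(hit_cache, miss_cache, default_data)
    @hits = hit_cache        # e.g. an LRU cache for found documents
    @misses = miss_cache     # e.g. a smaller cache for absent keys
    @default_data = default_data
  end

  def get(key)
    return @hits[key] if @hits.key?(key)
    return @default_data if @misses.key?(key)
    doc = yield(key)         # the actual Elasticsearch lookup
    if doc.nil?
      @misses[key] = true
      @default_data
    else
      @hits[key] = doc
    end
  end
end
```

Keeping the two caches as separate instances is what allows them to have different sizes or eviction times, as raised earlier in the thread.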

@CodeCorrupt

Big +1 on this! Has there been any movement since the last update?

@passing

passing commented Jun 7, 2019

@acchen97 we would also benefit from this:
We are processing logs where each document has a specific "application-name" field, referring to the application that the log came from.
We want to keep a dictionary from our CMDB in Elasticsearch that contains, for each application, the responsible business department, product group, application criticality, etc., and we would like to add this information to all log documents.
Since we are processing a few thousand logs per second, we cannot use the elasticsearch filter without caching being available.
LRU would be just fine as long as it supports setting a maximum TTL for the cached data.
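The TTL requirement could be layered on top of any cache by storing an insertion timestamp alongside each value, roughly like this (a hypothetical Ruby sketch, not plugin code):

```ruby
# Sketch of a cache with a maximum TTL (hypothetical names).
# Entries older than `ttl_seconds` are treated as misses and refetched,
# bounding how stale the enrichment data can get.
class TtlCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {}  # key => [value, inserted_at]
  end

  # `now` is injectable to make expiry testable.
  def fetch(key, now = Time.now)
    entry = @store[key]
    if entry && (now - entry[1]) < @ttl
      entry[0]
    else
      value = yield(key)     # the Elasticsearch lookup
      @store[key] = [value, now]
      value
    end
  end
end
```

Combined with an entry limit, this gives the LRU-plus-maximum-TTL behavior described above.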

@passing

passing commented Jun 7, 2019

By the way, here's another workaround for the missing caching option:
https://reactivelabs.com/blog/2019/01/31/speeding-up-logstash-data-enrichment-with-memcached/
However, this adds quite some complexity to the Logstash configuration, besides requiring a memcached instance to be run.

@tigermatos

I know this is an old request, but just adding a huge +1. We have a Logstash configuration that handles host logs. It would be extremely useful to look up the host properties in Elasticsearch, such as Application, Customer, LogLevel, etc. This metadata (obtained from an Elasticsearch index) would not only enrich the event but drive logic: for example, if the hostname is not tagged for WARN level, drop WARN logs; or if the server belongs to Application XYZ, ship the log to the XYZ index.
Stuff like that. For us, this is only viable if we can cache results locally, to avoid overly frequent lookups. The idea is to look up only when the key (the hostname in this example) is not already found in a local hashmap. As simple as that.
We know how to do this with jdbc_static or memcached, but why build and maintain another database if the data is already in Elasticsearch?
Thanks.

@slippman

slippman commented Feb 9, 2023

+1

10 participants