Optimizing Data Storage and Retrieval for Time Series data. #9568

nkumar04 · 2023-08-28T03:58:32Z

Is your feature request related to a problem? Please describe.
In OpenSearch, documents are stored using three primary formats: indexed, stored, and docValues. In case of time series data, a document usually consists of dimensions, time point and quantitative measurements that are used monitor various aspects of a system, process, or phenomenon. In cases like these, the numeric, keyword and similar datatypes are stored as the "stored" field as well as docValues that serves specific purposes related to search performance and retrieval. DocValues are a columnar storage format used by Lucene to store indexed data in a way that facilitates efficient aggregation, sorting etc and stored fields, on the other hand, are used to store the actual values of fields as they were inserted into the index.
For example, lets look at a document consists of performance related data points of an ec2 instance.

{
	"hostName": "xyz",
	"hostIp": "x.x.x.x",
	"zone" : "us-east-1a"
	"@timestamp": 1693192062,
	"cpu": 80
	"jvm": 85
}

Here, hostName, hostIp and zone uses keyword as field type and timestamp/cpu/jvm uses numeric fields. Values for dimensions and mesurements are stored in docValue as well in stored fields as _source. The most common search query for such data set is aggregations like min/max/avg/sum. As we can see, data is stored twice here, we can possibly avoid storing data twice.

Describe the solution you'd like
Currently the _source field stores the original documents as stored field. We can possibly skip storing the _source field in such cases and retrieve the field values from docValue instead. This will help in reducing the storage cost significantly. Based on the nature of the query we can skip or fetch some or all of the fields from docValues to serve the search queries.

The text was updated successfully, but these errors were encountered:

nkumar04 · 2023-08-28T04:03:39Z

@shwetathareja

mgodwan · 2023-08-29T11:07:24Z

While this may provide gains in storage/throughput, it introduces few trade-offs:

Execution of painless scripts: Painless scripts may rely on _source field.
Re-indexing relies on _source fields as well. There can be cases where customers can choose to ignore any fields for which mappings may not be declared but they are still stored in _source field. These fields will then not be a part of the storage.

These use cases may apply for time-series data, and should be evaluated to support with the proposed optimization.

shwetathareja · 2023-08-29T17:49:49Z

This should also reduce the merging overhead if _source is not stored explicitly.

+1 to @mgodwan : As we start looking into solution, we need to carefully analyze the restriction it would bring in terms of use cases. Also if those restriction would work for time series workload specifically. Instead of skipping the _source completely, should we consider filtering the _source for specific field for which doc_Values is enabled.

msfroh · 2023-09-18T02:11:57Z

Can we try benchmarking something like this using the existing code?

It's already possible to exclude some/all fields from source and it's possible to explicitly request that fields be loaded from doc values.

That is, users can theoretically tune their index to do exactly what's being proposed here -- it sounds like we just want to make it easier (e.g. by defining a time series index type that excludes doc values fields from source and automatically retrieves those doc values fields at query time).

I think the proposed change could help a lot. We could measure that improvement now by making some changes to the http_logs workload, I think.

msfroh · 2023-09-18T02:39:17Z

Related, if we can make retrieving from doc values "feel" the same as retrieving from source, we could transparently retrieve doc values instead of source if only fields with doc values are requested (to avoid decompressing the stored field block altogether).

shwetathareja · 2023-09-18T07:14:22Z

Related, if we can make retrieving from doc values "feel" the same as retrieving from source, we could transparently retrieve doc values instead of source if only fields with doc values are requested (to avoid decompressing the stored field block altogether).

Right, @msfroh , yes agreed this optimization can help in all cases ( _source enabled/ disabled)

nkumar04 · 2023-09-18T08:15:58Z

@msfroh , you are right we can exclude/include fields from source and request fields to be explicitly loaded from docValues. I tried benchmarking the indexing performance to test exclusion of fields that are also stored as docValue(mapping below).

Findings:

Indexing throughput degradation by ~12%
Latency degradation P50: ~15%
Latency degradation P90: ~13%
Storage gain: ~4%

The overhead in this case is mostly due to rewriting of _source at the time of indexing. (link)

In case, _source is completely omitted, there is ~25% gain in storage and slight gain in indexing throughput (~3-4%) and (~3%) gain in P50/P90 latency.

I am yet to benchmark the query performance (explicitly request that fields be loaded from doc values.).

That is, users can theoretically tune their index to do exactly what's being proposed here -- it sounds like we just want to make it easier (e.g. by defining a time series index type that excludes doc values fields from source and automatically retrieves those doc values fields at query time).

Yes, the idea is to make the these optimisation under the hood and make it default for timeseries indexes. We can keep source field disabled by default if all fields can be fetched from docValues and source can be generated at the query time. In case, some of the fields are not of docValue type, then we can store some of them as stored field and at the query time we can fetch partially from docValues and partially from stored field if _source is requested (depends the nature of the query)

git diff index.json
diff --git a/http_logs/index.json b/http_logs/index.json
index dfe1cd0..ea98d3a 100644
--- a/http_logs/index.json
+++ b/http_logs/index.json
@@ -7,7 +7,12 @@
   "mappings": {
     "dynamic": "strict",
     "_source": {
-      "enabled": {{ source_enabled | default(true) | tojson }}
+      "excludes": [
+        "@timestamp",
+        "clientip",
+        "status",
+        "size"
+      ]
     },
     "properties": {
       "@timestamp": {

msfroh · 2023-09-18T20:40:53Z

I'm going to open a spin-off issue to target just the querying side of this, because I think doing doc value retrieval to avoid decompressing stored source might be a quick win (independent of the indexing changes).

bugmakerrrrrr · 2024-12-02T15:48:22Z

Hi all, I have developed something called derived source(the name is inspired by derived fields). It is similar in nature to this idea. If no one is currently working on this, I'd be happy to draft a RFC to detail my approach.

shwetathareja · 2024-12-03T03:10:11Z

@bugmakerrrrrr feel free to update this RFC or create a new one with your thoughts and link here.

mgodwan · 2024-12-03T05:03:30Z

@bugmakerrrrrr @shwetathareja

@rayshrey and I have been working on a POC to achieve the same as well. Currently, the POC is working for basic use cases. We are currently working to solve for updates/get, and search path. We're planning to have an RFC in next couple of weeks.

bugmakerrrrrr · 2024-12-08T12:54:07Z

Background

In OpenSearch, field data is not only stored in various indexes, including inverted indexes, column stores, and BKD indexes, but also in _source fields. The _source fields can be understood as row stores, which save the original JSON data of each document. When a document matching the user's query statement is found through the index data, the original text can be directly read from the _source field and returned to the user (that is, the _source part of the query response). It can be seen from this that although the _source field can improve the retrieval speed, it also increases the storage cost. In some application scenarios, users do not require high query response speed, but are sensitive to cost, or rarely need to read _source data, such as time series data Scenario, you can try to remove the storage of _source fields, and then concatenate native document data through column storage data during the query process.

In the current OS, some similar functionality has been provided. When creating index mapping, users can specify includes/excludes in the _source , or even completely disable _source fields , thereby reducing storage costs. Target field values can also be retrieved from column storage through docvalue_fields and stored_fields parameters during retrieval. However, this method has the following problems:

_source fields are not only used for retrieval, but also provide data for use cases such as update, reindex, script, and operation-based recovery. If users crop _source fields, the above functions will be unavailable, especially in the case where OpenSearch relies entirely on soft delete for operation-based recovery.
Users need to manually specify the fields to be fetched when requesting, which is not user-friendly and cannot be fully compatible with existing systems.
From the previous POC, using includes/excludes for source filtering is detrimental to write performance, while completely disabling _source fields can improve write performance.

Goal

When writing documents to OpenSearch, the source field is not stored at all, and the source field can be reconstructed from the column store and returned to the user when reading, and is compatible with existing reindex, script, security and other functions.

Proposal

Mapper

The schema of index in OpenSearch is specified by mapping, corresponding to the definition of each mapper in the code. mapper naturally has a nested structure, non-leaf field corresponds to ObjectMapper , leaf field corresponds to FieldMapper , and the root node corresponds to RootObjectMapper . To support derived source, the following method is introduced in the top-level abstract class of Mapper :

/**
 * Validate whether this mapper supports derived source.
 */
public void validateDerivedSource() {
    throw new IllegalArgumentException();
}

/**
 * Fill derived source data.
 */
public void fillSource(LeafReader reader, int docID, XContentBuilder source) throws IOException {
    throw new UnsupportedOperationException("[" + typeName() + "] does not support derived source");
}

During the mapping update phase (including creating indexes, dynamically updating mappings when indexing, etc.), the mapping is checked to verify whether each field configuration in the document supports derived source. If any field does not support it, an exception is thrown and creation or writing is not allowed. When implemented, each type of mapper will override the validateDerivedSource method of the parent class to check whether the corresponding configuration supports rebuilding the source field. When parsing the mapping, the validateDerivedSource method of the root mapper corresponding to the index is called, which recursively calls the corresponding method of each subfield according to the defined document field hierarchy to complete the verification.

Specifically, for non-leaf fields, including root, the following attributes need to be checked:

Whether it is nested , if so, derived source is not supported;
Whether it is indexed, that is, whether enable is true , if not, it is not supported;
Whether each subfield supports derived source.

For leaf fields, most fields only need to be checked:

Whether there is a copy_to field, if there is, it does not support derived source;
Whether there is a doc_values , if not, it is not supported, except for string-related field types
For string-related types:
- Except for Text , if there is no doc_values , check whether the store is true;
- For Text type, in addition to checking the store option, you also need to check whether there is a subfield of the keyword type in the multi_fields , and if there is, recover from the subfield. The main reason here is for the support of dynamic fields.

Write phase

In the current writing phase, before parsing the document, the metadata fields are preprocessed, and it is also at this stage that the _source fields are added to the document. If the derived source is enabled, the processing in the writing phase is relatively simple, ignoring the configuration related to _source mapping, and directly skipping the processing of SourceFieldMapper , including _source fields and _recovery_source fields will not be generated.

For other APIs that rely on dynamically generated _source fields, such as get by id, get term vectors, etc., it may be necessary to generate documents containing _source through translog or documents passed in the request. When derived source is enabled, the parsing logic of SourceFieldMapper will be called to explicitly add _source to the parsed document.

Read phase

As mentioned above, _source fields not only play an important role in querying or reading document requests, but also in scenarios such as script and recovery. In the current implementation, the logic of reading _source is scattered throughout the code. If the _source fields are rebuilt when reading, it will not only make the code difficult to maintain and may miss the Critical Path, but also require many interfaces to be modified to pass the required parameters to the bottom layer, which may also destroy the current abstract encapsulation. Finally, it is impossible to be compatible with the existing security functions of the security plugin, such as field-level security.

In Lucene, the field type corresponding to the _source field is stored, when reading the _source field, the actual method used is the document method in the StoredFields returned by the IndexReader, which receives parameters including the id of the target document and the StoredFieldVisitor. StoredFieldVisitor is used to determine whether the corresponding field is needed, and to receive and process the incoming field value, the pseudo-code can be expressed as follows:

for each sf in stored fiels of doc[id]:
    if visitor needs sf:
        visitor.accept(sf)

Therefore, we can consider wrapping the StoredFields obtained from Lucene when the derived source is enabled, and first determine whether the visitor needs to _source the field in the document method. If necessary, rebuild the _source field and pass it to the visitor for processing. In existing OpenSearch implementations, most of callers use the indexReaderWrapper in IndexShard to wrap the index reader when fetching a searcher. We can take advantage of this extension point by wrapping the wrapper obtained from the plugin with the derived source-related reader wrapper and pass in the reconstructed _source field. It is worth noting that the order here is important because the indexReaderWrapper is mainly used for the field filtering function of security plugin. If the derived source wrapper is inside the security-related wrapper, it will escape security restrictions. In addition, derived source reader wrapper inherits from SequentialStoredFieldsLeafReader to ensure support for sequence read optimization.

// IndexShard
if (indexSettings.isDerivedSourceEnabled()) {
            readerWrapper = reader -> {
                final DirectoryReader wrappedReader = indexReaderWrapper == null ? reader : indexReaderWrapper.apply(reader);
                return DerivedSourceDirectoryReader.wrap(wrappedReader, mapperService.documentMapper().root()::reconstructSource);
            };
} else {
       readerWrapper = indexReaderWrapper;
}


// DerivedSourceStoredFields
public class DerivedSourceStoredFields extends StoredFields {
    @Override
    public void document(int docID, StoredFieldVisitor visitor) throws IOException {
        assert visitor instanceof FieldsVisitor;
        // process _source field first so that we can return fast
        if (visitor.needsField(FAKE_SOURCE_FIELD) == StoredFieldVisitor.Status.YES) {
            visitor.binaryField(FAKE_SOURCE_FIELD, sourceProvider.apply(docID));
        }
        delegate.document(docID, visitor);
    }
}

When reconstructing _source fields, the fillSource method mentioned in the Mapper section is used, which is responsible for reading the current field value and putting it into the corresponding level of source. The entrance of the reconstruction is in the reconstructSource method in the index RootObjectMapper , which calls the fillSource method layer by layer for subfields, and each layer only needs to be responsible for rebuilding the current level.

rayshrey · 2025-01-01T09:35:13Z

Background

docValues are a columnar storage format used by Lucene to store indexed data in a way that facilitates efficient aggregation, sorting etc and stored fields, on the other hand, are used to store the actual values of fields as they were inserted into the index.

Currently the _source field stores the original documents as stored field. We can possibly skip storing the _source field in cases where the values are already store in docValues and retrieve the field values from docValue instead. This will help in reducing the storage cost significantly. Based on the nature of the query we can skip or fetch some or all of the fields from docValues to serve the search queries.

Approach

We can create a third option for the _source field - derived
When we have the source field set as derived, the indexing behavior remains the same as disabled . We do not store the source field. Instead we dynamically generate the source field in the search path or get path for the document using the docValues.

To dynamically generate the source we:

Create an empty Map<String, Object> to store the dynamically generated source
Iterate over all the mappers from mapperService.documentMapper().mappers()
For mappers that are instance of FieldMappers, we get the docValues for that field - reader.getSortedSetDocValues(fieldName)
Using the docId we advance to the exact point in the SortedSetDocValues fetched above - advanceExact(docId and based on the docValueType(FLOAT, DOUBLE, INTEGER etc) we parse accordingly and add entry in the Map created in step 1
Finally when we have traversed over all the FieldMappers we return our generated map.

POC

https://github.com/rayshrey/OpenSearch/tree/derived_source

Benchmarking

Started the benchmarking with http_logs workloads (single OSB run with number of shards set to 5)

Initial findings:

Mean throughput increased by ~12%
Store size decreased by ~40%

To solidify these findings, did another benchmarking with nyc_taxis workload
(Numbers below are an average of 3 perf runs with number of shards set to 1).

Metric	Task	Original Average	POC Average	Difference in %
Cumulative indexing time of primary shards	-	142.12104	124.06797	-12.70261
Store size	-	34.66648	25.09844	-27.60025

Min Throughput	index	94082.91	117609.81	25.00656
Mean Throughput	index	100733.92	123147.03	22.24981
Median Throughput	index	98879.54	122479.65	23.86754
Max Throughput	index	112705.9	129715.02	15.0916
50th percentile latency	index	642.37949	512.46629	-20.22375
90th percentile latency	index	1275.5934	1035.39243	-18.83053
99th percentile latency	index	3104.09835	3349.86936	7.91763
99.9th percentile latency	index	5898.81467	6719.84404	13.91855
99.99th percentile latency	index	7731.30663	8689.03831	12.38771
100th percentile latency	index	8273.30637	9017.17344	8.99117

Min Throughput	default	3.02	3.02	0
Mean Throughput	default	3.03	3.03	0
Median Throughput	default	3.03	3.03	0
Max Throughput	default	3.06	3.06	0
50th percentile latency	default	4.0399	3.36422	-16.72527
90th percentile latency	default	4.45771	3.73742	-16.1583
99th percentile latency	default	4.59505	3.91264	-14.85111
100th percentile latency	default	4.63143	3.92635	-15.2237

Min Throughput	top_hits_agg	0.25	0.08	-68
Mean Throughput	top_hits_agg	0.26	0.08	-69.23077
Median Throughput	top_hits_agg	0.27	0.08	-70.37037
Max Throughput	top_hits_agg	0.27	0.08	-70.37037
50th percentile latency	top_hits_agg	171623.49953	`1119762`.27731	552.45277
90th percentile latency	top_hits_agg	235515.13352	1559466.70607	562.15138
99th percentile latency	top_hits_agg	249957.35285	1658264.82089	563.4191
100th percentile latency	top_hits_agg	251578.36497	1669132.83809	563.46438
50th percentile service time	top_hits_agg	3563.98491	13060.67433	266.46267
90th percentile service time	top_hits_agg	3587.21859	13191.20307	267.72789
99th percentile service time	top_hits_agg	3601.25789	13268.92686	268.45256
100th percentile service time	top_hits_agg	3611.79574	13280.14878	267.68826

For the top_hits_agg used the following operation

{
    "name": "top_hits_agg",
    "operation-type": "search",
    "index": "nyc_taxis",
    "body": {
      "size": 0,
      "query": {
        "range": {
          "dropoff_datetime": {
            "from": "2015-01-01 00:00:00",
            "to": "2015-01-01 12:00:00"
          }
        }
      },
      "aggs": {
        "2": {
          "terms": {
            "field": "payment_type",
            "size": 1000
          },
          "aggregations": {
            "3": {
              "date_histogram": {
                "field": "dropoff_datetime",
                "fixed_interval": "5s"
              },
              "aggregations": {
                "1": {
                  "top_hits": {
                    "size": 1,
                    "sort": [
                      {
                        "dropoff_datetime": {
                          "order": "desc"
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
}

Observations:

Indexing throughout increased around ~22%
While the 50th and 90th percentile latency decreased(around ~18-20%), the 99th and 100th percentile increased around ~8%
For the top_hits_agg operation, there was a huge increase in the latency (around ~500%)

navneet1v · 2025-01-20T19:16:16Z

@mgodwan, @nkumar04 , @rayshrey having a derived source is a very good solution. in k-NN plugin for vectors also we are working on building the derived source. Here is the RFC for that: opensearch-project/k-NN#2377. We identified that we don't store vectors in _source and use the KNNVectorValues to get these vectors for _source we can save more than 60% of the storage for the vector indices. Not sure what approach you are going to take here. We have evaluated couple of approaches @jmazanec15 can add more here. But I feel any solution we are building should work for different fields.

Also +1 on the indexing improvements, if we remove the fields from source and recovery source the indexing improvements will be there. In my observations the indexing improvements were greater than 22% for vector workloads if we remove source and recovery source. + different other benefits. Ref: #13490

navneet1v · 2025-01-20T19:19:47Z

+1 to @mgodwan : As we start looking into solution, we need to carefully analyze the restriction it would bring in terms of use cases. Also if those restriction would work for time series workload specifically. Instead of skipping the _source completely, should we consider filtering the _source for specific field for which doc_Values is enabled.

@shwetathareja it will not always be the doc values. One example is Vectors where vectors before 2.17 were stored as docvalues but now have been moved to KNNVectorFormat(provided by Lucene). :)

jmazanec15 · 2025-01-20T19:44:50Z

Yes - looks like we might be working on something similar - opensearch-project/k-NN#2377 - which probably means its a good idea and efforts should be merged! I was working on just for vector field - but I think generalization would be good and want to ensure consistency with the feature.

My approach is similar to @bugmakerrrrrr - adding extension at Lucene level. I was working with custom codec in plugin, so looks like a bit different. PoC here: https://github.com/jmazanec15/k-NN-1/tree/derived-source-vectors. I did make it work with nested but it requires implementing a lot of extra logic in the lucene layer: https://github.com/jmazanec15/k-NN-1/blob/derived-source-vectors/src/main/java/org/opensearch/knn/index/codec/KNN990Codec/ParentChildHelper.java.

Specifically, for non-leaf fields, including root, the following attributes need to be checked:
1. Whether it is nested , if so, derived source is not supported;
2. Whether it is indexed, that is, whether enable is true , if not, it is not supported;
3. Whether each subfield supports derived source.
For leaf fields, most fields only need to be checked:
1. Whether there is a copy_to field, if there is, it does not support derived source;

@bugmakerrrrrr For nested, index and copy_to, why is it not supported?

@mgodwan @rayshrey did you consider implementing override at codec/lucene level in the stored fields format as opposed to OpenSearch level constructs?

bugmakerrrrrr · 2025-01-21T04:52:52Z

Yes - looks like we might be working on something similar - opensearch-project/k-NN#2377 - which probably means its a good idea and efforts should be merged! I was working on just for vector field - but I think generalization would be good and want to ensure consistency with the feature.

+1, in my proposal, the plugin only needs to override the new methods introduced in the mapper to support derived source functionality for custom field types.

@bugmakerrrrrr For nested, index and copy_to, why is it not supported?

For nested, it can be supported, I just don't have it implemented at the moment.
For index, if the enable option of object field is `false, its sub fields won't be processed, we can't reconstruct the source.
For copy_to, we need to refer to all of its sources to recover the original text, and sometimes we can't even tell if it's its own value or a copy, so I chose not to support.

bugmakerrrrrr · 2025-01-21T04:56:09Z

+1 to @mgodwan : As we start looking into solution, we need to carefully analyze the restriction it would bring in terms of use cases. Also if those restriction would work for time series workload specifically. Instead of skipping the _source completely, should we consider filtering the _source for specific field for which doc_Values is enabled.

@shwetathareja it will not always be the doc values. One example is Vectors where vectors before 2.17 were stored as docvalues but now have been moved to KNNVectorFormat(provided by Lucene). :)

+1, for string type, it can be the stored field, each type of field may require its own unique processing logic.

mgodwan · 2025-01-21T12:10:07Z

+1 to @bugmakerrrrrr points around copy_to. The field will always be ambiguous to deal with.
IMO, it is a good stating point to keep derived source disabled based on the params (e.g. copy_to, ), and then if we figure out more way to solve for such cases, we can keep adding the support. We will have to deal with multiple aspects around performance, etc. which we iteratively can keep solving for by reducing the validations.

@shwetathareja it will not always be the doc values. One example is Vectors where vectors before 2.17 were stored as docvalues but now have been moved to KNNVectorFormat(provided by Lucene). :)

+1, the idea is to be able to generate the values using the available formats (E.g. Doc Values, Stored Field, Vector). Each field type can dictate how to fetch values for the specific field so that the source to be returned to the users can be generated. This can be done with an interface with the existing Mapper which can tell whether the field type can supported based on the associated params, and then provide a way to append those values to the source.

mgodwan · 2025-01-21T13:01:18Z

@mgodwan @rayshrey did you consider implementing override at codec/lucene level in the stored fields format as opposed to OpenSearch level constructs?

Having to modify the codec can be tricky as that will require a lot of interoperability amongst codecs which the fields declared in plugins may or may not be a candidate for (e.g. knn requires a different codec, and can implement it. scaled_float does not require a separate codec and can just function with a couple of interface methods within OpenSearch Mapper constructs using the docValuesFormat().

I like the idea of having a wrapper on StoredFieldVisitor as @bugmakerrrrrr also suggested which will be pretty useful to hide underlying details from OpenSearch integration point.

bugmakerrrrrr · 2025-01-21T13:58:34Z

I like the idea of having a wrapper on StoredFieldVisitor as @bugmakerrrrrr also suggested which will be pretty useful to hide underlying details from OpenSearch integration point.

If we decide to implement it this way, I can create a PR, which I've already implemented a version of internally.

rayshrey · 2025-01-21T19:19:37Z

@bugmakerrrrrr I just published a PR with the first basic changes required for implementing the derived source field can you review and see if it aligns with your implementation - #17040

jmazanec15 · 2025-01-22T01:21:39Z

Thanks @mgodwan that makes sense.

@bugmakerrrrrr I like wrapping the StoredFieldVisitor as well and then overriding in the plugin is. It is very elegant and seems like the right long term approach.

For 2.19, in k-NN, Im going to try to implement the approach in opensearch-project/k-NN#2377 to get initial feedback and then migrate to the generalized solution in the future.

mgodwan · 2025-01-23T13:21:50Z

If we decide to implement it this way, I can create a PR, which I've already implemented a version of internally.

Nice!
It would be great to see how we can collaborate on this.
We would like to take the effort forward and it would be great if you could contribute/review the PRs

There is already one PR by @rayshrey to have a mapper interface and implementation in keyword field which aligns with your thoughts where we can discuss if the interfaces are in line.

I'm also working on a PR to allow users to configure the usage of derived source for their indices which I plan to raise once the Mapper interface is finalized so we can verify whether derived source can be supported for user mappings or not.

shwetathareja · 2025-01-27T08:28:00Z

Thanks for the discussion folks, wrapper on StoredFieldVisitor is definitely a neat approach, +1.

nkumar04 added enhancement Enhancement or improvement to existing feature or request untriaged labels Aug 28, 2023

shwetathareja added discuss Issues intended to help drive brainstorming and decision making Indexing Indexing, Bulk Indexing and anything related to indexing Performance This is for any performance related enhancements or bugs and removed untriaged labels Aug 29, 2023

macohen added this to OpenSearch Lucene & Core Performance Tracking Sep 8, 2023

github-project-automation bot moved this to Lucene (In Progress) in OpenSearch Lucene & Core Performance Tracking Sep 8, 2023

mgodwan mentioned this issue Jan 17, 2025

[META] Enabling Derived Source in OpenSearch #17048

Open

rayshrey mentioned this issue Jan 21, 2025

[Derived Source] Add support for deriving source field in FieldMapper #17073

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimizing Data Storage and Retrieval for Time Series data. #9568

Optimizing Data Storage and Retrieval for Time Series data. #9568

nkumar04 commented Aug 28, 2023 •

edited

Loading

nkumar04 commented Aug 28, 2023

mgodwan commented Aug 29, 2023

shwetathareja commented Aug 29, 2023 •

edited

Loading

msfroh commented Sep 18, 2023

msfroh commented Sep 18, 2023 •

edited

Loading

shwetathareja commented Sep 18, 2023

nkumar04 commented Sep 18, 2023 •

edited

Loading

msfroh commented Sep 18, 2023

bugmakerrrrrr commented Dec 2, 2024

shwetathareja commented Dec 3, 2024

mgodwan commented Dec 3, 2024

bugmakerrrrrr commented Dec 8, 2024

rayshrey commented Jan 1, 2025

navneet1v commented Jan 20, 2025

navneet1v commented Jan 20, 2025

jmazanec15 commented Jan 20, 2025

bugmakerrrrrr commented Jan 21, 2025

bugmakerrrrrr commented Jan 21, 2025

mgodwan commented Jan 21, 2025

mgodwan commented Jan 21, 2025

bugmakerrrrrr commented Jan 21, 2025

rayshrey commented Jan 21, 2025

jmazanec15 commented Jan 22, 2025

mgodwan commented Jan 23, 2025

shwetathareja commented Jan 27, 2025

Optimizing Data Storage and Retrieval for Time Series data. #9568

Optimizing Data Storage and Retrieval for Time Series data. #9568

Comments

nkumar04 commented Aug 28, 2023 • edited Loading

nkumar04 commented Aug 28, 2023

mgodwan commented Aug 29, 2023

shwetathareja commented Aug 29, 2023 • edited Loading

msfroh commented Sep 18, 2023

msfroh commented Sep 18, 2023 • edited Loading

shwetathareja commented Sep 18, 2023

nkumar04 commented Sep 18, 2023 • edited Loading

msfroh commented Sep 18, 2023

bugmakerrrrrr commented Dec 2, 2024

shwetathareja commented Dec 3, 2024

mgodwan commented Dec 3, 2024

bugmakerrrrrr commented Dec 8, 2024

Background

Goal

Proposal

Mapper

Write phase

Read phase

rayshrey commented Jan 1, 2025

Background

Approach

POC

Benchmarking

Observations:

navneet1v commented Jan 20, 2025

navneet1v commented Jan 20, 2025

jmazanec15 commented Jan 20, 2025

bugmakerrrrrr commented Jan 21, 2025

bugmakerrrrrr commented Jan 21, 2025

mgodwan commented Jan 21, 2025

mgodwan commented Jan 21, 2025

bugmakerrrrrr commented Jan 21, 2025

rayshrey commented Jan 21, 2025

jmazanec15 commented Jan 22, 2025

mgodwan commented Jan 23, 2025

shwetathareja commented Jan 27, 2025

nkumar04 commented Aug 28, 2023 •

edited

Loading

shwetathareja commented Aug 29, 2023 •

edited

Loading

msfroh commented Sep 18, 2023 •

edited

Loading

nkumar04 commented Sep 18, 2023 •

edited

Loading