[Joins] Source Reader implementation for Joins #15874

harshavamsi · 2024-09-09T22:00:58Z

Is your feature request related to a problem? Please describe

As part of milestone 1 for #15185, we plan on introducing a source reader abstraction for join operations.

Describe the solution you'd like

Copied from #15185:

Purpose
Read rows from the data source. It can use an index or simply scan all rows, depending on the query passed to it. It does not optimize the query; it executes the query passed to it at initialization as-is. It must support pagination and produce rows efficiently in batches.

For the Lucene-based implementation, SourceReader will have access to the corresponding shard, which is a Lucene index, and will execute the given Lucene query. It will use a customized Collector to collect documents and generate rows containing the docID (optionally) and the requested fields.

Properties
Type: Lucene
Source identifier: Shard ID
Input
Query: Lucene query for the Lucene-based implementation
Pagination info: page size
Fields: fields to fetch
Output
Iterator of matching rows. A row is a tuple of <docID, f1, f2, f3>. The output is a non-serialized iterator; for the Java implementation it will be a new Iterator class with a method such as nextPage(), which fetches all rows in the next page.
Note: The stream is responsible for consuming this iterator and serializing it to send over the network if needed.
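
To make the input/output contract above concrete, here is a minimal sketch, not the proposed implementation. All names (`Row`, `InMemorySourceReader`) are hypothetical. It assumes an in-memory list of maps in place of a shard and a `Predicate` in place of a Lucene query, so it only illustrates the paging behavior and the <docID, f1, f2, ...> row shape.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical row type: the docID plus the requested field values, in request order. */
final class Row {
    final int docId;
    final List<Object> values;

    Row(int docId, List<Object> values) {
        this.docId = docId;
        this.values = values;
    }
}

/**
 * Hypothetical in-memory stand-in for a source reader.
 * Assumptions: the list of maps stands in for one shard's documents, the
 * list position stands in for the docID, and the Predicate stands in for
 * the query, which is executed as given with no optimization.
 */
final class InMemorySourceReader implements Iterator<Row> {
    private final List<Map<String, Object>> docs;
    private final Predicate<Map<String, Object>> query;
    private final List<String> fields;
    private final int pageSize;
    private int nextDocId = 0; // scan position within the source
    private Iterator<Row> currentPage = new ArrayList<Row>().iterator();

    InMemorySourceReader(List<Map<String, Object>> docs,
                         Predicate<Map<String, Object>> query,
                         List<String> fields,
                         int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.docs = docs;
        this.query = query;
        this.fields = fields;
        this.pageSize = pageSize;
    }

    /** Returns up to pageSize matching rows; an empty list means the source is exhausted. */
    List<Row> nextPage() {
        List<Row> page = new ArrayList<>(pageSize);
        while (nextDocId < docs.size() && page.size() < pageSize) {
            Map<String, Object> doc = docs.get(nextDocId);
            if (query.test(doc)) {
                List<Object> values = new ArrayList<>(fields.size());
                for (String field : fields) {
                    values.add(doc.get(field)); // null if the document lacks the field
                }
                page.add(new Row(nextDocId, values));
            }
            nextDocId++;
        }
        return page;
    }

    @Override
    public boolean hasNext() {
        // Pull the next page lazily once the current one is drained.
        if (!currentPage.hasNext()) {
            currentPage = nextPage().iterator();
        }
        return currentPage.hasNext();
    }

    @Override
    public Row next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return currentPage.next();
    }
}
```

A consumer such as the stream can either call `nextPage()` to receive one batch at a time or iterate row by row through `hasNext()`/`next()`; serialization stays outside the reader in both cases. A Lucene-backed version would presumably replace the scan loop with a Collector-driven search over the shard, which this sketch does not attempt to model.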

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

harshavamsi added the enhancement and untriaged labels Sep 9, 2024

github-actions bot added the Search:Performance label Sep 9, 2024

github-project-automation bot added this to Search Project Board Sep 9, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Sep 9, 2024

harshavamsi changed the title ~~[Feature Request] Source Reader implementation for Joins~~ [Joins] Source Reader implementation for Joins Sep 9, 2024

harshavamsi added this to Performance Roadmap Sep 9, 2024

harshavamsi self-assigned this Sep 9, 2024

github-project-automation bot moved this to Todo in Performance Roadmap Sep 9, 2024

msfroh removed the untriaged label Sep 11, 2024

harshavamsi mentioned this issue Sep 16, 2024

[META] Native Join support in OpenSearch #15451 (Open, 2 tasks)
