[Joins] Source Reader implementation for Joins #15874

Open
Tracked by #15451
harshavamsi opened this issue Sep 9, 2024 · 0 comments
Labels: enhancement, Search:Performance

Comments

@harshavamsi
Contributor

Is your feature request related to a problem? Please describe

As part of milestone 1 for #15185, we plan on introducing a source reader abstraction for join operations.

Describe the solution you'd like

Copied from #15185:

Purpose
The SourceReader reads rows from the data source. Depending on the query passed to it, it can use an index or simply scan all rows. It does not optimize the query; it executes the query it was given at initialization as-is. It must support pagination and produce rows efficiently in batches.

For the Lucene-based implementation, the SourceReader will have access to the corresponding shard, which is a Lucene index, and will execute the given Lucene query. It will use a custom Collector to collect matching documents and generate rows containing the docID (optional) and the requested fields.
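The row-building step described above can be sketched roughly as follows. This is a minimal illustration using only the Java standard library, not Lucene: `Row`, `RowCollector`, and the in-memory `storedDocs` list are hypothetical stand-ins for whatever the actual implementation uses, and a real version would presumably plug into Lucene's Collector API and read fields from the shard instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row type: an optional docID plus the requested field values.
final class Row {
    final Integer docId;               // null when the caller did not ask for docIDs
    final Map<String, Object> fields;  // requested fields, in request order

    Row(Integer docId, Map<String, Object> fields) {
        this.docId = docId;
        this.fields = fields;
    }
}

// Hypothetical stand-in for a custom collector: collect(docId) is called
// once per matching document and turns it into a Row.
final class RowCollector {
    private final List<Map<String, Object>> storedDocs; // assumption: fields per doc, indexed by docID
    private final List<String> fieldsToFetch;
    private final boolean includeDocId;
    private final List<Row> rows = new ArrayList<>();

    RowCollector(List<Map<String, Object>> storedDocs, List<String> fieldsToFetch, boolean includeDocId) {
        this.storedDocs = storedDocs;
        this.fieldsToFetch = fieldsToFetch;
        this.includeDocId = includeDocId;
    }

    void collect(int docId) {
        Map<String, Object> doc = storedDocs.get(docId);
        Map<String, Object> picked = new LinkedHashMap<>();
        for (String field : fieldsToFetch) {
            picked.put(field, doc.get(field)); // a field missing from the doc is stored as null
        }
        rows.add(new Row(includeDocId ? docId : null, picked));
    }

    List<Row> rows() {
        return rows;
    }
}

class RowCollectorDemo {
    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
            Map.<String, Object>of("name", "a", "price", 10, "color", "red"),
            Map.<String, Object>of("name", "b", "price", 20, "color", "blue"));
        RowCollector collector = new RowCollector(docs, List.of("name", "price"), true);
        collector.collect(1); // pretend the query matched docID 1
        for (Row row : collector.rows()) {
            System.out.println(row.docId + " " + row.fields);
        }
    }
}
```

Only the requested fields are copied into each row, and the docID is left out when the caller does not need it, which mirrors the "docID (optional) and requested fields" shape described in the proposal.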

Properties
Type: Lucene
Source identifier: Shard ID
Input
Query: a Lucene query (for the Lucene-based implementation)
Pagination info: page size
Fields: fields to fetch
Output
An iterator over the matching rows. A row is a tuple of <docID, f1, f2, f3>. The output is a non-serialized iterator; in the Java implementation it will be an object of a new Iterator class with a method such as nextPage() that fetches all rows in the next page.
Note: The stream is responsible for consuming this iterator and serializing the rows if they need to be sent over the network.
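One possible shape for the input and output contract above is sketched below. It is an assumption-laden illustration rather than the planned API: `PagedRowIterator` is a hypothetical name, a `Predicate` stands in for the Lucene query, an in-memory list stands in for the shard, and each row is a plain `List<Object>` whose first element is the docID.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical paged iterator: yields matching rows one at a time,
// or a whole page at once via nextPage().
final class PagedRowIterator implements Iterator<List<Object>> {
    private final List<List<Object>> source;       // assumption: in-memory stand-in for the shard
    private final Predicate<List<Object>> query;   // assumption: stand-in for the Lucene query
    private final int pageSize;
    private int cursor = 0;                        // index of the next unexamined source row

    PagedRowIterator(List<List<Object>> source, Predicate<List<Object>> query, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.source = source;
        this.query = query;
        this.pageSize = pageSize;
    }

    // Move the cursor forward to the next row that matches the query, if any.
    private void skipToMatch() {
        while (cursor < source.size() && !query.test(source.get(cursor))) {
            cursor++;
        }
    }

    @Override
    public boolean hasNext() {
        skipToMatch();
        return cursor < source.size();
    }

    @Override
    public List<Object> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return source.get(cursor++);
    }

    // Returns up to pageSize matching rows; an empty list means no rows are left.
    List<List<Object>> nextPage() {
        List<List<Object>> page = new ArrayList<>(pageSize);
        while (page.size() < pageSize && hasNext()) {
            page.add(next());
        }
        return page;
    }
}

class PagedRowIteratorDemo {
    public static void main(String[] args) {
        List<List<Object>> rows = new ArrayList<>();
        rows.add(List.<Object>of(0, "a", 10)); // <docID, f1, f2>
        rows.add(List.<Object>of(1, "b", 20));
        rows.add(List.<Object>of(2, "c", 30));
        rows.add(List.<Object>of(3, "d", 5));
        rows.add(List.<Object>of(4, "e", 40));

        // "Query": keep rows whose third column is at least 15; page size is 2.
        PagedRowIterator it = new PagedRowIterator(rows, r -> ((Integer) r.get(2)) >= 15, 2);
        List<List<Object>> page;
        while (!(page = it.nextPage()).isEmpty()) {
            System.out.println(page);
        }
    }
}
```

Rows are filtered lazily as pages are requested, so a consumer such as the stream can pull one page at a time and serialize it only when it actually has to cross the network.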

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

@harshavamsi added the enhancement and untriaged labels on Sep 9, 2024
@harshavamsi changed the title from "[Feature Request] Source Reader implementation for Joins" to "[Joins] Source Reader implementation for Joins" on Sep 9, 2024
@harshavamsi self-assigned this on Sep 9, 2024
@msfroh removed the untriaged label on Sep 11, 2024