Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid buffering all rows for the same primary key in merge reader #2660

Closed
evenyag opened this issue Oct 26, 2023 · 0 comments
Closed

Avoid buffering all rows for the same primary key in merge reader #2660

evenyag opened this issue Oct 26, 2023 · 0 comments
Assignees
Labels
C-enhancement Category Enhancements

Comments

@evenyag
Copy link
Contributor

evenyag commented Oct 26, 2023

What type of enhancement is this?

Tech debt reduction

What does the enhancement do?

Now the merge reader buffers all batches for the same primary key (timeseries) in memory and sorts it. When the timeseries has too much data, the reader may allocate too much memory and be very slow to return the first batch.

We can return a batch early if we have buffered enough rows for that key.

Implementation challenges

This needs to use a new merge algorithm to implement the reader, which is similar to what we use in the old storage engine.

The reader maintains two heaps to sort input batches: hot and cold. The min/max keys of the root node in the hot heap form the current merge window. Nodes in the hot heap have key ranges overlapping with the merge window. The cold heap contains nodes not overlapping with the merge window.

image

If the hot heap contains only one node, we can output the batch directly and then rebuild the heap.

image

If the hot heap contains more than one node, we output non-overlapping timestamps.
image

Then remove the duplicate timestamp and rebuild the heap.
image

Reference

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category Enhancements
Projects
None yet
Development

No branches or pull requests

2 participants