What type of enhancement is this?
Tech debt reduction
What does the enhancement do?
Currently, the merge reader buffers all batches for the same primary key (time series) in memory and sorts them. When a time series has too much data, the reader may allocate too much memory and be very slow to return the first batch.
We can return a batch early if we have buffered enough rows for that key.
Implementation challenges
This requires implementing the reader with a new merge algorithm, similar to the one we use in the old storage engine.
The reader maintains two heaps to sort input batches: hot and cold. The min/max keys of the root node in the hot heap form the current merge window. Nodes in the hot heap have key ranges overlapping with the merge window. The cold heap contains nodes not overlapping with the merge window.
If the hot heap contains only one node, we can output the batch directly and then rebuild the heap.
If the hot heap contains more than one node, we output non-overlapping timestamps.
Then remove the duplicate timestamp and rebuild the heap.
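To make the hot/cold heap idea concrete, here is a minimal Python sketch of the steps described above. It is not the actual reader implementation and makes several assumptions: all batches belong to one primary key, each batch is a list of `(timestamp, value)` tuples with strictly increasing timestamps, and on a duplicate timestamp the batch that appears earlier in the input list wins (a real engine would more likely compare sequence numbers). The name `merge_batches` and the data layout are hypothetical.

```python
import heapq


def merge_batches(batches):
    """Yield merged, deduplicated output batches as early as possible.

    Heap entries are (first_ts, priority, rows). The priority is the batch's
    index in the input list; it breaks ties, so the rows are never compared.
    """
    cold = [(rows[0][0], prio, rows) for prio, rows in enumerate(batches) if rows]
    heapq.heapify(cold)
    hot = []

    while hot or cold:
        if not hot:
            heapq.heappush(hot, heapq.heappop(cold))

        # The merge window ends at the last timestamp of the hot root.
        window_end = hot[0][2][-1][0]
        # Move every cold node that overlaps the window into the hot heap.
        while cold and cold[0][0] <= window_end:
            heapq.heappush(hot, heapq.heappop(cold))

        first_ts, prio, rows = heapq.heappop(hot)

        # No other hot node overlaps this batch: output it directly.
        # (Simplification: a fuller version would move such nodes back to cold.)
        if not hot or hot[0][0] > rows[-1][0]:
            yield rows
            continue

        next_ts = hot[0][0]
        if first_ts < next_ts:
            # Output only the prefix that cannot overlap any other node.
            cut = 0
            while rows[cut][0] < next_ts:
                cut += 1
            yield rows[:cut]
            rest = rows[cut:]
        else:
            # Duplicate timestamp: keep the row from the highest-priority
            # node (the root) and drop it from every other hot node.
            yield rows[:1]
            rest = rows[1:]
            while hot and hot[0][0] == first_ts:
                _, dup_prio, dup_rows = heapq.heappop(hot)
                dup_rows = dup_rows[1:]
                if dup_rows:
                    heapq.heappush(hot, (dup_rows[0][0], dup_prio, dup_rows))

        # Rebuild the heap with whatever is left of the old root.
        if rest:
            heapq.heappush(hot, (rest[0][0], prio, rest))
```

Because the function is a generator, the caller receives the first batch as soon as the non-overlapping prefix is known, instead of waiting for the whole time series to be buffered and sorted. The third batch in a call such as `merge_batches([[(1, "a1"), (5, "a5"), (9, "a9")], [(3, "b3"), (5, "b5")], [(20, "c20"), (21, "c21")]])` never overlaps the others, so it stays in the cold heap until the end and is emitted in one piece.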
Reference