Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(over window): window function RANGE frame support - backend part #14416

Merged
merged 9 commits into from
Feb 4, 2024

Conversation

stdrc
Copy link
Member

@stdrc stdrc commented Jan 8, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR implements the real RANGE frame support in backend executors. Resolves #11109.

The basic idea to support RANGE frame in find_affected_ranges is to:

  1. Calculate the logical order column values of the first and last current row/key for a given set of delta;
  2. Try to find the cache keys that are closest to the logical order values calculated in step 1;
  3. Compare with the first/last current key of ROWS frames, choose the minimum/maximum one as the final first/last current key;
  4. Calculate the logical order column values of the first frame start and last frame end, similar to step 1;
  5. Try to find the closest cache keys, similar to step 2;
  6. Compare with the first/last frame start/end of ROWS frames, choose the minimum/maximum one, similar to step 3.

The basic idea to support RANGE frame in WindowBuffer is to:

  1. Calculate the logical order column values of the frame start and frame end for a given current key;
  2. Try to find the closest ones in the buffer.

118x out of 17xx lines of addition are tests, reviewers plz don't panic.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

  • Support RANGE frames in window function calls.

@stdrc stdrc force-pushed the rc/range-frame-backend branch from 18dd81c to ec57ab0 Compare January 9, 2024 08:13
@stdrc stdrc force-pushed the rc/refactor-over-window-rows-frame branch from 513a2a6 to cb73122 Compare January 9, 2024 09:37
@stdrc stdrc force-pushed the rc/range-frame-backend branch from ec57ab0 to 5453356 Compare January 9, 2024 09:45
@stdrc stdrc changed the base branch from rc/refactor-over-window-rows-frame to rc/expose-order-type-direction January 9, 2024 09:46
@stdrc stdrc force-pushed the rc/expose-order-type-direction branch from 7c7f9f9 to 61df00f Compare January 9, 2024 09:59
@stdrc stdrc force-pushed the rc/range-frame-backend branch from 5453356 to 17eb72f Compare January 9, 2024 10:03
@stdrc stdrc force-pushed the rc/expose-order-type-direction branch from 61df00f to 2e3ec9a Compare January 9, 2024 10:32
@stdrc stdrc force-pushed the rc/range-frame-backend branch from 17eb72f to c365bd0 Compare January 9, 2024 10:35
@stdrc stdrc changed the base branch from rc/expose-order-type-direction to rc/easier-display-for-frame-bounds January 10, 2024 03:42
@stdrc stdrc force-pushed the rc/range-frame-backend branch 2 times, most recently from 49ea439 to 5c88fec Compare January 10, 2024 04:45
Copy link
Member Author

stdrc commented Jan 10, 2024

@stdrc stdrc force-pushed the rc/range-frame-backend branch 2 times, most recently from 84b2e41 to a16d90d Compare January 10, 2024 05:28
@stdrc stdrc changed the base branch from rc/easier-display-for-frame-bounds to rc/extract-range-utils January 10, 2024 05:28
@stdrc stdrc force-pushed the rc/extract-range-utils branch from 474d9d6 to 4e1a4dd Compare January 10, 2024 05:48
@stdrc stdrc force-pushed the rc/range-frame-backend branch from a16d90d to 845c28d Compare January 10, 2024 05:48
@stdrc stdrc changed the base branch from rc/extract-range-utils to rc/easier-display-for-frame-bounds January 10, 2024 05:48
@stdrc stdrc force-pushed the rc/easier-display-for-frame-bounds branch from c894a25 to 3fad06a Compare January 10, 2024 06:39
@stdrc stdrc force-pushed the rc/range-frame-backend branch from 845c28d to 9845051 Compare January 10, 2024 06:39
@stdrc stdrc force-pushed the rc/easier-display-for-frame-bounds branch from 3fad06a to 08e5e77 Compare January 10, 2024 07:26
@stdrc stdrc force-pushed the rc/range-frame-backend branch from 9845051 to 90b837c Compare January 10, 2024 07:26
@stdrc stdrc force-pushed the rc/easier-display-for-frame-bounds branch from 08e5e77 to 14d0113 Compare January 10, 2024 07:38
@stdrc stdrc force-pushed the rc/range-frame-backend branch from 90b837c to 65b6f9a Compare January 10, 2024 07:38
@stdrc stdrc force-pushed the rc/easier-display-for-frame-bounds branch from 14d0113 to cf16e2d Compare January 10, 2024 08:08
@stdrc stdrc force-pushed the rc/range-frame-backend branch from 26d6ca8 to 10df303 Compare February 1, 2024 12:15
@stdrc stdrc force-pushed the rc/range-frame-frontend branch from 538c8f4 to 46f031e Compare February 2, 2024 05:06
@stdrc stdrc force-pushed the rc/range-frame-backend branch from 10df303 to 6633942 Compare February 2, 2024 05:06
@stdrc stdrc force-pushed the rc/range-frame-frontend branch from 46f031e to 8566755 Compare February 2, 2024 05:20
@stdrc stdrc force-pushed the rc/range-frame-backend branch from 6633942 to f029b94 Compare February 2, 2024 05:21
@stdrc stdrc force-pushed the rc/range-frame-backend branch from f029b94 to f4e4e7b Compare February 2, 2024 07:36
Preceding(offset) => Following(offset),
CurrentRow => CurrentRow,
Following(offset) => Preceding(offset),
UnboundedFollowing => UnboundedPreceding,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that FrameBound<usize> and FrameBound<RangeFrameOffset> are so different that they don't really need to be variants of same type. How about just saparating them as 2 types, such as RowsFrameBound and RangeFrameBound?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO although they have many different methods and are used very differently, in their core they are the same concept, so I decided to introduce a generic type. Also I didn't want to repeat the 5 enum variants for two times (will be 3 times when we support GROUPS frame later).

Maybe we can type RowsFrameBound = FrameBound<usize>; to make it easier to read.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing type aliases sounds good

}

#[test]
fn test_calc_logical_for_timestamp_desc_nulls_first() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timestamp format seemed to be supported actually, but why it's disabled in the planner test?

Copy link
Member Author

@stdrc stdrc Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timestamp and timestamptz are supported. While for timestamptz, we only support offset to be interval without month and day fields, because that requires timezone info to do calculation and is not currently supported by our Add(Timestamptz, Interval) expression.

@@ -143,6 +147,73 @@ pub(super) fn find_frame_end_for_rows_frame<'cache>(
find_boundary_for_rows_frame::<false /* RIGHT */>(frame_bounds, part_with_delta, curr_key)
}

/// Given the first and last key in delta, calculate the order values of the first
/// and the last frames logically affected by some `RANGE` frames.
pub(super) fn calc_logical_curr_for_range_frames(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Managed to understand the function, but didn't quite get the point of "logical"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "logical (current key)" I mean the current key will be affected if it exists, but in actual we may not find the logical current key in the cache + delta, hence consider the closest one as the "actual current key".

Copy link
Member

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@stdrc stdrc force-pushed the rc/range-frame-backend branch from f4e4e7b to 392a6a1 Compare February 2, 2024 09:55
@stdrc stdrc force-pushed the rc/range-frame-frontend branch from ee5fa37 to 46f45dd Compare February 4, 2024 09:01
@stdrc stdrc force-pushed the rc/range-frame-backend branch from 392a6a1 to 4f9b76f Compare February 4, 2024 09:02
Base automatically changed from rc/range-frame-frontend to main February 4, 2024 09:33
stdrc added 9 commits February 4, 2024 22:39
Signed-off-by: Richard Chien <[email protected]>

add comment

Signed-off-by: Richard Chien <[email protected]>

order type little things

Signed-off-by: Richard Chien <[email protected]>

`fnd_affected_ranges` seems to work now

Signed-off-by: Richard Chien <[email protected]>

typo

Signed-off-by: Richard Chien <[email protected]>

typo

Signed-off-by: Richard Chien <[email protected]>

typo

Signed-off-by: Richard Chien <[email protected]>

simplify frame start/end calculation

Signed-off-by: Richard Chien <[email protected]>

minor

Signed-off-by: Richard Chien <[email protected]>

check range frame bounds

Signed-off-by: Richard Chien <[email protected]>

add comment

Signed-off-by: Richard Chien <[email protected]>

display RangeFrameBounds

Signed-off-by: Richard Chien <[email protected]>

make `WindowBuffer` only handle `ROWS` frame

Signed-off-by: Richard Chien <[email protected]>

reorder methods

Signed-off-by: Richard Chien <[email protected]>

unify

Signed-off-by: Richard Chien <[email protected]>

allow end-2-end range frame queries

Signed-off-by: Richard Chien <[email protected]>

minor

Signed-off-by: Richard Chien <[email protected]>

use real order key data type and order type

Signed-off-by: Richard Chien <[email protected]>

store order data type and order type in `RangeFrameBounds`

Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
@stdrc stdrc force-pushed the rc/range-frame-backend branch from 4f9b76f to 4594b29 Compare February 4, 2024 14:39
@stdrc stdrc enabled auto-merge February 4, 2024 14:40
@stdrc stdrc added this pull request to the merge queue Feb 4, 2024
Merged via the queue into main with commit ed643b8 Feb 4, 2024
27 of 28 checks passed
@stdrc stdrc deleted the rc/range-frame-backend branch February 4, 2024 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature user-facing-changes Contains changes that are visible to users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support over window function with RANGE frame
4 participants