-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(expr): distributed make timestamptz #13647
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #13647 +/- ##
==========================================
- Coverage 68.21% 68.21% -0.01%
==========================================
Files 1523 1524 +1
Lines 262264 262330 +66
==========================================
+ Hits 178902 178938 +36
- Misses 83362 83392 +30
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work! Generally LGTM.
proto/batch_plan.proto
Outdated
@@ -359,3 +359,7 @@ message PlanFragment { | |||
PlanNode root = 1; | |||
ExchangeInfo exchange_info = 2; | |||
} | |||
|
|||
message CapturedContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the CapturedContext
message is defined in batch_plan.proto
, while in src/expr/core/src/captured_context.rs
, it's said to be "For all execution mode".
So shall we move this to somewhere else like expr.proto
or compute.proto
or common.proto
?
Btw, please add some comments for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CaptureContext is not a good name here. We won't pass every captured context to CN (it's impossible).
A concrete name may be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm about to move it stream_plan
as we also need this for streaming execution mode.
For naming, what about SessionContext
? As this struct is mainly used to capture various sessional values for various execution modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to name it ExecutionContext
since it's about one query execution. It may include things even more limited to one execution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExecutionContext
is good, but it conflicts with an exsiting struct in frontend codes. I'm concerning it will cause confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or ExecutionContextInfo
? In fact I expect the existing struct should contains such info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context
and Info
together seems to be a little redundant. Maybe CapturedExecutionContext
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm about to move it
stream_plan
as we also need this for streaming execution mode.
Why not plan_common.proto
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😂It‘s actually not used in batch_plan.proto
, so stream_plan.proto
is enough. But plan_common.proto
make more sense. I'll move it to plan_common.proto
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, the contexts were all previously defined in the frontend
crate because...
- We are sure they're evaluated in the frontend (local mode), as they're mostly admin functions.
- The context types are only available in frontend crate.
According to those, it seems reasonable to extract the TIME_ZONE
context out of frontend
and into expr
crate. As a result, we have to find all paths to provide the context before evaluating. However, I'm considering whether there's really a strong reason for not inlining the timezone into the expression itself (as a constant argument). This is how we handle at_timezone
and now
before, which is really easy to implement and not invasive at all.
IIUC, simply adding a match arm of |
Yes, current rewrite machism is friendly to expr part but cause some problems on frontend's logics. Thanks to @xiangjinwu for helping find the following issues caused by current rewrite mechanism: #8415 #7175 (comment) #10086 #12633 (comment) #10410 In summary, current rewrite mechanism introduce challenges on frontend rewrite. It's easy to miss or incorrectly perform the rewrite. That's why I'm now trying to redesign the implementation of timezone, starting with This PR's implementation is somehow tricky but avoids the error-prone rewrite. We can discuss which is better. |
b000219
to
42e6765
Compare
42e6765
to
8736885
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for establishing the new route!
|
||
let ctx = quote! { let ctx = #context; }; | ||
let mut body = quote! { #closure }; | ||
let fields = vec![format_ident!("time_zone")]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems here is some space for improvement, but we can consider them later:
- Better not hard coding
time_zone
, and find a way to "reflect" on all fields - Or it can be generalized to accept a predefined collection of contexts, so that
src/frontend/src/scheduler/local.rs
can also benefit from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave it to the streaming mode PR.
let exec = captured_execution_context_scope!( | ||
captured_execution_context.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a result, we have to find all paths to provide the context before evaluating. ...(the frontend inline rewriting approach) is really easy to implement and not invasive at all.
We either need to find all paths to provide the context before evaluating, or find all paths to inline the context before evaluating. The primary goal of these make_timestamptz
PRs are not to support this new function for users, but to be the pioneer and see if the new approach is better than existing one.
@KeXiangWang How did you figure out these places to provide context in this PR? If some of them slip through, how easy are they exposed as a runtime error? My impression was that there are less places and easier to surface compared to the expr rewrite approach (recent addition #13724). But this can be wrong and simply due to we had more time to receive bugs on the rewrite approach.
I also agree inlining is less invasive, to the extend that it is solely in frontend. But if approach A requires awareness in 10 submodules in 1 module, while approach B requires awareness in 2 submodules in 3 parent modules each, it may also be acceptable to consider A as more invasive because 10 > 2 * 3. Again these numbers are made up arbitrarily and personally I feel the difference between 10 and 6 is not large enough to justify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the codes in frontend and run some test examples to help verify whether all of them are found.
The good thing is when there's an expression been evaluated without context, it will throw an error like "context TIME_ZONE not found" and the error is shown in psql console, making it easy to recognize. For streaming mode, it will generate one log with the same error message and when we read from the MV, the result of that row is missed.
Another benefit is that we only need to find all the paths (to provide the context before evaluating) once. Once it's finished correctly with make_timestamptz
, unless we have new paths, we don't need to care it in following the implementation of other expressions.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Support distributed
make_timestamptz
.CapturedContext
is introduced for pass session info like timezone.There will be two part where we need to utilized the captured context:
execute
RPC intask_service
to create a task on other CN as data source. At this time, we need to the capture these values and pass these values to the CN throught theexecute
RPC.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.