-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(storage): allow dynamic enable/disable minitrace for tiered cache #17433
Conversation
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Discussed offline with @tabVersion and @BugenZhao . The jaeger exporter will be replaced with OpenTelemetry exporter and reuse the env of the distributed tracing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for the system parameter part.
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: MrCroxx <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and set system parameter
enable_tracing
astrue
.
This doesn't seem to be implemented? Shall we also handle the change on system parameter enable_tracing
for minitrace?
risingwave/src/common/src/system_param/common.rs
Lines 35 to 38 in a931b1e
// Toggle the distributed tracing layer. | |
if let Some(enabled) = diff.enable_tracing { | |
toggle_otel_layer(enabled); | |
} |
optional uint32 tracing_threshold_tiered_cache_read_ms = 18; | ||
optional uint32 tracing_threshold_tiered_cache_write_ms = 19; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I would suggest making them configuration entries (like under [storage.cache.developer]
) instead of system parameters, since it looks to be less user-facing and does not require frequent online changes. 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since it looks to be less user-facing and does not require frequent online changes
+1. But the thing is, tail-based tracing is supposed to be enabled for only a few seconds to capture abnormal spans in the current state. Using a config file requires a restart, which may destroy the scene.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simply use default values that are small enough and rely only on enable_tracing
to toggle it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe enable/disable with RPC is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like we do now to capture await tree and other profiling data, add a RPC function to control the tracing configurations.
risingwave/proto/monitor_service.proto
Lines 68 to 75 in 3c102d5
service MonitorService { | |
rpc StackTrace(StackTraceRequest) returns (StackTraceResponse); | |
rpc Profiling(ProfilingRequest) returns (ProfilingResponse); | |
rpc HeapProfiling(HeapProfilingRequest) returns (HeapProfilingResponse); | |
rpc ListHeapProfiling(ListHeapProfilingRequest) returns (ListHeapProfilingResponse); | |
rpc AnalyzeHeap(AnalyzeHeapRequest) returns (AnalyzeHeapResponse); | |
rpc GetBackPressure(GetBackPressureRequest) returns (GetBackPressureResponse); | |
} |
.unwrap() | ||
.unwrap(); | ||
|
||
let exporter = opentelemetry_otlp::new_exporter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between this exporter with the one in L449? Can we reuse them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface requires a builder. 🥹
let reporter = OpenTelemetryReporter::new( | ||
exporter, | ||
SpanKind::Server, | ||
Cow::Owned(Resource::new([KeyValue::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resource can be reused as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The service name should be different from the distributed tracing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Is it because there is currently no way to maintain the parent relationship between tracing spans and minitrace spans? So we'd rather make them not interleaving with each other by assigning a different service name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emmm, nop. It is because I think it is not a good idea to assign the distributed tracing and lib-level tracing in the same service name. 👀 Separating them can give a clearer view.
Discussed with @BugenZhao offline, I'll switch to rpc with risectl. |
Signed-off-by: MrCroxx [email protected]I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Introduce system parameters to dynamically enable/disable and configure minitrace for tiered cache.
To enable tracing, set opentelemetry endpoint via env
RW_TRACING_ENDPOINT
and set system parameterenable_tracing
astrue
.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.