non-append-only distinct may output adjacent noop updates #17030
Labels
component/streaming
Stream processing related issue.
type/enhancement
Improvements to existing implementation.
Using `DISTINCT` on a non-append-only relation will generate a plan like this:

If we insert an existing value into the table, we'll still get a chunk with adjacent noop updates in `StreamMaterialize`.

If the output is further used as a dimension table in `Join`, the noop update will cause an amplification per-row, resulting in extremely high latency. Note that this does not only happen on `DISTINCT`, `LATERAL JOIN` could also generate a plan like this.

This is mainly because we add an extra `row_count` agg-call for internal use, which is then stripped with the following `Project`. We are missing the optimization here:

risingwave/src/stream/src/executor/aggregation/agg_group.rs
Lines 72 to 74 in bb6d16b
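
To make the mechanism concrete, here is a minimal sketch of how stripping the internal `row_count` column can turn a real update into a noop one. The `Op`, `Row`, `project`, and `is_noop_update_pair` names are hypothetical simplifications for illustration, not RisingWave's actual types or functions, and the `[key, row_count]` column layout is an assumption.

```rust
// Hypothetical, simplified model of a stream chunk (NOT RisingWave's real types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[allow(dead_code)]
enum Op {
    Insert,
    Delete,
    UpdateDelete,
    UpdateInsert,
}

// A row is modeled as a plain vector of integers for simplicity.
type Row = Vec<i64>;

/// Keep only the columns at `indices`, roughly what a `Project` that strips
/// an internal column would do.
fn project(chunk: &[(Op, Row)], indices: &[usize]) -> Vec<(Op, Row)> {
    chunk
        .iter()
        .map(|(op, row)| (*op, indices.iter().map(|&i| row[i]).collect()))
        .collect()
}

/// An `UpdateDelete` immediately followed by an `UpdateInsert` carrying the
/// same row changes nothing downstream: a noop update.
fn is_noop_update_pair(first: &(Op, Row), second: &(Op, Row)) -> bool {
    first.0 == Op::UpdateDelete && second.0 == Op::UpdateInsert && first.1 == second.1
}
```

Under the assumed layout, inserting a duplicate of key `1` yields `(UpdateDelete, [1, 1])` followed by `(UpdateInsert, [1, 2])` before the projection, which is a real change because `row_count` differs. After `project(&chunk, &[0])` both rows become `[1]`, so `is_noop_update_pair` holds for the pair.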
Also, if there are no column-pruning dispatchers (typically in MV on MV), we'll also miss the optimization in #14652.
risingwave/src/stream/src/executor/dispatch.rs
Lines 758 to 765 in bb6d16b
Following the idea of #10949, I'm considering whether we should apply the optimization to `Project`s, or at least specifically the `Project` in the case of this issue.
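
As a rough sketch of what applying the optimization to the output of a `Project` could look like, the function below drops adjacent update pairs whose rows are identical. It is only an illustration under assumptions: `Op`, `Row`, and `eliminate_adjacent_noop_updates` are hypothetical names, and a real implementation would work on RisingWave's own chunk representation rather than a `Vec` of tuples.

```rust
// Hypothetical, simplified model of a stream chunk (NOT RisingWave's real types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[allow(dead_code)]
enum Op {
    Insert,
    Delete,
    UpdateDelete,
    UpdateInsert,
}

// A row is modeled as a plain vector of integers for simplicity.
type Row = Vec<i64>;

/// Drop every adjacent `UpdateDelete`/`UpdateInsert` pair whose rows are
/// identical. All other records are passed through in their original order.
fn eliminate_adjacent_noop_updates(chunk: Vec<(Op, Row)>) -> Vec<(Op, Row)> {
    let mut out: Vec<(Op, Row)> = Vec::with_capacity(chunk.len());
    for (op, row) in chunk {
        if op == Op::UpdateInsert {
            // Look at the record emitted just before this one.
            if let Some((Op::UpdateDelete, prev)) = out.last() {
                if *prev == row {
                    // Same row deleted and re-inserted: remove the pending
                    // `UpdateDelete` and skip this `UpdateInsert`.
                    out.pop();
                    continue;
                }
            }
        }
        out.push((op, row));
    }
    out
}
```

With this sketch, a chunk containing `(UpdateDelete, [1])`, `(UpdateInsert, [1])`, `(Insert, [2])` is reduced to just `(Insert, [2])`, so a downstream `Join` would not see the noop pair at all. Update pairs that actually change the row are left untouched.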