You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, when users create a source and then create some Mvs on it, it will immediately consume data. However, in some circumstances, it will be better if we don't consume it first.
Case1:
Users may want to rebuild the entire streaming job without waiting for each individual Mv to be backfilled. To achieve this, they can create a source without consuming the data. Once the pipeline is built, they can then start consuming the source data.
Case2:
I have found that a user used temporal join to enforce some invariants for the output data. To ensure that the invariants are valid for all output, we need to ensure that there is no data consumption between Mv latest_b_per_kind and a_b. Otherwise, the temporal join may lookup future data.
SET streaming_parallelism =1;
CREATETABLEevents (seq bigint, event_type int, kind varchar) APPEND ONLY
WITH (
connector ='datagen',
fields.seq.kind ='sequence',
fields.seq.end ='9223372036854775807',
fields.event_type.kind ='random',
fields.event_type.min ='1',
fields.event_type.max ='2',
fields.event_type.seed =1,
fields.kind.kind ='random',
fields.kind.length =2,
fields.kind.seed =1,
datagen.rows.per.second=10000
) FORMAT PLAIN ENCODE JSON;
CREATE MATERIALIZED VIEW IF NOT EXISTS a AS (
SELECT
DISTINCT ON (seq) seq,
kind
FROM
events
WHERE
event_type =1
);
CREATE MATERIALIZED VIEW IF NOT EXISTS b AS (
SELECT
DISTINCT ON (seq) seq,
kind
FROM
events
WHERE
event_type =2
);
CREATE MATERIALIZED VIEW IF NOT EXISTS latest_b_per_kind AS (
SELECT
kind,
seq
FROM
(
SELECT
kind,
seq,
row_number() OVER (
PARTITION BY kind
ORDER BY
seq DESC
) as rank
FROM
b
)
WHERE
rank =1
);
CREATE MATERIALIZED VIEW IF NOT EXISTS a_b AS (
SELECTa.seqas a_seq,
a.kindas kind,
latest_b_per_kind.seqas b_seq
FROM
a
LEFT JOIN latest_b_per_kind FOR SYSTEM_TIME AS OF PROCTIME() ONa.kind=latest_b_per_kind.kind
);
Check invariant:
SELECTcount(1) from a_b where b_seq is not nulland a_seq < b_seq;
The text was updated successfully, but these errors were encountered:
Currently, when users create a source and then create some Mvs on it, it will immediately consume data. However, in some circumstances, it will be better if we don't consume it first.
Case1:
Users may want to rebuild the entire streaming job without waiting for each individual Mv to be backfilled. To achieve this, they can create a source without consuming the data. Once the pipeline is built, they can then start consuming the source data.
Case2:
I have found that a user used temporal join to enforce some invariants for the output data. To ensure that the invariants are valid for all output, we need to ensure that there is no data consumption between Mv latest_b_per_kind and a_b. Otherwise, the temporal join may lookup future data.
Check invariant:
The text was updated successfully, but these errors were encountered: