unaligned results of now() between streaming and batch queries when barrier piles up #15117

Open
lmatz opened this issue Feb 18, 2024 · 5 comments
Labels: no-issue-activity, type/bug (Something isn't working)


lmatz (Contributor) commented Feb 18, 2024

https://buildkite.com/risingwavelabs/main-cron/builds/1869#018da85a-078a-4811-a227-906f26b55713
[Screenshot: SCR-20240218-v0y]

The test has been passing for the past few days.
It's unclear whether the problem was fixed a few days ago or whether this is a flaky test.

lmatz added the type/bug (Something isn't working) label on Feb 18, 2024
github-actions bot added this to the release-1.7 milestone on Feb 18, 2024
BugenZhao self-assigned this on Feb 19, 2024
BugenZhao (Member) commented:

Found these warnings in the log:

2024-02-14T16:18:41.846288325Z  WARN risingwave_stream::executor::now: handle multiple barriers at once in now executor: 2
2024-02-14T16:18:41.854057912Z  WARN risingwave_stream::executor::now: handle multiple barriers at once in now executor: 2
2024-02-14T16:18:42.006623223Z  WARN risingwave_stream::executor::now: handle multiple barriers at once in now executor: 3

So I suspect this is a side effect of #13271. The Now executor now processes piled-up barriers in a batch when possible, skipping the data updates for the intermediate epochs. If a batch query happens to pin a snapshot at one of those intermediate epochs, NOW() will return inconsistent results between the streaming and batch queries.
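A minimal sketch of the mismatch, under heavily simplified assumptions: `Epoch`, `NowExecutorModel`, and `batch_now_at` are hypothetical names, not RisingWave code, and an epoch is treated as a plain millisecond timestamp. When several barriers are handled at once, only the last epoch gets a new NOW() value; the intermediate epochs keep the previously emitted one, while a batch query derives NOW() directly from its pinned epoch.

```rust
/// Hypothetical, simplified epoch: just a millisecond timestamp.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Epoch(u64);

/// Hypothetical model of the Now executor. For each epoch it records the
/// NOW() value that a downstream MV would expose at that epoch's snapshot.
struct NowExecutorModel {
    last_emitted: Option<u64>,
    visible: Vec<(Epoch, Option<u64>)>,
}

impl NowExecutorModel {
    fn new() -> Self {
        Self { last_emitted: None, visible: Vec::new() }
    }

    /// Handle all piled-up barriers at once: intermediate epochs keep the
    /// previously emitted value; only the latest epoch emits a new timestamp.
    fn handle_barriers(&mut self, barriers: &[Epoch]) {
        if let Some((last, intermediate)) = barriers.split_last() {
            for e in intermediate {
                self.visible.push((*e, self.last_emitted));
            }
            self.last_emitted = Some(last.0);
            self.visible.push((*last, self.last_emitted));
        }
    }

    /// NOW() as seen through the streaming result at `epoch`.
    fn streaming_now_at(&self, epoch: Epoch) -> Option<u64> {
        self.visible.iter().find(|(e, _)| *e == epoch).and_then(|(_, v)| *v)
    }
}

/// Batch query: NOW() is taken directly from the pinned snapshot's epoch.
fn batch_now_at(epoch: Epoch) -> u64 {
    epoch.0
}
```

Handling `[Epoch(2000), Epoch(3000), Epoch(4000)]` in one call after `Epoch(1000)` leaves the streaming side reporting 1000 at epoch 3000, while a batch query pinned there computes 3000.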

I think this breaks RisingWave's design goal that streaming queries should always yield exactly the same results as batch queries. cc @wenym1 @fuyufjh Could you share your thoughts?

wenym1 (Contributor) commented Feb 21, 2024

I think we can solve this issue by having a global singleton Now executor and now state table. IIUC, we may currently have multiple Now executors for different streaming jobs. If so, this inconsistency may exist not only between batch and streaming queries, but also between different streaming queries backed by different Now executor instances.

With a global singleton Now executor and state table, instead of using the epoch directly as the NOW() timestamp, a query can read the global now state table to get a globally unified timestamp, so that results are consistent across all queries.
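A rough sketch of the global-table idea, assuming a hypothetical `GlobalNowTable` (not an existing RisingWave type) that maps each committed epoch to the NOW() value the single Now executor emitted. Every reader, streaming or batch, resolves NOW() for its epoch through the same lookup, so skipped intermediate epochs resolve to the last committed value for everyone. Epochs and timestamps are simplified to plain `u64` milliseconds.

```rust
use std::collections::BTreeMap;

/// Hypothetical global "now" state table shared by all streaming jobs and
/// batch queries: committed epoch -> NOW() timestamp emitted at that epoch.
struct GlobalNowTable {
    committed: BTreeMap<u64, u64>,
}

impl GlobalNowTable {
    fn new() -> Self {
        Self { committed: BTreeMap::new() }
    }

    /// Called by the single global Now executor for each epoch it actually
    /// emits; piled-up intermediate epochs are never committed.
    fn commit(&mut self, epoch: u64, now_ts: u64) {
        self.committed.insert(epoch, now_ts);
    }

    /// Any reader pinned at `epoch` gets the latest timestamp committed at or
    /// before that epoch, so all readers agree on NOW().
    fn now_at(&self, epoch: u64) -> Option<u64> {
        self.committed.range(..=epoch).next_back().map(|(_, ts)| *ts)
    }
}
```

The ordered `BTreeMap` makes "latest value at or before this epoch" a single range lookup; a real state table would of course live in storage rather than in memory.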

As a temporary fix, instead of using the epoch directly as the timestamp, a batch query could read the now state table, if there is one associated with the MV it queries.
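The temporary fix could look roughly like this, assuming a hypothetical per-MV `NowStateTable` (epoch -> emitted NOW() value) and a hypothetical `batch_now` helper. Falling back to the raw epoch when no table or entry exists is an assumption made here to keep the current behavior for MVs without a Now executor.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-MV now state table: committed epoch -> emitted NOW() value.
type NowStateTable = BTreeMap<u64, u64>;

/// Resolve NOW() for a batch query pinned at `snapshot_epoch`. If the queried
/// MV has an associated now state table, reuse the value the Now executor
/// actually emitted; otherwise fall back to the epoch itself.
fn batch_now(snapshot_epoch: u64, mv_now_table: Option<&NowStateTable>) -> u64 {
    mv_now_table
        .and_then(|t| t.range(..=snapshot_epoch).next_back().map(|(_, ts)| *ts))
        .unwrap_or(snapshot_epoch)
}
```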

BugenZhao (Member) commented:

This sounds like a feasible approach. That said, if we don't think such a strict guarantee is really necessary, it's fine to leave it as it is.

BugenZhao (Member) commented:

Temporarily disabled the test in #15157.

BugenZhao changed the title from "failed to run e2e_test/batch/transaction/now.slt" to "unaligned results of now() between streaming and batch queries when barrier piles up" on Mar 11, 2024
BugenZhao removed this from the release-1.7 milestone on Mar 11, 2024
Contributor

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

Projects: None yet
Development: No branches or pull requests
3 participants