discussion: allow stream query on creating mv? #12771
Comments
Also, for user experience, I think we should allow streaming queries on a creating MV; otherwise the user must wait for the MV to backfill all the historical data.
Personally I prefer the second one, because it is more practical and the concept of transactional DDL is too big? BTW, recoverable backfill is also needed in this case. We can return from the DDL immediately to make the creating MV visible to streaming queries and batch queries. But for batch query consistency, we need to block a batch query until all of its upstream backfilling MVs are finished.
I think this sounds reasonable, with a session variable to configure it. For implementation, we just need to sync the catalog back to the frontend. I'm already planning to work on this part (syncing the catalog to the frontend), so we can unify
BTW, if we choose this approach, maybe we can return a notice on the second
BTW, we also need a synchronising mechanism, like
+1 for this idea. Think a step further: (assuming MV2 depends on MV1 and both are creating)
Furthermore,
We should only provide this feature with background DDL, I suppose, because in many cases users are using DBT to handle creation of the stream job DAG. For a normal stream job, we only return a response once it is done. With background DDL, we immediately return a response on firing the command, so DBT can immediately continue to create the next MVs.
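To make the DBT-style flow above concrete, here is a minimal sketch, assuming the session variable is named `BACKGROUND_DDL` and that the MV names and queries are placeholders:

```sql
-- Enable background DDL for this session (variable name assumed).
SET BACKGROUND_DDL = true;

-- With background DDL, each statement returns immediately on firing the
-- command while backfill continues in the background, so a client like
-- DBT can proceed to create the next MV in the DAG right away.
CREATE MATERIALIZED VIEW mv1 AS SELECT * FROM src;
CREATE MATERIALIZED VIEW mv2 AS SELECT * FROM mv1;
```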
For each MV we create, we now also need to watch its upstream MVs, and only mark its state as

In terms of cancelling / dropping the streaming DAG, once we unify cancel / drop, we can reuse the cascade logic of
IIUC, the DBT driver does not do anything special for it, so it does not use background DDL. Could and should it use background DDL by default? cc @chenzl25
I think we should not enable background DDL by default for DBT. DBT has different models, e.g.
Let's wait for snapshot backfill to be implemented, then discuss the feasibility. There could be a lot of complexity in the stream manager to maintain stream job status. See:
Discussed offline with @wyhyhyhyh:
Currently, a creating MV is invisible, and the user cannot run batch or streaming queries on it.
Complex data processing pipelines are usually layered, comprising many materialized views that depend on each other. When creating a stream query on an existing MV, RW will backfill all the historical data in the upstream MV and union it with the incoming changes.
Consider an mvA, an mvB on mvA, and an mvC on mvB (S -> mvA -> mvB -> mvC). Under the current design, the user must create the materialized views one by one: they create mvA and wait for all of mvA's historical data to be backfilled, and only after that can they create mvB. But if the whole pipeline could be constructed at once, this intermediate backfilling would be unnecessary.
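The layered pipeline above corresponds to DDL like the following sketch, assuming a source `S` already exists (the queries are placeholders):

```sql
-- Today, each statement blocks until its backfill completes,
-- so these must be issued strictly one after another.
CREATE MATERIALIZED VIEW mvA AS SELECT * FROM S;
CREATE MATERIALIZED VIEW mvB AS SELECT * FROM mvA;  -- must wait for mvA
CREATE MATERIALIZED VIEW mvC AS SELECT * FROM mvB;  -- must wait for mvB
```

If the creating MVs were visible to stream queries, all three statements could be fired back-to-back, and mvB's backfill of mvA's historical snapshot would largely be redundant work.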
We can achieve that in two ways.