Run DDL in the background #8051

Closed
chenzl25 opened this issue Feb 20, 2023 · 8 comments · Fixed by #12167

Comments

@chenzl25
Contributor

Is your feature request related to a problem? Please describe.

Some users want to run DDLs (usually creating a materialized view) in the background. We already provide a system table that shows the progress of DDLs.
If we plan to run DDLs in the background, then once a DDL is validated, the system must guarantee that it runs to completion (even if the cluster goes down during execution) unless a cancel statement is issued.
This means we need to persist the materialized view's metadata (invisible at first) and its backfill progress. Once backfilling has finished, we can make the materialized view visible.
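
The persistence requirement above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how the system implements it: `MvJob`, `MetaStore`, and `backfill` are hypothetical names, a dict of JSON strings stands in for the durable meta store, and `crash_after` exists only to simulate the cluster going down.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MvJob:
    """Hypothetical persisted record for one background CREATE MATERIALIZED VIEW."""
    name: str
    visible: bool = False      # the catalog entry starts out invisible
    backfilled_rows: int = 0   # backfill progress, persisted after each batch
    cancelled: bool = False    # set when a cancel statement is issued


class MetaStore:
    """Stand-in for a durable meta store (assumption: simple key/value)."""

    def __init__(self):
        self._kv = {}

    def put(self, job):
        self._kv[job.name] = json.dumps(asdict(job))

    def get(self, name):
        return MvJob(**json.loads(self._kv[name]))


def backfill(store, name, total_rows, batch, crash_after=None):
    """Resume backfill from persisted progress; return True once the view is visible."""
    steps = 0
    while True:
        job = store.get(name)  # always continue from what was persisted
        if job.cancelled or job.backfilled_rows >= total_rows:
            break
        if crash_after is not None and steps == crash_after:
            return False       # simulated crash: persisted progress survives
        job.backfilled_rows = min(total_rows, job.backfilled_rows + batch)
        store.put(job)
        steps += 1
    if not job.cancelled:
        job.visible = True     # backfill finished, expose the view
        store.put(job)
    return job.visible
```

Calling `backfill` again after a simulated crash picks up from the last persisted batch instead of starting over, which is the guarantee the description asks for.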

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@github-actions github-actions bot added this to the release-0.1.18 milestone Feb 20, 2023
@chenzl25 chenzl25 removed this from the release-0.1.18 milestone Feb 20, 2023
@BugenZhao
Member

@TennyZhuang
Contributor

I think it shouldn't be a SQL feature.

DDLs are not the only statements that can be slow; DMLs and DQLs can be too, e.g.

  • INSERT ... SELECT ... with a huge table.
  • A batch query with large table scans and several joins.

I propose introducing a new background SQL execution mode.

What's the difference?

The normal SQL execution mode:

  1. The user submits the query.
  2. The query returns the result set and completes.

The background SQL execution mode:

  1. The user submits the query.
  2. The server persists the query and returns an ACK.
  3. The server runs the query in the background.
  4. The user retrieves the result set by polling.
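
A minimal sketch of the background mode steps above, assuming a hypothetical `BackgroundExecutor` class and a caller-supplied `run_query` callable (neither exists in the project). The in-memory dict only stands in for real persistence, so this sketch is not fault tolerant.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class BackgroundExecutor:
    """Hypothetical background SQL execution mode: submit, ACK, run, poll."""

    def __init__(self, run_query):
        self._run_query = run_query   # assumed callable that executes one SQL string
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._submitted = {}          # stand-in for durably persisted queries
        self._futures = {}

    def submit(self, sql):
        job_id = str(uuid.uuid4())
        self._submitted[job_id] = sql  # persist the query
        # Hand the query to a worker thread so it runs in the background.
        self._futures[job_id] = self._pool.submit(self._run_query, sql)
        return job_id                  # the ACK handed back to the user

    def poll(self, job_id):
        future = self._futures[job_id]
        if not future.done():
            return ("running", None)
        return ("done", future.result())  # re-raises if the query failed
```

The user would call `submit` once, keep the returned id, and call `poll` until the status is `"done"`.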

@TennyZhuang
Contributor

Anyway, for now, allowing only DDLs to be executed in background mode is acceptable to me.

@chenzl25 chenzl25 changed the title Run DDL in background Run DDL in the background Feb 22, 2023
@liurenjie1024
Contributor

Another thing is the fault tolerance of background executions. To me, it would be more natural to make them fault tolerant.

@kwannoel
Contributor

@yezizp2012 are you planning to take this up as an extension of #8145 ? If not I'd like to work on it.

@yezizp2012
Member

@yezizp2012 are you planning to take this up as an extension of #8145 ? If not I'd like to work on it.

Not really, just feel free to assign yourself and take it. 😊

@kwannoel kwannoel self-assigned this Feb 27, 2023
@github-actions
Contributor

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

@kwannoel
Contributor

kwannoel commented May 23, 2023

Future changes for DDL in the background after #9752 is merged:

  • We can ban scaling-in for a DDL that is undergoing backfill.
  • That way we only need to persist state, which is good enough.
  • We need to make sure the meta store state is deleted if the job is cancelled.
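
The bullets above could look roughly like this. It is a sketch under assumptions: `JobManager` and its state layout are hypothetical, and for simplicity it rejects every scaling request while a job is backfilling rather than modelling scale-in specifically.

```python
class JobManager:
    """Hypothetical manager for background DDL jobs and their persisted state."""

    def __init__(self):
        self.meta_store = {}  # job name -> persisted state (assumed layout)

    def start_backfill(self, name):
        self.meta_store[name] = {"status": "backfilling", "progress": 0}

    def finish(self, name):
        self.meta_store[name]["status"] = "created"

    def scale(self, name, parallelism):
        state = self.meta_store[name]
        if state["status"] == "backfilling":
            # Banning scaling here means persisted state alone is enough to recover.
            raise RuntimeError(f"cannot scale {name!r} while it is backfilling")
        state["parallelism"] = parallelism

    def cancel(self, name):
        # Drop the persisted state so recovery never resumes a cancelled job.
        self.meta_store.pop(name, None)
```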

6 participants