max_concurrent_creating_streaming_jobs defaulting to 1 can be confusing #15788

Open
BugenZhao opened this issue Mar 19, 2024 · 4 comments
Labels
component/frontend Protocol, parsing, binder. component/meta Meta related issue. type/enhancement Improvements to existing implementation.


BugenZhao commented Mar 19, 2024

In #11601 we introduced a system parameter named max_concurrent_creating_streaming_jobs and set the default value to 1 (!). This can be really confusing in some cases. Consider the following scenario:

  • A user issues a DDL to create an MV on an MV, which may take a long time to complete.
  • After a while the user still sees no response to the DDL and starts to suspect that the system is no longer working properly.
  • To verify this, the user may try creating a new empty table. That DDL is blocked by the default value of max_concurrent_creating_streaming_jobs, but the user will almost certainly interpret this as the system being stuck.

Ideas on how we can improve this:

  • Notify users of the internal steps of a DDL through psql:
     dev=# create materialized view ...;
     
     building actors... done
     collecting initial barriers... done
     awaiting backfill completion...
    
  • Notify users if a DDL is blocked by the concurrency limit:
     dev=# create materialized view ...;
     
     acquiring permit for DDL concurrency (currently set to 1)...
    
  • Investigate whether it's reasonable to make max_concurrent_creating_streaming_jobs default to 1.
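
The second idea above (telling the user when a DDL is waiting on the concurrency limit) could be sketched roughly as follows. This is a minimal, std-only Rust illustration and not RisingWave's actual code: `DdlLimiter`, `acquire`, `release`, and the `notify` callback are hypothetical names. A real meta service would presumably use an async semaphore and deliver the message as a notice to the client instead of blocking a thread.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical limiter for concurrently creating streaming jobs.
/// Illustration only; the names do not come from the RisingWave codebase.
pub struct DdlLimiter {
    limit: usize,
    running: Mutex<usize>,
    freed: Condvar,
}

impl DdlLimiter {
    pub fn new(limit: usize) -> Self {
        Self {
            limit,
            running: Mutex::new(0),
            freed: Condvar::new(),
        }
    }

    /// Blocks until a slot is free. If the caller has to wait, `notify` is
    /// called once beforehand so the client can be told why nothing happens.
    pub fn acquire(&self, mut notify: impl FnMut(String)) {
        let mut running = self.running.lock().unwrap();
        if *running >= self.limit {
            notify(format!(
                "acquiring permit for DDL concurrency (currently set to {})...",
                self.limit
            ));
        }
        // Re-check the condition after every wake-up (spurious wake-ups are allowed).
        while *running >= self.limit {
            running = self.freed.wait(running).unwrap();
        }
        *running += 1;
    }

    /// Frees one slot and wakes a single waiter, if any.
    pub fn release(&self) {
        let mut running = self.running.lock().unwrap();
        *running -= 1;
        self.freed.notify_one();
    }
}
```

With a limit of 1, the first `acquire` returns immediately without calling `notify`. A second caller gets the message before it starts waiting, so the wait is at least visible to the user.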
BugenZhao added the type/enhancement, component/meta, and component/frontend labels Mar 19, 2024
github-actions bot added this to the release-1.8 milestone Mar 19, 2024
BugenZhao changed the title from "max_concurrent_creating_streaming_jobs default to 1 can be confusing" to "max_concurrent_creating_streaming_jobs defaulting to 1 can be confusing" Mar 19, 2024

xxchan commented Mar 19, 2024

One thing is that this is not observable at all, so even developers can have no idea what happened. A simple improvement would be to at least call try_acquire first and emit a WARN log on meta if it fails (or, more aggressively, return an error directly to fail the DDL?).
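
A fail-fast variant of the try_acquire idea might look like the sketch below. It is std-only Rust under stated assumptions: `JobSlots`, `Permit`, and `LimitReached` are hypothetical names, and the error text is invented for illustration. tokio's `Semaphore::try_acquire` provides comparable non-blocking behaviour in async code, but nothing here claims to mirror how meta is actually implemented.

```rust
use std::fmt;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical error returned when no slot is free.
#[derive(Debug, PartialEq)]
pub struct LimitReached {
    pub limit: usize,
}

impl fmt::Display for LimitReached {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "max_concurrent_creating_streaming_jobs ({}) reached; retry later or raise the limit",
            self.limit
        )
    }
}

/// Hypothetical non-blocking slot counter (illustration only).
pub struct JobSlots {
    limit: usize,
    in_use: AtomicUsize,
}

/// RAII guard: the slot is returned when the permit is dropped.
pub struct Permit<'a> {
    slots: &'a JobSlots,
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        self.slots.in_use.fetch_sub(1, Ordering::AcqRel);
    }
}

impl JobSlots {
    pub fn new(limit: usize) -> Self {
        Self {
            limit,
            in_use: AtomicUsize::new(0),
        }
    }

    /// Never blocks: either takes a slot or reports that the limit is hit,
    /// so the caller can log a warning or fail the DDL immediately.
    pub fn try_acquire(&self) -> Result<Permit<'_>, LimitReached> {
        self.in_use
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < self.limit {
                    Some(n + 1)
                } else {
                    None
                }
            })
            .map(|_| Permit { slots: self })
            .map_err(|_| LimitReached { limit: self.limit })
    }
}
```

The caller decides what to do with `Err(LimitReached)`: log at WARN and fall back to waiting, or surface the message to the client and fail the DDL. The second option also sidesteps the problem of a blocked statement that cannot be cancelled.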


yezizp2012 commented Mar 19, 2024

+1 for returning an error directly to fail the DDL. I also found another potential issue: it seems the blocked DDL cannot be cancelled by either Ctrl+C or the cancel command. Returning an error would cover that too. 🥵

@BugenZhao BugenZhao self-assigned this Mar 20, 2024

xxchan commented Mar 20, 2024

One thing is that this is not observable at all

At the same time, I'm also wondering whether there is any general way to increase observability and debuggability (besides print-driven debugging).

I tried lldb and tokio-console. Neither helps. I think that's because once a tokio task becomes pending, it is no longer observable from the threads. tokio-console only knows the entry points of tasks and has no stack traces.

await-tree needs manual instrumentation, so to me it looks very much like (a nicer version of) print debugging.

https://docs.rs/tokio/latest/tokio/runtime/struct.Handle.html#method.dump This might be interesting to check


This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.
