bug: sql-backend meta node panics when there are concurrent ALTER TABLE ADD COLUMN requests #17093

Closed
BugenZhao opened this issue Jun 4, 2024 · 0 comments · Fixed by #17097
Labels
component/meta Meta related issue. type/bug Something isn't working

Describe the bug

SQL-backend meta node panics when there are concurrent ALTER TABLE ADD COLUMN requests.

Error message/log

thread 'rw-main' panicked at src/meta/src/barrier/info.rs:162:22:
actor not found
stack backtrace:
   0: rust_begin_unwind
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/core/src/panicking.rs:72:14
   2: core::panicking::panic_display
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/core/src/panicking.rs:197:5
   3: core::panicking::panic_str
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/core/src/panicking.rs:172:5
   4: core::option::expect_failed
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/core/src/option.rs:1995:5
   5: core::option::Option<T>::expect
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/core/src/option.rs:896:21
   6: risingwave_meta::barrier::info::InflightActorInfo::post_apply
             at ./src/meta/src/barrier/info.rs:159:31
   7: risingwave_meta::barrier::state::BarrierManagerState::apply_command
             at ./src/meta/src/barrier/state.rs:81:9
   8: risingwave_meta::barrier::GlobalBarrierManager::handle_new_barrier
             at ./src/meta/src/barrier/mod.rs:725:20
   9: risingwave_meta::barrier::GlobalBarrierManager::run::{{closure}}
             at ./src/meta/src/barrier/mod.rs:706:37
  10: risingwave_meta::barrier::GlobalBarrierManager::start::{{closure}}
             at ./src/meta/src/barrier/mod.rs:497:46
  11: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/core/src/future/future.rs:123:9
  12: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-0.1.40/src/instrument.rs:321:9
  13: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/core.rs:328:17
  14: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/loom/std/unsafe_cell.rs:16:9
  15: tokio::runtime::task::core::Core<T,S>::poll
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/core.rs:317:13
  16: tokio::runtime::task::harness::poll_future::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/harness.rs:485:19
  17: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/core/src/panic/unwind_safe.rs:272:9
  18: std::panicking::try::do_call
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/panicking.rs:552:40
  19: ___rust_try
  20: std::panicking::try
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/panicking.rs:516:19
  21: std::panic::catch_unwind
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/panic.rs:146:14
  22: tokio::runtime::task::harness::poll_future
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/harness.rs:473:18
  23: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/harness.rs:208:27
  24: tokio::runtime::task::harness::Harness<T,S>::poll
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/harness.rs:153:15
  25: tokio::runtime::task::raw::poll
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/raw.rs:271:5
  26: tokio::runtime::task::raw::RawTask::poll
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/raw.rs:201:18
  27: tokio::runtime::task::LocalNotified<S>::run
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/mod.rs:427:9
  28: tokio::runtime::scheduler::multi_thread::worker::Context::run_task::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/worker.rs:639:17
  29: tokio::runtime::coop::with_budget
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/coop.rs:107:5
  30: tokio::runtime::coop::budget
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/coop.rs:73:5
  31: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/worker.rs:575:9
  32: tokio::runtime::scheduler::multi_thread::worker::Context::run
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/worker.rs:526:24
  33: tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/worker.rs:491:21
  34: tokio::runtime::context::scoped::Scoped<T>::set
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context/scoped.rs:40:9
  35: tokio::runtime::context::set_scheduler::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context.rs:176:26
  36: std::thread::local::LocalKey<T>::try_with
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/thread/local.rs:284:16
  37: std::thread::local::LocalKey<T>::with
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/thread/local.rs:260:9
  38: tokio::runtime::context::set_scheduler
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context.rs:176:9
  39: tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/worker.rs:486:9
  40: tokio::runtime::context::runtime::enter_runtime
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context/runtime.rs:65:16
  41: tokio::runtime::scheduler::multi_thread::worker::run
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/worker.rs:478:5
  42: tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/worker.rs:447:45
  43: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/blocking/task.rs:42:21
  44: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-0.1.40/src/instrument.rs:321:9
  45: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/core.rs:328:17
  46: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/loom/std/unsafe_cell.rs:16:9
  47: tokio::runtime::task::core::Core<T,S>::poll
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/core.rs:317:13
  48: tokio::runtime::task::harness::poll_future::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/harness.rs:485:19
  49: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/core/src/panic/unwind_safe.rs:272:9
  50: std::panicking::try::do_call
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/panicking.rs:552:40
  51: ___rust_try
  52: std::panicking::try
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/panicking.rs:516:19
  53: std::panic::catch_unwind
             at /rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/panic.rs:146:14
  54: tokio::runtime::task::harness::poll_future
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/harness.rs:473:18
  55: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/harness.rs:208:27
  56: tokio::runtime::task::harness::Harness<T,S>::poll
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/harness.rs:153:15
  57: tokio::runtime::task::raw::poll
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/raw.rs:271:5
  58: tokio::runtime::task::raw::RawTask::poll
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/raw.rs:201:18
  59: tokio::runtime::task::UnownedTask<S>::run
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/mod.rs:464:9
  60: tokio::runtime::blocking::pool::Task::run
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/blocking/pool.rs:159:9
  61: tokio::runtime::blocking::pool::Inner::run
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/blocking/pool.rs:513:17
  62: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/blocking/pool.rs:471:13
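
Frames 4 to 6 of the backtrace show the panic coming from an `Option::expect` call inside `InflightActorInfo::post_apply`, with the message "actor not found". The sketch below illustrates that failure mode in isolation. `ActorMap`, its fields, and its methods are hypothetical stand-ins invented for illustration; they are not RisingWave's actual types, and this is not a claim about how #17097 fixes the issue.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for in-flight actor bookkeeping (not RisingWave's real type).
#[derive(Default)]
pub struct ActorMap {
    /// Maps a hypothetical actor id to a hypothetical worker id.
    actors: HashMap<u32, u32>,
}

impl ActorMap {
    pub fn insert(&mut self, actor_id: u32, worker_id: u32) {
        self.actors.insert(actor_id, worker_id);
    }

    /// Removes an actor and panics with "actor not found" if it is absent,
    /// mirroring the `Option::expect` failure seen in the backtrace.
    pub fn remove_or_panic(&mut self, actor_id: u32) -> u32 {
        self.actors.remove(&actor_id).expect("actor not found")
    }

    /// Non-panicking variant: a missing actor is reported to the caller as an error.
    pub fn try_remove(&mut self, actor_id: u32) -> Result<u32, String> {
        self.actors
            .remove(&actor_id)
            .ok_or_else(|| format!("actor {actor_id} not found"))
    }
}
```

Removing the same actor twice, as could happen if two commands both assume they own it, panics with the first method and yields an `Err` with the second.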

To Reproduce

  1. Clean up ~/.risingwave.
  2. Start a single-node cluster with risingwave.
  3. Create a table with create table t;
  4. Send concurrent ALTER requests with:
seq 1 10000 | parallel -j 8 "psql -h 0.0.0.0 -p 4566 -d dev -U root -c \"alter table t add column v{} int\""

Expected behavior

The meta node may postpone or reject some requests, but no component should crash and the table must remain in a consistent state.

How did you deploy RisingWave?

No response

The version of RisingWave

main

Additional context

With the etcd backend, the cluster survives and rejects some of the requests. This is achieved by the following check:

// TODO: Here we reuse the `creation` tracker for `alter` procedure, as an `alter` must
// occur after it's created. We may need to add a new tracker for `alter` procedure.
if database_core.has_in_progress_creation(&key) {
    bail!("table is in altering procedure");
}
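
The idea behind this check, rejecting a second ALTER on a table while one is still in progress, can be sketched as a small standalone tracker. This is a minimal illustration under stated assumptions: `AlterTracker`, `AlterGuard`, and `try_begin` are hypothetical names, tables are keyed by a plain string, and none of this reflects the actual meta-node code or the fix for the SQL backend.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical tracker of tables that currently have an ALTER in progress.
#[derive(Clone, Default)]
pub struct AlterTracker {
    in_progress: Arc<Mutex<HashSet<String>>>,
}

/// Guard returned when an ALTER is admitted; dropping it releases the table.
pub struct AlterGuard {
    table: String,
    in_progress: Arc<Mutex<HashSet<String>>>,
}

impl AlterTracker {
    /// Admits the ALTER if no other one is running on `table`, else rejects it.
    pub fn try_begin(&self, table: &str) -> Result<AlterGuard, String> {
        let mut set = self.in_progress.lock().unwrap();
        // `insert` returns false when the table is already being altered.
        if !set.insert(table.to_owned()) {
            return Err(format!("table {table} is in altering procedure"));
        }
        Ok(AlterGuard {
            table: table.to_owned(),
            in_progress: Arc::clone(&self.in_progress),
        })
    }
}

impl Drop for AlterGuard {
    fn drop(&mut self) {
        // Ignore a poisoned lock here so that dropping never panics.
        if let Ok(mut set) = self.in_progress.lock() {
            set.remove(&self.table);
        }
    }
}
```

Because the check and the insertion happen under one lock, two concurrent requests for the same table cannot both be admitted; the loser receives an error instead of crashing the process, which matches the expected behavior described above.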

@BugenZhao BugenZhao added type/bug Something isn't working component/meta Meta related issue. labels Jun 4, 2024
@github-actions github-actions bot added this to the release-1.10 milestone Jun 4, 2024
@yezizp2012 yezizp2012 self-assigned this Jun 4, 2024