
bug: panic when create sink into an APPEND ONLY table without pk with watermark #16836

Closed
st1page opened this issue May 20, 2024 · 5 comments · Fixed by #17724


st1page commented May 20, 2024

CREATE TABLE exam_scores (
  score_id int,
  exam_id int,
  student_id int,
  score real,
  exam_date timestamptz
);

CREATE TABLE t (
    index int ,
    time timestamptz,
    WATERMARK FOR time AS time - INTERVAL '5' SECOND
) APPEND ONLY;

CREATE SINK sink1 INTO t AS select student_id, exam_date as time from exam_scores with (type = 'append-only', force_append_only = 'true');

compute node log:

thread 'rw-streaming' panicked at src/storage/src/hummock/store/hummock_storage.rs:392:21:
There are 2 read version associated with vnode 64. read_version_vnodes=[[64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127], [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]]
stack backtrace:
thread 'rw-streaming' panicked at src/storage/src/hummock/store/hummock_storage.rs:392:21:
There are 2 read version associated with vnode 0. read_version_vnodes=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]]
thread 'rw-streaming' panicked at src/storage/src/hummock/store/hummock_storage.rs:392:21:
There are 2 read version associated with vnode 0. read_version_vnodes=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]]
thread 'rw-streaming' panicked at src/storage/src/hummock/store/hummock_storage.rs:392:21:
There are 2 read version associated with vnode 64. read_version_vnodes=[[64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127], [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]]
@github-actions github-actions bot added this to the release-1.10 milestone May 20, 2024

st1page commented May 21, 2024

It seems that any replace-table operation can trigger this problem, e.g.:

alter table t add column a int;


Li0k commented May 21, 2024

Discussed offline with @st1page: the read_version involved in the batch read has not been destroyed yet, but the new read_version has already been registered. (The out-of-order register and destroy appears to be the cause of the problem.)
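To make the failure mode concrete, here is a toy sketch of the invariant being violated. This is NOT the actual hummock code; `ReadVersionRegistry`, `register`, `destroy`, and `read` are illustrative names for a simplified model where each vnode must be covered by exactly one read-version instance, mirroring the check at hummock_storage.rs:392:

```rust
use std::collections::HashMap;

/// Hypothetical, simplified model of the vnode -> read-version mapping
/// (names are illustrative, not the real hummock types).
#[derive(Default)]
struct ReadVersionRegistry {
    /// vnode index -> read-version instance ids currently covering it
    by_vnode: HashMap<usize, Vec<u64>>,
}

impl ReadVersionRegistry {
    fn register(&mut self, instance_id: u64, vnodes: &[usize]) {
        for &v in vnodes {
            self.by_vnode.entry(v).or_default().push(instance_id);
        }
    }

    fn destroy(&mut self, instance_id: u64) {
        for ids in self.by_vnode.values_mut() {
            ids.retain(|&id| id != instance_id);
        }
    }

    /// Mirrors the panicking assertion: a read expects exactly one
    /// read version per vnode.
    fn read(&self, vnode: usize) -> Result<u64, String> {
        let ids = self.by_vnode.get(&vnode).cloned().unwrap_or_default();
        if ids.len() != 1 {
            return Err(format!(
                "There are {} read version associated with vnode {}",
                ids.len(),
                vnode
            ));
        }
        Ok(ids[0])
    }
}

fn main() {
    let mut reg = ReadVersionRegistry::default();
    // The old actor's read version covers vnodes 0..64.
    reg.register(1, &(0..64).collect::<Vec<_>>());
    // Replace-table registers the NEW read version over the same vnodes
    // BEFORE the old one is destroyed, so a read now sees two versions.
    reg.register(5, &(0..64).collect::<Vec<_>>());
    assert!(reg.read(0).is_err()); // this is where the panic would fire
    // Once the old instance is deregistered, reads succeed again.
    reg.destroy(1);
    assert_eq!(reg.read(0), Ok(5));
}
```

If destroy had been delivered before the second register, the intermediate read would never observe two versions.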


st1page commented May 21, 2024

> Discussed offline with @st1page: the read_version involved in the batch read has not been destroyed yet, but the new read_version has already been registered. (The out-of-order register and destroy appears to be the cause of the problem.)

I have verified that the issue does not occur when the table fragment's parallelism is 1. With higher parallelism, some actors receive the configuration-change barrier before others.
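The parallelism observation can be sketched as an ordering problem. The following toy model (my own illustration, not RisingWave code; `apply` and the event tuples are made up) replays event sequences against a single vnode's set of active read-version instance ids, starting with one old instance already registered:

```rust
/// Toy model: replay (operation, instance_id) events against the set of
/// active read-version instance ids for one vnode. Instance 1 (the old
/// actor's read version) is registered up front.
fn apply(events: &[(&str, u64)]) -> Result<(), String> {
    let mut active: Vec<u64> = vec![1];
    for &(op, id) in events {
        match op {
            "register" => active.push(id),
            "destroy" => active.retain(|&x| x != id),
            "read" => {
                // The read path expects exactly one active read version.
                if active.len() != 1 {
                    return Err(format!("{} read versions active", active.len()));
                }
            }
            _ => unreachable!(),
        }
    }
    Ok(())
}

fn main() {
    // Parallelism 1: the single actor destroys the old instance before
    // the new one registers and reads, so there is never any overlap.
    assert!(apply(&[("destroy", 1), ("register", 5), ("read", 5)]).is_ok());

    // Parallelism > 1: another actor's barrier lags, so the new instance
    // registers and reads while the old one is still alive -- the same
    // overlap the panic message reports.
    assert!(apply(&[("register", 5), ("read", 5)]).is_err());
}
```

With a single actor the destroy and register are strictly sequenced; with multiple actors the barrier arrival order interleaves them.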


st1page commented May 21, 2024

I removed the assert and examined the log: the read-version registration and the new actor's read both happen before the old instance is destroyed.

                if read_version_vec.len() != 1 {
                    let read_version_vnodes = vec![0];
                    // read_version_vec
                    //     .into_iter()
                    //     .map(|v| {
                    //         let v = v.read();
                    //         v.vnodes().iter_ones().collect_vec()
                    //     })
                    //     .collect_vec();
                    tracing::warn!("There are {} read version associated with vnode {}. read_version_vnodes={:?}", read_version_vnodes.len(), vnode.to_index(), read_version_vnodes);
                }
dev=> set streaming_parallelism=2;
SET_VARIABLE
dev=> CREATE TABLE t (
    index int ,
    time timestamptz,
    WATERMARK FOR time AS time - INTERVAL '5' SECOND
) APPEND ONLY;
NOTICE:  APPEND ONLY TABLE is currently an experimental feature.
CREATE_TABLE
dev=> alter table t add column a int;
ALTER_TABLE
2024-05-21T20:57:04.257136254+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: update to hummock version: 23, epoch: 6492063219580928
2024-05-21T20:57:04.376908418+08:00 DEBUG risingwave_stream::from_proto::row_id_gen: row id gen executor: Some(1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
2024-05-21T20:57:04.37788187+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: new read version registered: table_id: 2, instance_id: 5
2024-05-21T20:57:04.378831597+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: new read version registered: table_id: 1, instance_id: 6
2024-05-21T20:57:04.379647063+08:00 DEBUG risingwave_stream::from_proto::row_id_gen: row id gen executor: Some(0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111)
2024-05-21T20:57:04.380229212+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: new read version registered: table_id: 2, instance_id: 7
2024-05-21T20:57:04.380680577+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: new read version registered: table_id: 1, instance_id: 8
2024-05-21T20:57:04.385137196+08:00  INFO actor{otel.name="Actor 4" actor_id=4 prev_epoch=6492063219580928 curr_epoch=6492063285182464}:executor{otel.name="Dml 400000001"}: risingwave_stream::executor::stream_reader: data stream paused
2024-05-21T20:57:04.385142616+08:00  INFO actor{otel.name="Actor 3" actor_id=3 prev_epoch=6492063219580928 curr_epoch=6492063285182464}:executor{otel.name="Dml 300000001"}: risingwave_stream::executor::stream_reader: data stream paused
2024-05-21T20:57:04.386029616+08:00  WARN actor{otel.name="Actor 2" actor_id=2 prev_epoch=6492063219580928 curr_epoch=6492063285182464}:executor{otel.name="Materialize 200000006"}:executor{otel.name="WatermarkFilter 200000005"}: risingwave_storage::hummock::store::hummock_storage: There are 1 read version associated with vnode 0. read_version_vnodes=[0]
...omit.....
2024-05-21T20:57:04.421987146+08:00  WARN actor{otel.name="Actor 2" actor_id=2 prev_epoch=6492063285182464 curr_epoch=6492063293964288}:executor{otel.name="Materialize 200000006"}:executor{otel.name="WatermarkFilter 200000005"}: risingwave_storage::hummock::store::hummock_storage: There are 1 read version associated with vnode 105. read_version_vnodes=[0]
2024-05-21T20:57:04.421988678+08:00  WARN actor{otel.name="Actor 5" actor_id=5 prev_epoch=6492063293964288 curr_epoch=6492063295406080}:executor{otel.name="Materialize 500000006"}:executor{otel.name="WatermarkFilter 500000005"}: risingwave_storage::hummock::store::hummock_storage: There are 1 read version associated with vnode 245. read_version_vnodes=[0]
....omit....
2024-05-21T20:57:04.422084924+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: read version deregister: table_id: 2, instance_id: 1
2024-05-21T20:57:04.422120099+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: read version deregister: table_id: 1, instance_id: 2
...omit...
2024-05-21T20:57:04.425850665+08:00  WARN actor{otel.name="Actor 5" actor_id=5 prev_epoch=6492063293964288 curr_epoch=6492063295406080}:executor{otel.name="Materialize 500000006"}:executor{otel.name="WatermarkFilter 500000005"}: risingwave_storage::hummock::store::hummock_storage: There are 1 read version associated with vnode 129. read_version_vnodes=[0]
2024-05-21T20:57:04.425925485+08:00  WARN actor{otel.name="Actor 5" actor_id=5 prev_epoch=6492063293964288 curr_epoch=6492063295406080}:executor{otel.name="Materialize 500000006"}:executor{otel.name="WatermarkFilter 500000005"}: risingwave_storage::hummock::store::hummock_storage: There are 1 read version associated with vnode 130. read_version_vnodes=[0]
2024-05-21T20:57:04.426020624+08:00 DEBUG risingwave_storage::hummock::event_handler::uploader: epoch 6492063293964288 is sealed
2024-05-21T20:57:04.426031881+08:00  WARN actor{otel.name="Actor 5" actor_id=5 prev_epoch=6492063293964288 curr_epoch=6492063295406080}:executor{otel.name="Materialize 500000006"}:executor{otel.name="WatermarkFilter 500000005"}: risingwave_storage::hummock::store::hummock_storage: There are 1 read version associated with vnode 131. read_version_vnodes=[0]
2024-05-21T20:57:04.426095774+08:00 DEBUG risingwave_storage::hummock::event_handler::uploader: epoch 6492063293964288 to seal has no data
2024-05-21T20:57:04.426155957+08:00 DEBUG risingwave_storage::hummock::event_handler::uploader: start sync epoch: 6492063293964288
2024-05-21T20:57:04.426226609+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: epoch has been synced: 6492063293964288.
2024-05-21T20:57:04.426269015+08:00  WARN actor{otel.name="Actor 5" actor_id=5 prev_epoch=6492063293964288 curr_epoch=6492063295406080}:executor{otel.name="Materialize 500000006"}:executor{otel.name="WatermarkFilter 500000005"}: risingwave_storage::hummock::store::hummock_storage: There are 1 read version associated with vnode 132. read_version_vnodes=[0]
2024-05-21T20:57:04.426295829+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: read version deregister: table_id: 2, instance_id: 3
2024-05-21T20:57:04.426343127+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: read version deregister: table_id: 1, instance_id: 4
2024-05-21T20:57:04.426362333+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: receive await sync epoch: 6492063293964288
2024-05-21T20:57:04.426374837+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: epoch 6492063293964288 has been synced. Current max_sync_epoch 6492063293964288
2024-05-21T20:57:04.428904189+08:00 DEBUG risingwave_storage::hummock::event_handler::hummock_event_handler: update to hummock version: 25, epoch: 6492063293964288
2024-05-21T20:57:04.429514482+08:00 DEBUG risingwave_stream::task::stream_manager: drop actors actors=[1, 2, 3, 4]


st1page commented May 21, 2024

This case is a little special because:

  • the WatermarkFilter takes a read immediately after it is created
  • replace table simply recreates the actors with the same vnode mapping on the same compute node
  • this assert is only on the StorageTable's read path (is that correct?)

@shanicky can you come up with some tricky scale command to trigger this bug in other cases?
