Prevent downgrading RisingWave version for a running cluster by default #15761

hzxa21 · 2024-03-18T17:44:30Z

Is your feature request related to a problem? Please describe.

Because forward compatibility is not guaranteed, downgrading the RisingWave version of a running cluster is dangerous: it may corrupt data or metadata and lead to undefined behavior. Even worse, if the downgrade appears to succeed (for example, all nodes come up without crashing), corruption may happen silently and go unnoticed until the corrupted data is touched.

Therefore, I think it is better to prevent downgrades by default. We can provide an option to bypass the check if needed.

Describe the solution you'd like

Meta persists the image version of each worker node (including itself) in the meta store.
All other nodes report their image version to meta when joining the cluster.
Meta updates the persisted version of a worker node to the reported version if the reported version is newer. Otherwise, by default, it prevents the node from joining the cluster.
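
To make the proposed flow concrete, here is a minimal sketch of the join-time check. Everything in it is hypothetical and not taken from the RisingWave codebase: `Version`, `VersionRegistry`, `JoinDecision`, and the `allow_downgrade` flag are illustrative names, and the `HashMap` stands in for the meta store. It also assumes that a node reporting the same version as the persisted one is accepted.

```rust
use std::collections::HashMap;

/// Hypothetical image version (major, minor, patch).
/// The derived `Ord` compares the fields in declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version(u32, u32, u32);

#[derive(Debug, PartialEq, Eq)]
enum JoinDecision {
    /// First join, same version, or a downgrade that was explicitly allowed.
    Accepted,
    /// The persisted version was bumped to the reported one.
    Upgraded { from: Version },
    /// The reported version is older than the persisted one.
    RejectedDowngrade { persisted: Version },
}

/// Stand-in for the meta store: worker id -> last persisted image version.
#[derive(Default)]
struct VersionRegistry {
    persisted: HashMap<u32, Version>,
    /// Hypothetical escape hatch for operators who really need a downgrade.
    allow_downgrade: bool,
}

impl VersionRegistry {
    /// Called when a worker reports its image version on joining.
    fn on_join(&mut self, worker_id: u32, reported: Version) -> JoinDecision {
        match self.persisted.get(&worker_id).copied() {
            None => {
                self.persisted.insert(worker_id, reported);
                JoinDecision::Accepted
            }
            Some(prev) if reported > prev => {
                self.persisted.insert(worker_id, reported);
                JoinDecision::Upgraded { from: prev }
            }
            Some(prev) if reported < prev && !self.allow_downgrade => {
                JoinDecision::RejectedDowngrade { persisted: prev }
            }
            Some(_) => {
                // Same version, or a downgrade with the override enabled.
                self.persisted.insert(worker_id, reported);
                JoinDecision::Accepted
            }
        }
    }
}
```

Whether an allowed downgrade should overwrite the persisted version, as this sketch does, is an open design choice.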

In addition to preventing unexpected cluster downgrades, persisting the image version of each worker node gives us more control and flexibility over cluster upgrades:

We can reject new worker nodes running older code from joining the cluster.
We can coordinate cluster upgrades when needed. For example, storage layout upgrades like "Support bucketed prefix for object in non-S3 object store backend" (#15667), which require source/compaction to be paused until all nodes have converged to a newer version, can be coordinated.
It may also be possible to automatically trigger a meta-backup before cluster upgrade to ensure rollback is possible when surprises happen after the upgrade.
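
As a rough illustration of the coordination point, the sketch below gates resuming paused work on every node having reached a required version. The names `ImageVersion`, `converged_version`, and `can_resume` are hypothetical, and the slice of versions stands in for whatever meta would read from the meta store.

```rust
/// Hypothetical image version (major, minor, patch), ordered field by field.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct ImageVersion(u32, u32, u32);

/// The version the cluster has converged to: the lowest version of any node.
/// Returns `None` for an empty cluster.
fn converged_version(node_versions: &[ImageVersion]) -> Option<ImageVersion> {
    node_versions.iter().copied().min()
}

/// Paused work (e.g. source/compaction) may resume only once every node
/// runs at least `required`. An empty cluster never qualifies.
fn can_resume(node_versions: &[ImageVersion], required: ImageVersion) -> bool {
    matches!(converged_version(node_versions), Some(v) if v >= required)
}
```

Using the minimum across nodes means a single lagging node keeps the upgrade step paused, which matches the "until all nodes have converged" requirement.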

Describe alternatives you've considered

No response

Additional context

No response

kwannoel · 2024-03-19T06:35:51Z

Prefer this:

It may also be possible to automatically trigger a meta-backup before cluster upgrade to ensure rollback is possible when surprises happen after the upgrade.

And also I don't think we should block minor version downgrade, e.g. 1.7.1 -> 1.7.0.
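
One way such a relaxed policy could look, as a sketch only: compare just the (major, minor) pair, so moving between releases within the same 1.7.x line is never blocked. The function name `blocks_downgrade` and the tuple representation are hypothetical.

```rust
/// Versions are (major, minor, patch) tuples.
/// Blocks a move from `current` to `target` only when the target's
/// (major, minor) pair is lower; the last component is ignored.
fn blocks_downgrade(current: (u32, u32, u32), target: (u32, u32, u32)) -> bool {
    (target.0, target.1) < (current.0, current.1)
}
```

With this rule, 1.7.1 -> 1.7.0 is allowed while 1.8.0 -> 1.7.1 is blocked.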

github-actions · 2024-06-12T08:56:48Z

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

hzxa21 added the type/feature label Mar 18, 2024

github-actions bot added this to the release-1.8 milestone Mar 18, 2024

fuyufjh removed this from the release-1.8 milestone Apr 8, 2024

github-actions bot added the no-issue-activity label Jun 12, 2024
