Prevent downgrading RisingWave version for a running cluster by default #15761

Open
hzxa21 opened this issue Mar 18, 2024 · 2 comments

Comments

@hzxa21
Collaborator

hzxa21 commented Mar 18, 2024

Is your feature request related to a problem? Please describe.

Given that forward compatibility is not guaranteed, downgrading the RisingWave version of a running cluster is dangerous: the older code may corrupt data or metadata, leading to undefined behavior. Even worse, if the downgrade appears to succeed (for example, all nodes come up without crashing), corruption may be happening silently, and we won't notice it until the corrupted data is accessed.

Therefore, I think it is better to prevent downgrades by default. We can provide an option to bypass the check if needed.

Describe the solution you'd like

  • Meta persists the image version of each worker node (including itself) in the meta store.
  • All other nodes propagate their image version to meta when joining the cluster.
  • If a node's propagated version is newer than the persisted one, meta updates the persisted version to it. If it is older, meta prevents the node from joining the cluster by default.
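The join-time check described above can be sketched as follows. This is a minimal, hypothetical illustration, not RisingWave's actual meta service code: `Version`, `VersionRegistry`, `on_join`, `JoinError`, and the `allow_downgrade` flag (standing in for "an option to circumvent it") are all assumed names, and an in-memory `HashMap` stands in for the meta store.

```rust
use std::collections::HashMap;

/// Image version as (major, minor, patch). The derived `Ord` compares the
/// fields in declaration order, so 1.8.0 > 1.7.9.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

impl Version {
    pub const fn new(major: u32, minor: u32, patch: u32) -> Self {
        Version { major, minor, patch }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum JoinError {
    /// The joining node runs an older image than the one recorded for it.
    Downgrade { persisted: Version, propagated: Version },
}

/// Hypothetical in-memory stand-in for the meta store:
/// worker id -> highest image version seen so far.
pub struct VersionRegistry {
    persisted: HashMap<u32, Version>,
    /// Hypothetical override that lets a downgraded node join anyway.
    allow_downgrade: bool,
}

impl VersionRegistry {
    pub fn new(allow_downgrade: bool) -> Self {
        VersionRegistry { persisted: HashMap::new(), allow_downgrade }
    }

    pub fn version(&self, worker_id: u32) -> Option<Version> {
        self.persisted.get(&worker_id).copied()
    }

    /// Called when a worker joins and propagates its image version.
    pub fn on_join(&mut self, worker_id: u32, propagated: Version) -> Result<(), JoinError> {
        match self.version(worker_id) {
            // First time this worker is seen: record its version.
            None => {
                self.persisted.insert(worker_id, propagated);
                Ok(())
            }
            // Newer image: upgrade the persisted version.
            Some(persisted) if propagated > persisted => {
                self.persisted.insert(worker_id, propagated);
                Ok(())
            }
            // Older image and no override: refuse to let the node join.
            Some(persisted) if propagated < persisted && !self.allow_downgrade => {
                Err(JoinError::Downgrade { persisted, propagated })
            }
            // Same version, or a downgrade explicitly allowed: admit the node
            // but keep the highest version seen as the persisted record.
            Some(_) => Ok(()),
        }
    }
}
```

Keeping the highest version seen, even when the override is used, means a later downgrade is still detected. Whether an overridden downgrade should instead lower the persisted version is a design choice the issue leaves open.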

In addition to preventing unexpected cluster downgrades, persisting the image version of each worker node gives us more control and flexibility over cluster upgrades:

  • We can reject new worker nodes running older code from joining the cluster.
  • We can coordinate cluster upgrades when needed. For example, storage layout upgrades such as Support bucketed prefix for object in non-S3 object store backend #15667, which require sources/compaction to be paused until all nodes have converged to the newer version, can be coordinated.
  • It may also be possible to automatically trigger a meta-backup before a cluster upgrade, so that rollback is possible if problems surface after the upgrade.
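With per-worker versions persisted, the first two controls reduce to simple checks over the recorded versions. The sketch below is hypothetical: `admit_new_worker` and `all_converged` are illustrative names, and it assumes the "cluster version" a new node is compared against is the highest version any worker has reported (it could equally be meta's own version).

```rust
use std::collections::HashMap;

/// Image version as (major, minor, patch); derived `Ord` is lexicographic.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

impl Version {
    pub const fn new(major: u32, minor: u32, patch: u32) -> Self {
        Version { major, minor, patch }
    }
}

/// Reject a new worker whose image is older than the cluster version,
/// assumed here to be the highest version any registered worker reports.
pub fn admit_new_worker(workers: &HashMap<u32, Version>, candidate: Version) -> bool {
    match workers.values().max() {
        Some(cluster_version) => candidate >= *cluster_version,
        // Empty cluster: nothing to compare against.
        None => true,
    }
}

/// True once every registered worker runs at least `target`. A coordinator
/// could keep sources/compaction paused until this holds before switching
/// to a new storage layout. Vacuously true for an empty map.
pub fn all_converged(workers: &HashMap<u32, Version>, target: Version) -> bool {
    workers.values().all(|v| *v >= target)
}
```

A coordinator could poll `all_converged` with the version that introduces the new layout and only resume sources/compaction once it returns `true`.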

Describe alternatives you've considered

No response

Additional context

No response

@github-actions github-actions bot added this to the release-1.8 milestone Mar 18, 2024
@kwannoel
Contributor

Prefer this:

It may also be possible to automatically trigger a meta-backup before cluster upgrade to ensure rollback is possible when surprises happen after the upgrade.

Also, I don't think we should block patch-version downgrades, e.g. 1.7.1 -> 1.7.0.
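The suggestion above, allowing a downgrade such as 1.7.1 -> 1.7.0 that changes only the patch component, could be expressed as a relaxed policy that compares just `(major, minor)`. This is a hypothetical sketch (`is_blocked_downgrade` is an illustrative name) and it assumes patch releases never change persisted data or metadata formats, which would be a project policy rather than something stated in this issue.

```rust
/// Image version as (major, minor, patch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

impl Version {
    pub const fn new(major: u32, minor: u32, patch: u32) -> Self {
        Version { major, minor, patch }
    }
}

/// Hypothetical relaxed policy: block a downgrade only if it lowers the
/// major or minor component. Tuples compare lexicographically, so
/// (1, 7) < (1, 8) and (1, 9) < (2, 0); the patch component is ignored.
pub fn is_blocked_downgrade(persisted: Version, propagated: Version) -> bool {
    (propagated.major, propagated.minor) < (persisted.major, persisted.minor)
}
```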

@fuyufjh fuyufjh removed this from the release-1.8 milestone Apr 8, 2024
Contributor

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

Projects
None yet
Development

No branches or pull requests

3 participants