-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discussion: consider backward compatibility #41
Comments
That's a good question, and I've thought a lot about compatibility, so let's take it case by case. SQL syntaxAfter our first stable release, it's better to always keep SQL compatibility. Generally speaking, this won't be too hard, since most of our SQL is just following how Postgres behaves. We can mark a stabilized syntax as deprecated, but never have to remove it. I guess it doesn't take much effort. For very rare design flaws, used by few users, break changes are allowed. We should discuss them case by case. Example: risingwavelabs/risingwave#4503 doesn't break anything, and we may still find the better property naming convention in the future. Data format in storageSST formatAll objects stored in our storage backend are in our private SST format. There is an Row EncodingThe SST data blocks are organized as rows of key/value pairs. We use memcomparable encoding for the key and value encoding for the value. There's a lot of design space, and we've even made several break changes now. After we stabilize, we need to ensure that any old-version key/value pairs can be decoded correctly. Fortunately, we can ensure that all key/value pairs in the same SST file have the same encoding version, so we don't need to store the version per row. IMHO, a good practice to maintain encoding compatibility is to use separate crates and versions for the row encodings. Based on the power of cargo, we can use different versions of the same crate in one project. For example: memcomparable_0_1 = { package = "memcomparable", version = "0.1.2" }
memcomparable_0_2 = { package = "memcomparable", version = "0.2.3" }
memcomparable_1 = { package = "memcomparable", version = "1.4" }
value_encoding_0_1 = { package = "value_encoding", version = "0.1" }
value_encoding_1 = { package = "value_encoding", version = "1.2" }
value_encoding_2 = { package = "value_encoding", version = "2.1" } And dispatch the implementation statically while decoding the SSTs. Pseudocode: fn decode<const SST_VERSION: usize>(buf: &Bytes) -> (Key, Value) {
match SST_VERSION {
1 => (memcomparable_0_1.decode(buf), value_encoding_0_1.decode(buf))
2 => (memcomparable_0_2.decode(buf), value_encoding_1.decode(buf))
3 => (memcomparable_0_2.decode(buf), value_encoding_2.decode(buf))
4 => (memcomparable_1.decode(buf), value_encoding_2.decode(buf)),
_ => unreachable!("corrupted data")
}
} The encoding crate version should follow the semver rules, so that we can release some patches (e.g. bump deps) to the older versions. Upgrade During CompactingWhen a new storage version (both SST and row encoding) is released, no data is changed until the SST is compacted. Compacting will rebuild the SST file so that we can upgrade the version in the meantime. Compaction always reconstructs an entire SST file, so we can ensure that all key/value pairs in the same SST file have the same encoding version. Data Format in MetaData in the meta is organized as a tree structure, and currently the major meta backend is etcd. The meta values are encoded by protobuf encoding. If a change only adds/reduces fields in the meta, then the protobuf can ensure that nothing is broken. If a change must reorganize the tree structure itself, then a break change must be made. Since the meta-store data is never too large, a migration script is preferred for such a break change. If we implement a SQL-based metastore later, we can use some SQL like Protocols between nodesIn general, the protobuf can handle the situation well enough. To make things easier, we can add a version field to each request, so that an older compute node can always reject some DDL operations from a newer frontend. For CN-CN communication (mostly exchange), protobuf compatibility should be handled carefully by developers. CLI args/config fileNo ideas, we should create a good enough configuration framework before the first release. |
In production, there will always be cases in which we upgrade a RisingWave cluster. The problem is that upgrading isn't an atomic process, meaning multiple node versions can exist. One simple solution is to fully stop the cluster and start the newer version afterward, but the connections from our user side will be disconnected. Apparently, it hurts user experience and thus shouldn't be the first choice in most cases. Compatibility of the communication protocols should be carefully kept and tested for those promised compatible versions, i.e., versions with the same major (and minor) numbers in semantic versions. For the CLI args and config parameters, I'd like to propose that the unrecognized parameters should trigger warnings instead of panics. This makes the cloud easier to manage the config templates. Also, critical args should be carefully handled because the operator can not distinguish from versions. If the operator also applies the arg changes (e.g., supports a new argument or deprecates an old one), the older RisingWave versions won't be deployable anymore. That has already happened several times. |
What about testing. Is it possible to setup some kind of compatibility test in CI? 👀 |
I'd prefer always rolling-update the whole cluster, with specified orders. Two versions of nodes should only co-exist in several seconds to minutes, so we can reject some older requests directly. |
Not ready, we haven't implemented all mechanisms, just some thoughts. |
I agree, but it could take a long time to upgrade from a really old version. |
This comment was marked as resolved.
This comment was marked as resolved.
What @TennyZhuang mentioned is absolutely essential that we should ensure the extensibility of our major components. But the reality is, even some trivial configuration changes could break the compatibility: risingwavelabs/risingwave#7527. Without actually testing out an upgrade, we'll never know what could break. @arkbriar @sumittal Let's include upgrade testing in our release process. |
@neverchanje I've commented under risingwavelabs/risingwave#7530 (comment) and proposed to keep the backward compatibility for at least one patch version so that the change won't block any others at the exact time it is made to make others life easier. |
Of course we can add an upgrade test during the release process, but what if the test fails? Whether to roll back the kernel or upgrade the cloud, such break changes will cause a lot of trouble, and even affect customers, because the cloud operator is public in eks, and not all problems can be solved by upgrading the operator. We should formulate more detailed compatibility rules. |
This comment was marked as resolved.
This comment was marked as resolved.
@huangjw806 We shouldn't release any stable versions with upgradability issues, whether or not it's a patch release or a major release. If the migration tools are not ready, we should revert the changes that cause the issues. |
This comment was marked as resolved.
This comment was marked as resolved.
draft a RFC to discuss how to maintain and manage the Backward Compatibility of the stream plan and SQL features. need more discussion #43 |
#43 also contains valuable discussions about the requirements for rolling upgrade. |
previous
breaking-change
https://github.com/risingwavelabs/risingwave/issues?q=label%3Abreaking-changeThere are many aspects to consider for compatibility
However
So I just create an issue first to collect ideas :)
The text was updated successfully, but these errors were encountered: