From b8ecade84a6adad7ceca5fa6e6ab6df28670e2fa Mon Sep 17 00:00:00 2001 From: st1page <1245835950@qq.com> Date: Tue, 31 Jan 2023 18:47:14 +0800 Subject: [PATCH 1/4] add rfc --- ...0-backward-compatibility-of-stream-plan.md | 39 +++++++++++++++++++ 1 file changed, 39 insertions(+) create mode 100644 rfcs/0050-backward-compatibility-of-stream-plan.md diff --git a/rfcs/0050-backward-compatibility-of-stream-plan.md b/rfcs/0050-backward-compatibility-of-stream-plan.md new file mode 100644 index 00000000..5ebcbe53 --- /dev/null +++ b/rfcs/0050-backward-compatibility-of-stream-plan.md @@ -0,0 +1,39 @@ +--- +feature: Backward Compatibility of Stream Plan +authors: + - "st1page" +start_date: "2023/1/31" +--- + +# Backward Compatibility of Stream Plan + +## Summary + +- distinguish the nightly and stable for SQL features and stream plan node protobuf. +- use a Copy-on-Write style on changing the stable stream plan node protobuf. + +## Motivation + +In https://github.com/risingwavelabs/rfcs/issues/41, we discuss the backward compatibility. And the protobuf structure of stream plan nodes is a special part. +- the plan node's structure usually modified more frequently than other protobuf structure such as catalog, especially when we are developing new SQL features and we even do not know how to do it right. The plan node's changes are not only adding some optional field(which can be solved by protobuf) but also of meaning and behaviors of the operator. For example, our state table information of streamAgg having breaking changed in 0.1.13 and in 0.1.16, the source executor is no longer responsible for generating row_id. And we do not confirm the sort and overAgg's format so far. +- in other databases, the plan node is just used as a communicating protocol between frontend and compute node. So the compute node can only support the latest version's plan node format and reject all the requests with unknown plan node. 
But our stream plan should be persistent in meta store which means that a compute node must be compatible with all versions of old plans' protobuf format. + +In conclusion, we need find a way to achieve a balance between rapid development and backward compatibility, especially for stream plan node. + +## Design + +### Nightly and Stable SQL Features +Distinguish the nightly and stable feature when publishing release version. RW will do not ensure compatibility for the streaming jobs with the nightly features in following releases. For example, if we release the "emit on close" as a in the release v0.1.17 and user create a mv with that feature on a v0.1.17 cluster. The v0.1.18 and following version's RW can not ensure it can run successfully on the existing streaming jobs. User can drop the MVs with the nightly feature before they upgrade the cluster. For those nightly feature users really what to upgrade, we can write helper scripts too. And the stable features will be tested with new released compute node on old version streaming plans. Also, with the convinced stable feature list, we can test the backward compatibility more easily. + +### Nightly and Stable Stream Plan Node +How to know if a SQL Feature has been stable? Developer should comment the compatibility annotation on protobuf struct of every stream plan node(like annotation in java). the annotation contains: "nightly v0.1.14", "stable v0.1.15", "deprecated v0.1.16". A plan node will be with a nightly annotation firstly. When developer ensure that the plan node struct is stable enough, a stable annotation should be comments on the protobuf struct. When developer ensure that frontend will not generate the plan node, a deprecated annotation should be comments on the protobuf struct. A SQL feature is stable means that all the stream plan nodes generated by any version's optimizer should have been stable. 
+ +To be discussed: what is the proper format of those comments in proto files and how to check all plan node should have one in CI check? + +### Copy-on-Write Style Changes on Stable Plan Node Protobuf +How to maintain the compatibility of the plan node's protobuf? If developer want to do any changes on a stable plan node, he should add a new plan node protobuf definition. For example, if he want to add a new field in `StreamHashAgg`, he must define a new protobuf struct `StreamHashAggV2` and add the field on that. Notice that there are multi versions protobuf but they can share the same implementation. + +Why make it so complicated and why not just rely on the protobuf's compatibility? To achieve the compatibility, protobuf actually give the struct that all fields are optional. When a protobuf struct is used as a RPC interface, the caller will give a combination of those optional fields and the callee should try best to try all kinds of meaningful combinations or return an error. based on the following facts I think the Copy-on-Write Style Changes is better. +- the changes of the stable plan node is limited. we can make breaking changes in the same release arbitrarily and a stable plan node will not be modified too much. So the duplicated plan node definition will not be too much. +- here the “return error” is unacceptable for us because if we can not resolve the stored streaming plan, the cluster can not boot up anyway. So we must make sure that the compute node can accept any combination of the fields in historical versions. Store all these combination in different version's plannode definition will help to maintain the compatibility, or it will just exist in the compute node's code and easily be forgotten. 
+ From 548cc7243030ae285b8874e77c4f92e685baec8e Mon Sep 17 00:00:00 2001 From: st1page <1245835950@qq.com> Date: Tue, 31 Jan 2023 18:48:14 +0800 Subject: [PATCH 2/4] rename pr id --- ...ream-plan.md => 0043-backward-compatibility-of-stream-plan.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename rfcs/{0050-backward-compatibility-of-stream-plan.md => 0043-backward-compatibility-of-stream-plan.md} (100%) diff --git a/rfcs/0050-backward-compatibility-of-stream-plan.md b/rfcs/0043-backward-compatibility-of-stream-plan.md similarity index 100% rename from rfcs/0050-backward-compatibility-of-stream-plan.md rename to rfcs/0043-backward-compatibility-of-stream-plan.md From cc5dbd4fcdf9c3463937f07ca6fdcdded732b24e Mon Sep 17 00:00:00 2001 From: st1page <1245835950@qq.com> Date: Tue, 31 Jan 2023 19:30:55 +0800 Subject: [PATCH 3/4] fix typo --- rfcs/0043-backward-compatibility-of-stream-plan.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0043-backward-compatibility-of-stream-plan.md b/rfcs/0043-backward-compatibility-of-stream-plan.md index 5ebcbe53..7d5a3b61 100644 --- a/rfcs/0043-backward-compatibility-of-stream-plan.md +++ b/rfcs/0043-backward-compatibility-of-stream-plan.md @@ -23,7 +23,7 @@ In conclusion, we need find a way to achieve a balance between rapid development ## Design ### Nightly and Stable SQL Features -Distinguish the nightly and stable feature when publishing release version. RW will do not ensure compatibility for the streaming jobs with the nightly features in following releases. For example, if we release the "emit on close" as a in the release v0.1.17 and user create a mv with that feature on a v0.1.17 cluster. The v0.1.18 and following version's RW can not ensure it can run successfully on the existing streaming jobs. User can drop the MVs with the nightly feature before they upgrade the cluster. For those nightly feature users really what to upgrade, we can write helper scripts too. 
And the stable features will be tested with new released compute node on old version streaming plans. Also, with the convinced stable feature list, we can test the backward compatibility more easily. +Distinguish the nightly and stable feature when publishing release version. RW will do not ensure compatibility for the streaming jobs with the nightly features in following releases. For example, if we release the "emit on close" as a nightly feature in the release v0.1.17 and user create a mv with that feature on a v0.1.17 cluster. The v0.1.18 and following version's RW can not ensure it can run successfully on the existing streaming jobs. User can drop the MVs with the nightly feature before they upgrade the cluster. For those nightly feature users really what to upgrade, we can write helper scripts too. And the stable features will be tested with new released compute node on old version streaming plans. Also, with the convinced stable feature list, we can test the backward compatibility more easily. ### Nightly and Stable Stream Plan Node How to know if a SQL Feature has been stable? Developer should comment the compatibility annotation on protobuf struct of every stream plan node(like annotation in java). the annotation contains: "nightly v0.1.14", "stable v0.1.15", "deprecated v0.1.16". A plan node will be with a nightly annotation firstly. When developer ensure that the plan node struct is stable enough, a stable annotation should be comments on the protobuf struct. When developer ensure that frontend will not generate the plan node, a deprecated annotation should be comments on the protobuf struct. A SQL feature is stable means that all the stream plan nodes generated by any version's optimizer should have been stable. 
From b0bc692e8f06ae1426d12223a00a1db6ec9068e6 Mon Sep 17 00:00:00 2001 From: xxchan Date: Tue, 7 Feb 2023 13:42:23 +0100 Subject: [PATCH 4/4] Update 0043-backward-compatibility-of-stream-plan.md (#47) --- ...3-backward-compatibility-of-stream-plan.md | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/rfcs/0043-backward-compatibility-of-stream-plan.md b/rfcs/0043-backward-compatibility-of-stream-plan.md index 7d5a3b61..9c44e2e5 100644 --- a/rfcs/0043-backward-compatibility-of-stream-plan.md +++ b/rfcs/0043-backward-compatibility-of-stream-plan.md @@ -9,31 +9,31 @@ start_date: "2023/1/31" ## Summary -- distinguish the nightly and stable for SQL features and stream plan node protobuf. -- use a Copy-on-Write style on changing the stable stream plan node protobuf. +- Distinguish between nightly and stable for SQL features and stream plan node protobufs. +- Use a Copy-on-Write style when changing a stable stream plan node protobuf. ## Motivation -In https://github.com/risingwavelabs/rfcs/issues/41, we discuss the backward compatibility. And the protobuf structure of stream plan nodes is a special part. -- the plan node's structure usually modified more frequently than other protobuf structure such as catalog, especially when we are developing new SQL features and we even do not know how to do it right. The plan node's changes are not only adding some optional field(which can be solved by protobuf) but also of meaning and behaviors of the operator. For example, our state table information of streamAgg having breaking changed in 0.1.13 and in 0.1.16, the source executor is no longer responsible for generating row_id. And we do not confirm the sort and overAgg's format so far. -- in other databases, the plan node is just used as a communicating protocol between frontend and compute node. So the compute node can only support the latest version's plan node format and reject all the requests with unknown plan node. 
But our stream plan should be persistent in meta store which means that a compute node must be compatible with all versions of old plans' protobuf format. +In https://github.com/risingwavelabs/rfcs/issues/41, we discussed backward compatibility. The protobuf structure of stream plan nodes is a special case. +- The plan node's structure is usually modified more frequently than other protobuf structures, such as the catalog, especially when new SQL features are being developed. Changes to a plan node are not limited to adding optional fields (which protobuf handles); they can also alter the meaning and behavior of the operator. For example, the state table information of streamAgg underwent breaking changes in 0.1.13, and as of 0.1.16 the source executor is no longer responsible for generating row_id. The format of sort and overAgg has not been confirmed so far. +- In other databases, the plan node is just used as a communication protocol between the frontend and compute node, so the compute node only has to support the latest version of the plan node format and can reject all requests with an unknown plan node. But our stream plans are persisted in the meta store, meaning that a compute node must be compatible with the protobuf format of plans from all old versions. -In conclusion, we need find a way to achieve a balance between rapid development and backward compatibility, especially for stream plan node. +In conclusion, we need to find a way to balance rapid development and backward compatibility, especially for the stream plan node. ## Design ### Nightly and Stable SQL Features -Distinguish the nightly and stable feature when publishing release version. RW will do not ensure compatibility for the streaming jobs with the nightly features in following releases. For example, if we release the "emit on close" as a nightly feature in the release v0.1.17 and user create a mv with that feature on a v0.1.17 cluster. 
The v0.1.18 and following version's RW can not ensure it can run successfully on the existing streaming jobs. User can drop the MVs with the nightly feature before they upgrade the cluster. For those nightly feature users really what to upgrade, we can write helper scripts too. And the stable features will be tested with new released compute node on old version streaming plans. Also, with the convinced stable feature list, we can test the backward compatibility more easily. +Distinguish between nightly and stable features when publishing a release. RW does not guarantee compatibility in subsequent releases for streaming jobs that use nightly features. For example, if the "emit on close" feature is released as a nightly feature in release v0.1.17 and a user creates an MV with that feature on a v0.1.17 cluster, RW in v0.1.18 and subsequent versions cannot guarantee that the existing streaming job will run successfully. Users can drop the MVs with the nightly feature before upgrading the cluster. For nightly features that users really want to carry across an upgrade, we can write helper scripts. The stable features will be tested with newly released compute nodes against streaming plans from old versions. Also, with the list of stable features, we can more easily test backward compatibility. ### Nightly and Stable Stream Plan Node -How to know if a SQL Feature has been stable? Developer should comment the compatibility annotation on protobuf struct of every stream plan node(like annotation in java). the annotation contains: "nightly v0.1.14", "stable v0.1.15", "deprecated v0.1.16". A plan node will be with a nightly annotation firstly. When developer ensure that the plan node struct is stable enough, a stable annotation should be comments on the protobuf struct. When developer ensure that frontend will not generate the plan node, a deprecated annotation should be comments on the protobuf struct. 
A SQL feature is stable means that all the stream plan nodes generated by any version's optimizer should have been stable. +How do we determine whether a SQL feature is stable? Developers should add a compatibility annotation as a comment on the protobuf struct of each stream plan node (similar to annotations in Java). The annotation contains: "nightly v0.1.14", "stable v0.1.15", "deprecated v0.1.16". A plan node will initially have a nightly annotation. Once the developer is confident that the plan node structure is stable, a stable annotation should be added to the protobuf struct. Once the developer is confident that the frontend will no longer generate the plan node, a deprecated annotation should be added to the protobuf struct. A SQL feature is considered stable if all stream plan nodes generated by any version of the optimizer are stable. -To be discussed: what is the proper format of those comments in proto files and how to check all plan node should have one in CI check? +To be discussed: What is the proper format of these comments in the proto files, and how can a CI check verify that every plan node has one? ### Copy-on-Write Style Changes on Stable Plan Node Protobuf -How to maintain the compatibility of the plan node's protobuf? If developer want to do any changes on a stable plan node, he should add a new plan node protobuf definition. For example, if he want to add a new field in `StreamHashAgg`, he must define a new protobuf struct `StreamHashAggV2` and add the field on that. Notice that there are multi versions protobuf but they can share the same implementation. +How do we maintain compatibility of the plan node's protobuf? If a developer wants to make changes to a stable plan node, they should add a new plan node protobuf definition. For example, if they want to add a new field in `StreamHashAgg`, they must define a new protobuf struct `StreamHashAggV2` and add the field to it. Note that there can be multiple versions of the protobuf definition, but they can share the same implementation. 
-Why make it so complicated and why not just rely on the protobuf's compatibility? To achieve the compatibility, protobuf actually give the struct that all fields are optional. When a protobuf struct is used as a RPC interface, the caller will give a combination of those optional fields and the callee should try best to try all kinds of meaningful combinations or return an error. based on the following facts I think the Copy-on-Write Style Changes is better. -- the changes of the stable plan node is limited. we can make breaking changes in the same release arbitrarily and a stable plan node will not be modified too much. So the duplicated plan node definition will not be too much. -- here the “return error” is unacceptable for us because if we can not resolve the stored streaming plan, the cluster can not boot up anyway. So we must make sure that the compute node can accept any combination of the fields in historical versions. Store all these combination in different version's plannode definition will help to maintain the compatibility, or it will just exist in the compute node's code and easily be forgotten. +Why make it this complicated instead of just relying on protobuf's compatibility? To achieve compatibility, protobuf effectively makes every field of a struct optional. When a protobuf struct is used as an RPC interface, the caller provides a combination of optional fields, and the callee should do its best to handle all meaningful combinations or return an error. Based on the following facts, I believe the Copy-on-Write style is better. +- Changes to stable plan nodes are limited. We can make breaking changes freely within the same release, and a stable plan node will not be modified often. As a result, the number of duplicated plan node definitions will not be excessive. +- Here, "returning an error" is unacceptable because if we cannot resolve the stored streaming plan, the cluster cannot boot up. 
We must ensure that the compute node can accept any combination of fields from historical versions. Recording each of these combinations as a separate versioned plan node definition helps maintain compatibility; otherwise, that knowledge would exist only in the compute node's code and be easily forgotten.
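
The Copy-on-Write idea above, where several protobuf versions share one implementation, might look roughly like the following Rust sketch. It is only an illustration under stated assumptions: the structs stand in for protobuf-generated messages, and `PersistedHashAgg`, `HashAggArgs`, `group_key`, and `emit_on_window_close` are hypothetical names that do not come from the RFC or the RisingWave codebase.

```rust
// Hypothetical stand-ins for protobuf-generated messages. The field names are
// assumptions for illustration, not the real RisingWave definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct StreamHashAgg {
    pub group_key: Vec<u32>,
}

/// Copy of `StreamHashAgg` plus one new (hypothetical) field.
/// The stable `StreamHashAgg` definition itself is left untouched.
#[derive(Debug, Clone, PartialEq)]
pub struct StreamHashAggV2 {
    pub group_key: Vec<u32>,
    pub emit_on_window_close: bool,
}

/// Every version that may already be persisted in the meta store
/// stays explicitly representable.
#[derive(Debug, Clone, PartialEq)]
pub enum PersistedHashAgg {
    V1(StreamHashAgg),
    V2(StreamHashAggV2),
}

/// Single internal form consumed by the one shared executor implementation.
#[derive(Debug, Clone, PartialEq)]
pub struct HashAggArgs {
    pub group_key: Vec<u32>,
    pub emit_on_window_close: bool,
}

impl From<PersistedHashAgg> for HashAggArgs {
    fn from(node: PersistedHashAgg) -> Self {
        match node {
            // V1 plans predate the new field, so the old behavior is pinned
            // here in one visible place instead of being inferred from an
            // absent optional field.
            PersistedHashAgg::V1(n) => HashAggArgs {
                group_key: n.group_key,
                emit_on_window_close: false,
            },
            PersistedHashAgg::V2(n) => HashAggArgs {
                group_key: n.group_key,
                emit_on_window_close: n.emit_on_window_close,
            },
        }
    }
}
```

Because the `match` is exhaustive, adding a hypothetical `V3` variant would fail to compile until its conversion is written, which is one way the set of historical field combinations stays recorded in code rather than forgotten.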