test: add data compatibility test #3109
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##              main    #3109      +/-   ##
==========================================
- Coverage    85.65%   84.80%   -0.86%
==========================================
  Files          859      880      +21
  Lines       140757   143115    +2358
==========================================
+ Hits        120571   121374     +803
- Misses       20186    21741    +1555
I'm not a big fan of checking in blobs. Can we use a prebuilt Docker image instead? Checking in blobs increases the repo size and makes clone and fetch slow. We will be running these tests for every future version, which means more blobs added to the repo over time. I'm afraid this approach won't scale.
@sunng87 Those binary files are not large for now: only 2.8 MB against a total repo size of 76 MB, and we can keep them small. Besides, there should always be only one set of data tested against the newly built GreptimeDB; should there be any breaking change, the data needs to be regenerated (using the current code). Finally, a pre-built image for this simple purpose is overkill, and downloading and running the image would add a lot to the GitHub CI time. So at this early stage, let's keep it simple and straightforward. If more complex compatibility-testing cases come up, we can consider the other approach.
I'm negative about this too. Those files are likely to change from version to version; I'd prefer to track those fixtures elsewhere, like a pre-built image or a separate repo such as https://github.com/apache/parquet-testing, where
I think running the compatibility test in GitHub CI is a must: it tells us immediately which PR introduces a breaking change. Another repo or a pre-built image is overkill for this and complicates things a lot, not to mention the time cost. I'm planning to refer to how Databend does its compatibility test (see https://github.com/datafuselabs/databend/tree/main/tests/fuse-compat), that:
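To make the proposed flow concrete, here is a minimal sketch of a two-binary compatibility check in the spirit of Databend's fuse-compat suite. Everything in it is an assumption for illustration: the binary paths, the crude 5-second readiness wait, and the `standalone start` / `--data-home` invocation are placeholders, not the actual test added in this PR.

```rust
use std::process::{Child, Command};
use std::thread::sleep;
use std::time::Duration;

/// Spawn a GreptimeDB server process pointed at a shared data directory.
/// The subcommand and flag are assumptions for this sketch.
fn start_server(bin: &str, data_dir: &str) -> Child {
    Command::new(bin)
        .args(["standalone", "start", "--data-home", data_dir])
        .spawn()
        .unwrap_or_else(|e| panic!("failed to spawn {bin}: {e}"))
}

fn main() {
    let data_dir = "/tmp/greptimedb-compat";

    // Phase 1: write fixture data with the *old* released binary
    // (downloaded beforehand by the CI job).
    let mut old = start_server("./bins/old/greptime", data_dir);
    sleep(Duration::from_secs(5)); // crude readiness wait; a real test would poll
    // ... run INSERT statements against the old server here ...
    old.kill().expect("failed to stop old server");

    // Phase 2: re-open the same data directory with the *new* binary
    // built from the current commit and verify the old data is readable.
    let mut new = start_server("./bins/new/greptime", data_dir);
    sleep(Duration::from_secs(5));
    // ... run SELECT statements and assert the results match the fixtures ...
    new.kill().expect("failed to stop new server");
}
```

A real test would replace the sleeps with a health-check poll and drive the SQL steps through an actual client, but the two-phase shape (old writes, new reads) is the core of the approach.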
Pull request was converted to draft
Force-pushed from cd9b9f7 to 2740c6b
This looks good to me. Generating data during testing makes the test more flexible; this way we have the ability to check compatibility between two arbitrary versions.
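Building on that point, the two binaries in the sketch above could be selected by version, so the same test can pin any pair of releases. A small hedged example; the environment variable names and defaults are made up for illustration:

```rust
use std::env;

fn main() {
    // Hypothetical knobs: which two versions to compare. Both variable
    // names and the default values are assumptions, not the PR's design.
    let old_ver = env::var("COMPAT_OLD_VERSION").unwrap_or_else(|_| "v0.4".to_string());
    let new_ver = env::var("COMPAT_NEW_VERSION").unwrap_or_else(|_| "current".to_string());
    println!("compat test: data written by {old_ver}, read by {new_ver}");
    // ... resolve each version to a binary (download a release, or use the
    //     fresh build for "current"), then run the write/read phases ...
}
```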
Force-pushed from 6d22d86 to ffec25f
Sounds OK to me. We can start with this approach and see if it scales.
Force-pushed from 847400b to 1668cf6
Force-pushed from 3f490be to 8099727
Force-pushed from 966147a to 156e7bc
Force-pushed from 156e7bc to 329df4c
The other parts look good to me.
@waynexia PTAL
Mostly LGTM
Co-authored-by: Ruihang Xia <[email protected]>
I hereby agree to the terms of the GreptimeDB CLA
What's changed and what's your intention?
Add a test for data file compatibility. See the README for more details.
Checklist
Refer to a related PR or issue link (optional)
#1730