-
Notifications
You must be signed in to change notification settings - Fork 599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sr/avro: fix missing default compat check for null #23839
Conversation
bazel/repositories.bzl
Outdated
sha256 = "f1a7d13b28ce5cc8812f26c705a6ea27b8bc63554d82d556c63b437da0338cf1", | ||
strip_prefix = "avro-e54bf712fce903652f3eab7a6c16264ac5d17285", | ||
url = "https://github.com/redpanda-data/avro/archive/e54bf712fce903652f3eab7a6c16264ac5d17285.tar.gz", | ||
sha256 = "3f6f1e32225b94f3bd70822b7e819c61b30570417308ae7a9d7661d8016be99e", | ||
strip_prefix = "avro-98d753c48cf3a22b52adc688ea3ca53550d0630c", | ||
url = "https://github.com/pgellert/avro/archive/98d753c48cf3a22b52adc688ea3ca53550d0630c.tar.gz", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is a placeholder till PR #313 gets merged, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct
return true; | ||
} | ||
|
||
auto expected_type = (r_leaf->type() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think whenever we make a copy, like here, we tend not to make it const
. It would be nice as a signal to say that this variable is immutable, but it can lead to accidental copies / prevent moving from the type, so I think, in general, we tend to default to auto
or (const) auto&
if we want to avoid a copy. @BenPope wdyt about this?
absl::flat_hash_set<incompatibility> expected; | ||
}; | ||
|
||
const auto compat_data = std::to_array<compat_test_case>({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
I find the color test cases a bit verbose. To help with readability, i would suggest:
- Parameterize the reader generation through a function that accepts the fields string.
- define the writer once.
- define the expected once and use where applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. The benefit of the verbosity here is that I can copy the test case and run it against a real schema registry for further testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid point. I am good either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|| r_leaf->leafAt(0)->type() != avro::Type::AVRO_NULL) { | ||
const auto& r_def = reader.defaultValueAt(int(r_idx)); | ||
|
||
auto missing_valid_default = [&]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think making this an IIFE improves readability here.
c983473
to
dac9a3e
Compare
first force-push (sorry for merging it together):
|
dac9a3e
to
05b5888
Compare
Make it easier to extend by simplifying the definition of a test case.
This adds more tests for the avro incompatibility `reader_field_missing_default_value`. All of these tests are already passing, while the tests added in the following commit show the change in behaviour.
05b5888
to
736f5a8
Compare
Force-pushed:
|
This fixes two related bugs in the avro compatibility checks where our implementation was too lenient for the check `reader_field_missing_default_value`. In Avro, a record field may declare a null default (`"default": null`) which is different from not declaring a default (no `"default"`). The first case means that by default the reader will set the field to `null` unless the field was set by the writer. The second case means that the writer must declare the field, otherwise the data is invalid for the reader. https://avro.apache.org/docs/1.11.1/specification/#schema-record However, the C++ avro implementation used a null object to represent even missing fields, which meant that the redpanda schema registry implementation could not tell the difference between a missing field and a null field. (See the associated avro fix PR). This meant that we incorrectly didn't detect when the reader schema had a new record field without a default value and always thought that a `null` default value is present. This surfaced as a bug for `"null"` and `["null", ...]` (union) types but it was constrained to these two types because the default-value-vs-field type mismatch check caught this as an incompatibility otherwise.
736f5a8
to
d4ec747
Compare
force-push: point to merged avro commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still some compilation errors?
https://github.com/redpanda-data/vtools/pull/3244 needs to merge first for this PR to be able to pass CI, but that PR depends on the changes in this PR so they should be reviewed together. |
The merge-base changed after approval.
Closing since a different implementation was chosen instead: #24032 |
This fixes two related bugs in the avro compatibility checks where our
implementation was too lenient for the check
reader_field_missing_default_value
.In Avro, a record field may declare a null default (
"default": null
)which is different from not declaring a default (no
"default"
). Thefirst case means that by default the reader will set the field to
null
unless the field was set by the writer. The second case means that the
writer must declare the field, otherwise the data is invalid for the
reader.
https://avro.apache.org/docs/1.11.1/specification/#schema-record
However, the C++ avro implementation used a null object to represent
even missing fields, which meant that the redpanda schema registry
implementation could not tell the difference between a missing field and
a null field. (See the associated avro fix PR).
This meant that we incorrectly didn't detect when the reader schema had
a new record field without a default value and always thought that a
null
default value is present. This surfaced as a bug for"null"
and["null", ...]
(union) types but it was constrained to these two typesbecause the default-value-vs-field type mismatch check caught this as an
incompatibility otherwise.
Depends on: redpanda-data/avro#313
Fixes https://redpandadata.atlassian.net/browse/CORE-1796
Fixes #16649
Backports Required
Release Notes
Bug Fixes
reader_field_missing_default_value
where it was too lenient for missing default values of null-able types.