Skip to content

Commit

Permalink
sr/avro: fix missing default compat check for null
Browse files Browse the repository at this point in the history
This fixes a bug in the avro compatibility check for fields present in
the reader but missing from the writer. In this case the writer and the
reader are compatible only if the reader field has a specified default
value.

The Avro library sets the default value to `GenericDatum()` which has
type `AVRO_NULL` when the default is not specified. This is what we are
trying to detect in this check. However, the check was confusing an
explicit null default for a union-type as a missing value. The former is
represented as `GenericDatum(Union(Null))` and because of the way
`GenericDatum::type()` delegates to the type of the first union leaf
value when `GenericDatum` contains a union, the check thought that the
value was missing, when it was in fact specified as null.

To solve for the above, we also check that the type is not a union type
to be able to detect when the default is not set. (The same check is
used in the Avro library to detect unset default values, for example,
when printing an Avro schema as JSON.)

The caveat is that we are still overly restrictive for fields of type
null, because in that case, the default value will be `GenericDatum()`
regardless of whether the default value was explicitly set to null or
not. This is a limitation of the Avro library's API and could only be
fixed by making breaking changes to the Avro library's API.

Also note that the checks against `reader.leafAt(...)` have been
removed. They were attempting to compare the type of the field against
the type of the default value. This is not necessary because this is
ensured by the Avro library's built-in schema validation done while
parsing the Avro schema.

(cherry picked from commit 3866b1c)
  • Loading branch information
pgellert authored and vbotbuildovich committed Dec 4, 2024
1 parent c164f26 commit 5fe3c0b
Show file tree
Hide file tree
Showing 2 changed files with 45 additions and 12 deletions.
35 changes: 23 additions & 12 deletions src/v/pandaproxy/schema_registry/avro.cc
Original file line number Diff line number Diff line change
Expand Up @@ -115,18 +115,29 @@ avro_compatibility_result check_compatible(
*reader.leafAt(int(r_idx)),
*writer.leafAt(int(w_idx)),
fields_p / std::to_string(r_idx) / "type"));
} else if (
reader.defaultValueAt(int(r_idx)).type() == avro::AVRO_NULL) {
// if the reader's record schema has a field with no default
// value, and writer's schema does not have a field with the
// same name, an error is signalled.

// For union, the default must correspond to the first type.
// The default may be null.
const auto& r_leaf = reader.leafAt(int(r_idx));
if (
r_leaf->type() != avro::Type::AVRO_UNION
|| r_leaf->leafAt(0)->type() != avro::Type::AVRO_NULL) {
} else {
// if the reader's record schema has a field with no
// default value, and writer's schema does not have a
// field with the same name, an error is signalled.
// For union, the default must correspond to the first
// type.
const auto& def = reader.defaultValueAt(int(r_idx));

// Note: this code is overly restrictive for null-type
// fields with null defaults. This is because the Avro API
// is not expressive enough to differentiate the two.
// Union type field's default set to null:
// def=GenericDatum(Union(Null))
// Union type field's default missing:
// def=GenericDatum(Null)
// Null type field's default set to null:
// def=GenericDatum(Null)
// Null type field's default missing:
// def=GenericDatum(Null)
auto default_unset = !def.isUnion()
&& def.type() == avro::AVRO_NULL;

if (default_unset) {
compat_result.emplace<avro_incompatibility>(
fields_p / std::to_string(r_idx),
avro_incompatibility::Type::
Expand Down
22 changes: 22 additions & 0 deletions src/v/pandaproxy/schema_registry/test/compatibility_avro.cc
Original file line number Diff line number Diff line change
Expand Up @@ -543,6 +543,28 @@ const auto compat_data = std::to_array<compat_test_case>({
.writer=schema_old,
.expected=backward_expected,
},
{
.reader=R"({
"type": "record",
"name": "car",
"fields": [
{
"name": "color",
"type": ["null", "string"]
}
]
})",
.writer=R"({
"type": "record",
"name": "car",
"fields": []
})",
.expected={
{"/fields/0",
incompatibility::Type::reader_field_missing_default_value,
"color"},
},
},
{
.reader=R"({
"type": "record",
Expand Down

0 comments on commit 5fe3c0b

Please sign in to comment.