Skip to content

Commit

Permalink
sr/avro: fix missing default compat check for null
Browse files Browse the repository at this point in the history
This fixes two related bugs in the avro compatibility checks where our
implementation was too lenient for the check
`reader_field_missing_default_value`.

In Avro, a record field may declare a null default (`"default": null`)
which is different from not declaring a default (no `"default"`). The
first case means that by default the reader will set the field to `null`
unless the field was set by the writer. The second case means that the
writer must declare the field, otherwise the data is invalid for the
reader.

https://avro.apache.org/docs/1.11.1/specification/#schema-record

However, the C++ avro implementation used a null object to represent
even missing fields, which meant that the redpanda schema registry
implementation could not tell the difference between a missing field and
a null field. (See the associated avro fix PR).

This meant that we incorrectly didn't detect when the reader schema had
a new record field without a default value and always thought that a
`null` default value is present. This surfaced as a bug for `"null"` and
`["null", ...]` (union) types but it was constrained to these two types
because the default-value-vs-field type mismatch check caught this as an
incompatibility otherwise.
  • Loading branch information
pgellert committed Oct 31, 2024
1 parent 5b47c20 commit d4ec747
Show file tree
Hide file tree
Showing 5 changed files with 67 additions and 21 deletions.
8 changes: 4 additions & 4 deletions MODULE.bazel.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 3 additions & 3 deletions bazel/repositories.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -23,9 +23,9 @@ def data_dependency():
http_archive(
name = "avro",
build_file = "//bazel/thirdparty:avro.BUILD",
sha256 = "f1a7d13b28ce5cc8812f26c705a6ea27b8bc63554d82d556c63b437da0338cf1",
strip_prefix = "avro-e54bf712fce903652f3eab7a6c16264ac5d17285",
url = "https://github.com/redpanda-data/avro/archive/e54bf712fce903652f3eab7a6c16264ac5d17285.tar.gz",
sha256 = "989171347c4fedb74e2758eaaf234f40aa065d25e738b7bb0f55f82befb9322c",
strip_prefix = "avro-2e73edda63cdaf4611e5fe19d4bcf46e8e3c7d27",
url = "https://github.com/redpanda-data/avro/archive/2e73edda63cdaf4611e5fe19d4bcf46e8e3c7d27.tar.gz",
patches = ["//bazel/thirdparty:avro-snappy-includes.patch"],
patch_args = ["-p1"],
)
Expand Down
6 changes: 4 additions & 2 deletions src/v/iceberg/schema_avro.cc
Original file line number Diff line number Diff line change
Expand Up @@ -194,6 +194,7 @@ avro::Schema field_to_avro(const nested_field& field) {

avro::Schema
struct_type_to_avro(const struct_type& type, std::string_view name) {
static const std::optional<avro::GenericDatum> no_default = std::nullopt;
avro::RecordSchema avro_schema(std::string{name});
const auto& fields = type.fields;
for (const auto& field_ptr : fields) {
Expand All @@ -203,10 +204,11 @@ struct_type_to_avro(const struct_type& type, std::string_view name) {
field_to_avro(*field_ptr),
field_attrs(field_ptr->id()),
// Optional fields (unions types with null children) have
// defaults of GenericUnion(null), rather than no default.
// defaults of GenericUnion(null) with the same union type as the
// child_schema, rather than no default.
// NOTE: no other union types are allowed by Iceberg.
field_ptr->required
? avro::GenericDatum()
? no_default
: avro::GenericDatum(
child_schema.root(), avro::GenericUnion(child_schema.root())));
}
Expand Down
27 changes: 15 additions & 12 deletions src/v/pandaproxy/schema_registry/avro.cc
Original file line number Diff line number Diff line change
Expand Up @@ -116,18 +116,21 @@ avro_compatibility_result check_compatible(
*reader.leafAt(int(r_idx)),
*writer.leafAt(int(w_idx)),
fields_p / std::to_string(r_idx) / "type"));
} else if (
reader.defaultValueAt(int(r_idx)).type() == avro::AVRO_NULL) {
// if the reader's record schema has a field with no default
// value, and writer's schema does not have a field with the
// same name, an error is signalled.

// For union, the default must correspond to the first type.
// The default may be null.
} else {
// if the reader's record schema has a field with no
// default value, and writer's schema does not have a
// field with the same name, an error is signalled.

// For union, the default must correspond to the first
// type. The default may be null.
const auto& r_leaf = reader.leafAt(int(r_idx));
if (
r_leaf->type() != avro::Type::AVRO_UNION
|| r_leaf->leafAt(0)->type() != avro::Type::AVRO_NULL) {
const auto& r_def = reader.defaultValueAt(int(r_idx));

auto expected_type = (r_leaf->type()
== avro::Type::AVRO_UNION)
? r_leaf->leafAt(0)->type()
: r_leaf->type();
if (!r_def || r_def->type() != expected_type) {
compat_result.emplace<avro_incompatibility>(
fields_p / std::to_string(r_idx),
avro_incompatibility::Type::
Expand All @@ -140,7 +143,7 @@ avro_compatibility_result check_compatible(
// if the writer's symbol is not present in the reader's enum and
// the reader has a default value, then that value is used,
// otherwise an error is signalled.
if (reader.defaultValueAt(0).type() == avro::AVRO_NULL) {
if (!reader.defaultValueAt(0)) {
std::vector<std::string_view> missing;
for (size_t w_idx = 0; w_idx < writer.names(); ++w_idx) {
size_t r_idx{0};
Expand Down
41 changes: 41 additions & 0 deletions src/v/pandaproxy/schema_registry/test/compatibility_avro.cc
Original file line number Diff line number Diff line change
Expand Up @@ -543,6 +543,28 @@ const auto compat_data = std::to_array<compat_test_case>({
.writer=schema_old,
.expected=backward_expected,
},
{
.reader=R"({
"type": "record",
"name": "car",
"fields": [
{
"name": "color",
"type": ["null", "string"]
}
]
})",
.writer=R"({
"type": "record",
"name": "car",
"fields": []
})",
.expected={
{"/fields/0",
incompatibility::Type::reader_field_missing_default_value,
"color"},
},
},
{
.reader=R"({
"type": "record",
Expand All @@ -565,6 +587,25 @@ const auto compat_data = std::to_array<compat_test_case>({
"color"},
},
},
{
.reader=R"({
"type": "record",
"name": "car",
"fields": [
{
"name": "color",
"type": "null",
"default": null
}
]
})",
.writer=R"({
"type": "record",
"name": "car",
"fields": []
})",
.expected = {},
},
{
.reader=R"({
"type": "record",
Expand Down

0 comments on commit d4ec747

Please sign in to comment.