Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sr/avro: fix missing default compat check for null #23839

Closed

Conversation

pgellert
Copy link
Contributor

This fixes two related bugs in the avro compatibility checks where our
implementation was too lenient for the check
reader_field_missing_default_value.

In Avro, a record field may declare a null default ("default": null)
which is different from not declaring a default (no "default"). The
first case means that by default the reader will set the field to null
unless the field was set by the writer. The second case means that the
writer must declare the field, otherwise the data is invalid for the
reader.

https://avro.apache.org/docs/1.11.1/specification/#schema-record

However, the C++ avro implementation used a null object to represent
even missing fields, which meant that the redpanda schema registry
implementation could not tell the difference between a missing field and
a null field. (See the associated avro fix PR).

This meant that we incorrectly didn't detect when the reader schema had
a new record field without a default value and always thought that a
null default value is present. This surfaced as a bug for "null" and
["null", ...] (union) types but it was constrained to these two types
because the default-value-vs-field type mismatch check caught this as an
incompatibility otherwise.

Depends on: redpanda-data/avro#313
Fixes https://redpandadata.atlassian.net/browse/CORE-1796
Fixes #16649

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Bug Fixes

  • Schema Registry: fixes a bug in the Avro compatibility check reader_field_missing_default_value where it was too lenient for missing default values of null-able types.

@pgellert pgellert requested a review from a team October 18, 2024 15:28
@pgellert pgellert self-assigned this Oct 18, 2024
@pgellert pgellert requested review from IoannisRP and BenPope and removed request for a team October 18, 2024 15:28
Comment on lines 26 to 28
sha256 = "f1a7d13b28ce5cc8812f26c705a6ea27b8bc63554d82d556c63b437da0338cf1",
strip_prefix = "avro-e54bf712fce903652f3eab7a6c16264ac5d17285",
url = "https://github.com/redpanda-data/avro/archive/e54bf712fce903652f3eab7a6c16264ac5d17285.tar.gz",
sha256 = "3f6f1e32225b94f3bd70822b7e819c61b30570417308ae7a9d7661d8016be99e",
strip_prefix = "avro-98d753c48cf3a22b52adc688ea3ca53550d0630c",
url = "https://github.com/pgellert/avro/archive/98d753c48cf3a22b52adc688ea3ca53550d0630c.tar.gz",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is a placeholder till PR #313 gets merged, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct

return true;
}

auto expected_type = (r_leaf->type()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think whenever we make a copy, like here, we tend not to make it const. It would be nice as a signal to say that this variable is immutable, but it can lead to accidental copies / prevent moving from the type, so I think, in general, we tend to default to auto or (const) auto& if we want to avoid a copy. @BenPope wdyt about this?

absl::flat_hash_set<incompatibility> expected;
};

const auto compat_data = std::to_array<compat_test_case>({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
I find the color test cases a bit verbose. To help with readability, i would suggest:

  • Parameterize the reader generation through a function that accepts the fields string.
  • define the writer once.
  • define the expected once and use where applicable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. The benefit of the verbosity here is that I can copy the test case and run it against a real schema registry for further testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid point. I am good either way.

Copy link
Member

@BenPope BenPope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

|| r_leaf->leafAt(0)->type() != avro::Type::AVRO_NULL) {
const auto& r_def = reader.defaultValueAt(int(r_idx));

auto missing_valid_default = [&]() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think making this an IIFE improves readability here.

@pgellert pgellert force-pushed the fix/sr-avro-default-added branch 2 times, most recently from c983473 to dac9a3e Compare October 23, 2024 18:27
@pgellert
Copy link
Contributor Author

first force-push (sorry for merging it together):

  • rebase to dev
  • unwrap IIFE based on feedback

second force-push:

  • rebase to dev again to fix a bazel conflict

@pgellert pgellert force-pushed the fix/sr-avro-default-added branch from dac9a3e to 05b5888 Compare October 29, 2024 14:31
Make it easier to extend by simplifying the definition of a test case.
This adds more tests for the avro incompatibility
`reader_field_missing_default_value`. All of these tests are already
passing, while the tests added in the following commit show the change
in behaviour.
@pgellert pgellert force-pushed the fix/sr-avro-default-added branch from 05b5888 to 736f5a8 Compare October 29, 2024 15:31
@pgellert
Copy link
Contributor Author

Force-pushed:

This fixes two related bugs in the avro compatibility checks where our
implementation was too lenient for the check
`reader_field_missing_default_value`.

In Avro, a record field may declare a null default (`"default": null`)
which is different from not declaring a default (no `"default"`). The
first case means that by default the reader will set the field to `null`
unless the field was set by the writer. The second case means that the
writer must declare the field, otherwise the data is invalid for the
reader.

https://avro.apache.org/docs/1.11.1/specification/#schema-record

However, the C++ avro implementation used a null object to represent
even missing fields, which meant that the redpanda schema registry
implementation could not tell the difference between a missing field and
a null field. (See the associated avro fix PR).

This meant that we incorrectly didn't detect when the reader schema had
a new record field without a default value and always thought that a
`null` default value is present. This surfaced as a bug for `"null"` and
`["null", ...]` (union) types but it was constrained to these two types
because the default-value-vs-field type mismatch check caught this as an
incompatibility otherwise.
@pgellert pgellert force-pushed the fix/sr-avro-default-added branch from 736f5a8 to d4ec747 Compare October 31, 2024 15:57
@pgellert
Copy link
Contributor Author

force-push: point to merged avro commit

@pgellert pgellert marked this pull request as ready for review October 31, 2024 16:01
Copy link
Member

@BenPope BenPope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still some compilation errors?

@pgellert
Copy link
Contributor Author

pgellert commented Nov 5, 2024

Still some compilation errors?

https://github.com/redpanda-data/vtools/pull/3244 needs to merge first for this PR to be able to pass CI, but that PR depends on the changes in this PR so they should be reviewed together.

@BenPope BenPope self-requested a review November 5, 2024 10:02
BenPope
BenPope previously approved these changes Nov 5, 2024
@pgellert pgellert dismissed BenPope’s stale review December 4, 2024 15:46

The merge-base changed after approval.

@pgellert
Copy link
Contributor Author

pgellert commented Dec 4, 2024

Closing since a different implementation was chosen instead: #24032

@pgellert pgellert closed this Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

avro backwards-compatibility check is too permissive
3 participants