make format validate by default #1553

Merged
merged 8 commits into from
Nov 21, 2024
Changes from 3 commits
74 changes: 74 additions & 0 deletions adr/2024-11-2-assertion-format.md
@@ -0,0 +1,74 @@
# [short title of solved problem and solution]

* Status: proposed
<!-- will update below to only those who participated in the vote -->
* Deciders: @gregsdennis @jdesrosiers @jviotti @mwadams @karenetheridge @awwright @benjam @relequestual
* Date: 2024-11-02
* Technical Story: https://github.com/json-schema-org/json-schema-spec/issues/1520
* Voting issue: https://github.com/json-schema-org/TSC/issues/19

## Context and Problem Statement

There's a long and sticky history around format.

1. Going back all the way to Draft 01, format has never required validation.
2. Whether to support format validation has always been the decision of the implementation.
3. The extent to which formats are validated has also been the decision of the implementation.

The result of all of this is that implementation support for validation has been spotty at best. Despite the JSON Schema specs referencing very concretely defined formats (by referencing other specs), implementations that do support validation don't all support each format equally. This has been the primary driving force behind keeping format as an opt-in validation.

With 2019-09, we decided that it was time to give the option of format validation to the schema author. They could enable validation by using a meta-schema which listed the Format Vocabulary with a true value, which meant, "format validation is required to process this schema."

In 2020-12, we further refined this by offering two separate vocabularies, one that treats the keyword as an annotation and one that treats it as an assertion. The argument was that the behavior of a keyword shouldn't change based on whether the vocabulary was required or not.

However, the fact remains that our users consistently report (via questions in Slack, GitHub, and Stack Overflow) that they expect format to validate. (The most recent case I can think of was only last week, in .NET's effort to build a short-term solution for schema generation from types.)

Due to this consistency in user expectations, we have decided to:

1. make format an assertion keyword, and
2. strictly enforce it by moving the appropriate tests into the required section of the Test Suite.

## Decision Drivers

* User expectation
* Current behavior
* Historical context
* Disparity of current implementation support vs the proposed requirements

## Considered Options

### `format` remains an annotation keyword by default

This is the current state. The primary benefit is that we don't need to make a breaking change.

The primary downside is that the current system of (1) configuring the tool or (2) including the `format-assertion` vocab[^1] is confusing for many and doesn't align with user expectations.

[^1]: The `format-assertion` vocabulary will no longer be an option since we have demoted vocabularies to a proposal for the stable release. This leaves tool configuration as the only option to enable `format` validation.

### `format` becomes an assertion keyword by default

We change the spec to require `format` validation. Furthermore:

* Implementations SHOULD support `format` with the defined values
* Implementations MAY support others, but only by explicit config
* Implementations MUST refuse to process a schema that contains an unsupported format
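
To make the last requirement concrete, here is a minimal Python sketch of how an implementation might scan a schema for unsupported format values before processing it. The names `KNOWN_FORMATS`, `DATA_KEYWORDS`, `unsupported_formats`, and `ensure_processable` are hypothetical and not taken from the spec or any library. The walk is deliberately naive and assumes the schema is plain parsed JSON.

```python
# Illustrative subset only; a real implementation would list every format it supports.
KNOWN_FORMATS = frozenset({"date-time", "date", "time", "duration", "email"})

# Keywords whose values are plain data rather than subschemas, so a "format"
# key found inside them is not a schema keyword. (Simplified assumption.)
DATA_KEYWORDS = frozenset({"const", "enum", "default", "examples"})


class UnsupportedFormatError(ValueError):
    """Raised when a schema uses a format value the implementation does not support."""


def unsupported_formats(schema, known=KNOWN_FORMATS):
    """Return the set of format values in `schema` that are not in `known`."""
    found = set()

    def walk(node):
        if isinstance(node, dict):
            fmt = node.get("format")
            # Only a string value can be a format keyword; a dict here is most
            # likely a property that happens to be named "format".
            if isinstance(fmt, str) and fmt not in known:
                found.add(fmt)
            for key, value in node.items():
                if key not in DATA_KEYWORDS:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return found


def ensure_processable(schema, known=KNOWN_FORMATS):
    """Refuse to process a schema that contains an unsupported format value."""
    missing = unsupported_formats(schema, known)
    if missing:
        raise UnsupportedFormatError(f"unsupported format(s): {sorted(missing)}")
```

Checking up front, rather than failing lazily during evaluation, is one possible design choice. It means a schema is rejected even when the unsupported format sits in a branch that a given instance never reaches.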

## Decision Outcome

The TSC has decided via vote (see voting issue above) that we should change `format` to act as an assertion by default, in line with option (2).

### Positive Consequences <!-- optional -->

* Aligns with user expectations.
* Users are still able to have purely annotative behavior through use of something like `x-format`.
* Increased consistency for `format` validation across implementations.

### Negative Consequences <!-- optional -->

* This is a breaking change, which means that we will likely have to re-educate our users.
* The burden on implementations will be greater since format validation was previously optional.

## Links <!-- optional -->

* [Link type] [Link to ADR] <!-- example: Refined by [ADR-0005](0005-example.md) -->
* … <!-- numbers of links can vary -->
108 changes: 44 additions & 64 deletions specs/jsonschema-validation.md
@@ -293,75 +293,60 @@ Structural validation alone may be insufficient to allow an application to
correctly utilize certain values. The `format` annotation keyword is defined to
allow schema authors to convey semantic information for a fixed subset of values
which are accurately described by authoritative resources, be they RFCs or other
external specifications.
external specifications. Format values defined externally to this document
SHOULD also be based on such authoritative resources in order to foster
interoperability.

The value of this keyword is called a format attribute. It MUST be a string. A
format attribute can generally only validate a given set of instance types. If
the type of the instance to validate is not in this set, validation for this
format attribute and instance SHOULD succeed. All format attributes defined in
this section apply to strings, but a format attribute can be specified to apply
to any instance types defined in the data model defined in the [core JSON
Schema.](#json-schema)[^1]
The value of this keyword MUST be a string. While this keyword can validate any
type, each distinct value will generally only validate a given set of instance
types. If the type of the instance to validate is not in this set, validation
for this keyword SHOULD succeed. All format values defined in this section apply
to strings, but a format value can be specified to apply to any instance types
defined in the data model defined in the [core JSON Schema](#json-schema)[^1].
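
The type-applicability rule in this paragraph can be sketched in Python as follows. Each format value is paired with a test for the instance types it applies to, and validation succeeds whenever the instance falls outside that set. The `RULES` table, the `validate_format` function, and both format names (`x-upper`, `x-even`) are made up for illustration and are not defined by this document.

```python
from typing import Callable, Dict, Tuple


def is_string(value: object) -> bool:
    return isinstance(value, str)


def is_number(value: object) -> bool:
    # bool is a subclass of int in Python, but JSON booleans are not numbers.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# format value -> (applicability test, syntactic check). Both entries are hypothetical.
RULES: Dict[str, Tuple[Callable[[object], bool], Callable[[object], bool]]] = {
    "x-upper": (is_string, lambda s: s == s.upper()),
    "x-even": (is_number, lambda n: n % 2 == 0),
}


def validate_format(fmt: str, instance: object) -> bool:
    applies, check = RULES[fmt]
    if not applies(instance):
        # The instance type is outside the set this format validates,
        # so validation for the keyword succeeds.
        return True
    return check(instance)
```

Under this sketch, `validate_format("x-upper", 12)` succeeds because the format applies only to strings, while `validate_format("x-upper", "abc")` fails.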

[^1]: Note that the `type` keyword in this specification defines an "integer"
type which is not part of the data model. Therefore a format attribute can be
limited to numbers, but not specifically to integers. However, a numeric format
can be used alongside the `type` keyword with a value of "integer", or could be
explicitly defined to always pass if the number is not an integer, which
can be used alongside the `type` keyword with a value of "integer", or it could
be explicitly defined to always pass if the number is not an integer, which
produces essentially the same behavior as only applying to integers.

Implementing support for `format` as an annotation is REQUIRED (if the
implementation supports annotation collection).

Implementing support for `format` as an assertion is OPTIONAL. Implementations
which choose to support assertion behavior:

- MUST still collect the keyword's value as an annotation (if the implementation
supports annotation collection),
- MUST provide a configuration option to enable assertion behavior, defaulting
to annotation-only behavior
- SHOULD provide an implementation-specific best effort validation for each
format attribute defined below;[^3]
- MAY choose to implement validation of any or all format attributes as a no-op
by always producing a validation result of true;[^4]
- SHOULD use a common parsing library for each format, or a well-known regular
expression;
Implementations SHOULD provide assertion behavior for the format values defined
by this document[^2] and MUST refuse to process any schema which contains an
unsupported format value.

[^2]: Assertion behavior is called out very explicitly because it is a departure
from previous iterations of this specification. Previously, `format` was an
annotation-only keyword by default and implementations that supported assertion
were required to offer some configuration that allowed users to explicitly
enable assertion. Assertion is now a requirement in order to meet user
expectations. See [json-schema-org/json-schema-spec #1520](https://github.com/json-schema-org/json-schema-spec/issues/1520) for more.

In addition to the assertion behavior, this keyword also produces its value as
an annotation.

Implementations:

- SHOULD provide validation for each format attribute defined in this
document;
- MAY support format values not defined in this document, but such support MUST
be configurable and disabled by default;
- SHOULD use a common parsing library or a well-known regular expression for
each format;
- SHOULD clearly document how and to what degree each format attribute is
validated.
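
One way to read the "configurable and disabled by default" requirement is sketched below in Python. `FormatOptions`, `resolve_checker`, and `SPEC_FORMATS` are hypothetical names, and the `email` checker is a crude placeholder rather than a conforming validator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Checker = Callable[[str], bool]

# Placeholder standing in for the formats defined by the specification.
SPEC_FORMATS: Dict[str, Checker] = {
    "email": lambda s: "@" in s,  # deliberately crude stand-in
}


@dataclass
class FormatOptions:
    # Custom formats stay off unless the user explicitly opts in.
    allow_custom_formats: bool = False
    custom_formats: Dict[str, Checker] = field(default_factory=dict)


def resolve_checker(fmt: str, options: FormatOptions) -> Checker:
    """Find the checker for `fmt`, honoring the opt-in for custom formats."""
    if fmt in SPEC_FORMATS:
        return SPEC_FORMATS[fmt]
    if options.allow_custom_formats and fmt in options.custom_formats:
        return options.custom_formats[fmt]
    raise LookupError(f"unsupported format: {fmt!r}")
```

With this shape, registering a custom checker is not enough on its own. The format is still treated as unsupported until `allow_custom_formats` is set to `True`.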

[^3]: The expectation is that for simple formats such as date-time, syntactic
validation will be thorough. For a complex format such as email addresses, which
are the amalgamation of various standards and numerous adjustments over time,
with obscure and/or obsolete rules that may or may not be restricted by other
applications making use of the value, a minimal validation is sufficient. For
example, an instance string that does not contain an "@" is clearly not a valid
email address, and an "email" or "hostname" containing characters outside of
7-bit ASCII is likewise clearly invalid.

[^4]: This matches the current reality of implementations, which provide widely
varying levels of validation, including no validation at all, for some or all
format attributes. It is also designed to encourage relying only on the
annotation behavior and performing semantic validation in the application, which
is the recommended best practice.

The requirement for minimal validation of format attributes is
intentionally vague and permissive, due to the complexity involved in many of
the attributes. Note in particular that the requirement is limited to syntactic
checking; it is not to be expected that an implementation would send an email,
attempt to connect to a URL, or otherwise check the existence of an entity
identified by a format instance.
The requirement for validation of format values in general is limited to
syntactic checking; implementations SHOULD NOT attempt to send an email, connect
to a URL, or otherwise check the existence of an entity identified by a format
instance.

#### Custom format attributes

Implementations MAY support custom format attributes. Save for agreement between
parties, schema authors SHALL NOT expect a peer implementation to support such
custom format attributes.

An implementation MUST NOT fail to collect unknown formats as annotations.

When configured for assertion behavior for `format`, implementations MUST fail
upon encountering unknown formats.

### Defined Formats

#### Dates, Times, and Duration
@@ -372,22 +357,17 @@ Date and time format names are derived from [RFC 3339, section 5.6](#rfc3339).
The duration format is from the ISO 8601 ABNF as given in Appendix A of RFC
3339.

Implementations supporting formats SHOULD implement support for the following
attributes:

- *date-time:* A string instance is valid against this attribute if it is a
- *date-time*: A string instance is valid against this attribute if it is a
valid representation according to the "date-time" ABNF rule (referenced above)
- *date:* A string instance is valid against this attribute if it is a valid
- *date*: A string instance is valid against this attribute if it is a valid
representation according to the "full-date" ABNF rule (referenced above)
- *time:* A string instance is valid against this attribute if it is a valid
- *time*: A string instance is valid against this attribute if it is a valid
representation according to the "full-time" ABNF rule (referenced above)
- *duration:* A string instance is valid against this attribute if it is a valid
- *duration*: A string instance is valid against this attribute if it is a valid
representation according to the "duration" ABNF rule (referenced above)
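
For illustration, a purely syntactic check for `date` and `date-time` might look like the Python sketch below. It is a simplification, not a complete RFC 3339 validator. It accepts a seconds value of 60 at any time rather than only at genuine leap seconds, and it rejects year 0000 because Python's `datetime.date` cannot represent it. The function names are hypothetical.

```python
import re
from datetime import date

_FULL_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

# full-date "T" partial-time time-offset, following the RFC 3339 ABNF shape.
_DATE_TIME = re.compile(
    r"(?P<date>[0-9]{4}-[0-9]{2}-[0-9]{2})[Tt]"
    r"(?P<h>[0-9]{2}):(?P<m>[0-9]{2}):(?P<s>[0-9]{2})(?:\.[0-9]+)?"
    r"(?:[Zz]|[+-](?P<oh>[0-9]{2}):(?P<om>[0-9]{2}))"
)


def is_full_date(value: str) -> bool:
    """Check the "full-date" shape and that the calendar date exists."""
    if not _FULL_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_date_time(value: str) -> bool:
    """Check the "date-time" shape with basic range checks on each field."""
    m = _DATE_TIME.fullmatch(value)
    if m is None or not is_full_date(m["date"]):
        return False
    if int(m["h"]) > 23 or int(m["m"]) > 59 or int(m["s"]) > 60:
        return False
    if m["oh"] is not None and (int(m["oh"]) > 23 or int(m["om"]) > 59):
        return False
    return True
```

Note that `is_date_time` requires a time offset (`Z` or `+hh:mm`), so a string such as `2024-11-21T10:30:00` is rejected.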

Implementations MAY support additional attributes using the other format names
defined anywhere in that RFC. If "full-date" or "full-time" are implemented, the
corresponding short form ("date" or "time" respectively) MUST be implemented,
and MUST behave identically. Implementations SHOULD NOT define extension
defined anywhere in that RFC. Implementations SHOULD NOT define extension
attributes with any name matching an RFC 3339 format unless it validates
according to the rules of that format.[^5]

@@ -401,7 +381,7 @@ likely either be promoted to fully specified attributes or dropped.

These attributes apply to string instances.

A string instance is valid against these attributes if it is a valid Internet
A string instance is valid against these format values if it is a valid Internet
email address as follows:

- *email:* As defined by the "Mailbox" ABNF rule in [RFC 5321, section
@@ -489,7 +469,7 @@ A regular expression, which SHOULD be valid according to the
[ECMA-262](#ecma262) regular expression dialect.

Implementations that validate formats MUST accept at least the subset of
ECMA-262 defined in {{regexinterop}}), and SHOULD accept all valid ECMA-262
ECMA-262 defined in {{regexinterop}}, and SHOULD accept all valid ECMA-262
expressions.

## Keywords for the Contents of String-Encoded Data {#content}