From 9b19a36c84604786daaeae1128a05e203e63829e Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Mon, 28 Oct 2024 21:02:51 +1300 Subject: [PATCH 1/8] make format validate by default --- specs/jsonschema-validation.md | 90 +++++++++++++++------------------- 1 file changed, 40 insertions(+), 50 deletions(-) diff --git a/specs/jsonschema-validation.md b/specs/jsonschema-validation.md index c6d985ac..6bfe57a0 100644 --- a/specs/jsonschema-validation.md +++ b/specs/jsonschema-validation.md @@ -293,39 +293,46 @@ Structural validation alone may be insufficient to allow an application to correctly utilize certain values. The `format` annotation keyword is defined to allow schema authors to convey semantic information for a fixed subset of values which are accurately described by authoritative resources, be they RFCs or other -external specifications. +external specifications. Format values defined externally to this document +SHOULD also be based on such authoritative resources in order to foster +interoperability. -The value of this keyword is called a format attribute. It MUST be a string. A -format attribute can generally only validate a given set of instance types. If -the type of the instance to validate is not in this set, validation for this -format attribute and instance SHOULD succeed. All format attributes defined in -this section apply to strings, but a format attribute can be specified to apply -to any instance types defined in the data model defined in the [core JSON -Schema.](#json-schema)[^1] +The value of this keyword MUST be a string. While this keyword can validate any +type, each distinct value will generally only validate a given set of instance +types. If the type of the instance to validate is not in this set, validation +for this keyword SHOULD succeed. 
All format values defined in this section apply +to strings, but a format value can be specified to apply to any instance types +defined in the data model defined in the [core JSON Schema](#json-schema)[^1]. [^1]: Note that the `type` keyword in this specification defines an "integer" type which is not part of the data model. Therefore a format attribute can be limited to numbers, but not specifically to integers. However, a numeric format -can be used alongside the `type` keyword with a value of "integer", or could be -explicitly defined to always pass if the number is not an integer, which +can be used alongside the `type` keyword with a value of "integer", or it could +be explicitly defined to always pass if the number is not an integer, which produces essentially the same behavior as only applying to integers. -Implementing support for `format` as an annotation is REQUIRED (if the -implementation supports annotation collection). +Implementations SHOULD provide assertion behavior for the format values defined +by this document[^2] and MUST refuse to process any schema which contains an +unsupported format value. -Implementing support for `format` as an assertion is OPTIONAL. Implementations -which choose to support assertion behavior: +[^2]: Assertion behavior is called out very explicitly because it is a departure +from previous iterations of this specification. Previously, `format` was an +annotation-only keyword by default and implementations that supported assertion +were required to offer some configuration that allowed users to explicitly +enable assertion. Assertion is now a requirement in order to meet user +expectations. See [json-schema-org/json-schema-spec #1520](https://github.com/json-schema-org/json-schema-spec/issues/1520) for more. + +In addition to the assertion behavior, this keyword also produces its value as +an annotation. 
+ +Implementations: -- MUST still collect the keyword's value as an annotation (if the implementation - supports annotation collection), -- MUST provide a configuration option to enable assertion behavior, defaulting - to annotation-only behavior - SHOULD provide an implementation-specific best effort validation for each - format attribute defined below;[^3] -- MAY choose to implement validation of any or all format attributes as a no-op - by always producing a validation result of true;[^4] -- SHOULD use a common parsing library for each format, or a well-known regular - expression; + format attribute defined in this document;[^3] +- MAY support format values not defined in this document, but such support MUST + be configurable and disabled by default; +- SHOULD use a common parsing library or a well-known regular expression for + each format; - SHOULD clearly document how and to what degree each format attribute is validated. @@ -338,18 +345,11 @@ example, an instance string that does not contain an "@" is clearly not a valid email address, and an "email" or "hostname" containing characters outside of 7-bit ASCII is likewise clearly invalid. -[^4]: This matches the current reality of implementations, which provide widely -varying levels of validation, including no validation at all, for some or all -format attributes. It is also designed to encourage relying only on the -annotation behavior and performing semantic validation in the application, which -is the recommended best practice. - -The requirement for minimal validation of format attributes is +The requirement for minimal validation of format values in general is intentionally vague and permissive, due to the complexity involved in many of the attributes. Note in particular that the requirement is limited to syntactic -checking; it is not to be expected that an implementation would send an email, -attempt to connect to a URL, or otherwise check the existence of an entity -identified by a format instance. 
+checking; implementations SHOULD NOT attempt to send an email, connect to a URL, +or otherwise check the existence of an entity identified by a format instance. #### Custom format attributes @@ -357,11 +357,6 @@ Implementations MAY support custom format attributes. Save for agreement between parties, schema authors SHALL NOT expect a peer implementation to support such custom format attributes. -An implementation MUST NOT fail to collect unknown formats as annotations. - -When configured for assertion behavior for `format`, implementations MUST fail -upon encountering unknown formats. - ### Defined Formats #### Dates, Times, and Duration @@ -372,22 +367,17 @@ Date and time format names are derived from [RFC 3339, section 5.6](#rfc3339). The duration format is from the ISO 8601 ABNF as given in Appendix A of RFC 3339. -Implementations supporting formats SHOULD implement support for the following -attributes: - -- *date-time:* A string instance is valid against this attribute if it is a +- *date-time*: A string instance is valid against this attribute if it is a valid representation according to the "date-time" ABNF rule (referenced above) -- *date:* A string instance is valid against this attribute if it is a valid +- *date*: A string instance is valid against this attribute if it is a valid representation according to the "full-date" ABNF rule (referenced above) -- *time:* A string instance is valid against this attribute if it is a valid +- *time*: A string instance is valid against this attribute if it is a valid representation according to the "full-time" ABNF rule (referenced above) -- *duration:* A string instance is valid against this attribute if it is a valid +- *duration*: A string instance is valid against this attribute if it is a valid representation according to the "duration" ABNF rule (referenced above) Implementations MAY support additional attributes using the other format names -defined anywhere in that RFC. 
If "full-date" or "full-time" are implemented, the -corresponding short form ("date" or "time" respectively) MUST be implemented, -and MUST behave identically. Implementations SHOULD NOT define extension +defined anywhere in that RFC. Implementations SHOULD NOT define extension attributes with any name matching an RFC 3339 format unless it validates according to the rules of that format.[^5] @@ -401,7 +391,7 @@ likely either be promoted to fully specified attributes or dropped. These attributes apply to string instances. -A string instance is valid against these attributes if it is a valid Internet +A string instance is valid against these format values if it is a valid Internet email address as follows: - *email:* As defined by the "Mailbox" ABNF rule in [RFC 5321, section @@ -489,7 +479,7 @@ A regular expression, which SHOULD be valid according to the [ECMA-262](#ecma262) regular expression dialect. Implementations that validate formats MUST accept at least the subset of -ECMA-262 defined in {{regexinterop}}), and SHOULD accept all valid ECMA-262 +ECMA-262 defined in {{regexinterop}}, and SHOULD accept all valid ECMA-262 expressions. 
## Keywords for the Contents of String-Encoded Data {#content} From 3de61f21eb3aff04714dd703569583177b2a76f7 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Sat, 2 Nov 2024 15:06:29 +1300 Subject: [PATCH 2/8] add ADR for change in format behavior --- adr/2024-11-2-assertion-format.md | 74 +++++++++++++++++++++++++++++++ 1 file changed, 74 insertions(+) create mode 100644 adr/2024-11-2-assertion-format.md diff --git a/adr/2024-11-2-assertion-format.md b/adr/2024-11-2-assertion-format.md new file mode 100644 index 00000000..397d5018 --- /dev/null +++ b/adr/2024-11-2-assertion-format.md @@ -0,0 +1,74 @@ +# [short title of solved problem and solution] + +* Status: proposed + +* Deciders: @gregsdennis @jdesrosiers @jviotti @mwadams @karenetheridge @awwright @benjam @relequestual +* Date: 2024-11-02 +* Technical Story: https://github.com/json-schema-org/json-schema-spec/issues/1520 +* Voting issue: https://github.com/json-schema-org/TSC/issues/19 + +## Context and Problem Statement + +There's a long and sticky history around format. + +1. Going back all the way to Draft 01, format has never required validation. +2. Whether to support format validation has always been the decision of the implementation. +3. The extent to which formats are validated has also been the decision of the implementation. + +The result of all of this is that implementation support for validation has been spotty at best. Despite the JSON Schema specs referencing very concretely defined formats (by referencing other specs), implementations that do support validation don't all support each format equally. This has been the primary driving force behind keeping format as an opt-in validation. + +With 2019-09, we decided that it was time to give the option of format validation to the schema author. They could enable validation by using a meta-schema which listed the Format Vocabulary with a true value, which meant, "format validation is required to process this schema." 
+ +In 2020-12, we further refined this by offering two separate vocabularies, one that treats the keyword as an annotation and one that treats it as an assertion. The argument was that the behavior of a keyword shouldn't change based on whether the vocabulary was required or not. + +However, the fact remains that our users consistently report (via questions in Slack, GitHub, and StackOverflow) that they expect format to validate. (The most recent case I can think of was only last week, in .Net's effort to build a short-term solution for schema generation from types.) + +Due to this consistency in user expectations have decided to: + +1. make format an assertion keyword and strictly, +2. enforce it by moving the appropriate tests into the required section of the Test Suite. + +## Decision Drivers + +* User expectation +* Current behavior +* Historical context +* Disparity of current implementation support vs the proposed requirements + +## Considered Options + +### `format` remains an annotation keyword by default + +This is the current state. The primary benefit is that we don't need to make a breaking change. + +The primary downside is that the current system of (1) configuring the tool or (2) including the `format-assertion` vocab[^1] is confusing for many and doesn't align with user expectations. + +[^1]: The `format-assertion` vocabulary will no longer be an option since we have demoted vocabularies to a proposal for the stable release. This leaves tool configuration as the only option to enable `format` validation. + +### `format` becomes an assertion keyword by default + +We change the spec to require `format` validation. 
Furthermore: + +* Implementations SHOULD support `format` with the defined values +* Implementations MAY support others, but only by explicit config +* Implementations MUST refuse to process a schema that contains an unsupported format + +## Decision Outcome + +The TSC has decided via vote (see voting issue above) that we should change `format` to act as an assertion by default, in line with option (2). + +### Positive Consequences + +* Aligns with user expectations. +* Users are still able to have purely annotative behavior through use of something like `x-format`. +* Increased consistency for `format` validation across implementations. + +### Negative Consequences + +* This is a breaking change, which means that we will likely have to re-educate our users. +* The burden on implementations will be greater since format validation was previously optional. + +## Links + +* [Link type] [Link to ADR] +* … From e13e15695ce44bd120fc059bd7bda7b759110a92 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Thu, 7 Nov 2024 16:14:29 +1300 Subject: [PATCH 3/8] removed permissive language from format requirements --- specs/jsonschema-validation.md | 22 ++++++---------------- 1 file changed, 6 insertions(+), 16 deletions(-) diff --git a/specs/jsonschema-validation.md b/specs/jsonschema-validation.md index 6bfe57a0..7ddee49e 100644 --- a/specs/jsonschema-validation.md +++ b/specs/jsonschema-validation.md @@ -327,8 +327,8 @@ an annotation. Implementations: -- SHOULD provide an implementation-specific best effort validation for each - format attribute defined in this document;[^3] +- SHOULD provide validation for each format attribute defined in this + document; - MAY support format values not defined in this document, but such support MUST be configurable and disabled by default; - SHOULD use a common parsing library or a well-known regular expression for @@ -336,20 +336,10 @@ Implementations: - SHOULD clearly document how and to what degree each format attribute is validated. 
-[^3]: The expectation is that for simple formats such as date-time, syntactic -validation will be thorough. For a complex format such as email addresses, which -are the amalgamation of various standards and numerous adjustments over time, -with obscure and/or obsolete rules that may or may not be restricted by other -applications making use of the value, a minimal validation is sufficient. For -example, an instance string that does not contain an "@" is clearly not a valid -email address, and an "email" or "hostname" containing characters outside of -7-bit ASCII is likewise clearly invalid. - -The requirement for minimal validation of format values in general is -intentionally vague and permissive, due to the complexity involved in many of -the attributes. Note in particular that the requirement is limited to syntactic -checking; implementations SHOULD NOT attempt to send an email, connect to a URL, -or otherwise check the existence of an entity identified by a format instance. +The requirement for validation of format values in general is limited to +syntactic checking; implementations SHOULD NOT attempt to send an email, connect +to a URL, or otherwise check the existence of an entity identified by a format +instance. 
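As a rough illustration of the requirements in this patch (it is not part of the specification text), the Python sketch below shows one way an implementation might combine them: format values are resolved when the schema is processed, an unsupported value causes a refusal, instances of a type the format does not apply to pass, and checking stays purely syntactic with no network access. The names `UnsupportedFormatError`, `FORMAT_CHECKERS`, and `compile_format` are hypothetical, only two format values are covered, and Python's `re` dialect is only an approximation of ECMA-262.

```python
import re
from datetime import date


class UnsupportedFormatError(Exception):
    """Hypothetical error: the schema uses a format value we do not support."""


def _is_date(value: str) -> bool:
    # Shape of the RFC 3339 "full-date" rule: YYYY-MM-DD with ASCII digits.
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        # Rejects impossible calendar dates such as February 30th.
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def _is_regex(value: str) -> bool:
    # Assumption: Python's regex compiler stands in for an ECMA-262 parser.
    try:
        re.compile(value)
    except re.error:
        return False
    return True


# Only a tiny subset of the defined formats, for illustration.
FORMAT_CHECKERS = {"date": _is_date, "regex": _is_regex}


def compile_format(format_value: str):
    """Resolve a format value at schema-processing time, or refuse."""
    checker = FORMAT_CHECKERS.get(format_value)
    if checker is None:
        raise UnsupportedFormatError(f"unsupported format value: {format_value!r}")

    def validate(instance) -> bool:
        # These formats apply to strings; other instance types succeed.
        if not isinstance(instance, str):
            return True
        return checker(instance)

    return validate
```

In this sketch, calling `compile_format("date")` while loading a schema returns a validator for later use, whereas `compile_format("no-such-format")` raises before any instance is evaluated, which is one reading of "refuse to process".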
#### Custom format attributes From 81d64babc720d923bdbfff3210b8e3208de0a282 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Wed, 13 Nov 2024 13:38:15 +1300 Subject: [PATCH 4/8] Update adr/2024-11-2-assertion-format.md Co-authored-by: Jason Desrosiers --- adr/2024-11-2-assertion-format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/adr/2024-11-2-assertion-format.md b/adr/2024-11-2-assertion-format.md index 397d5018..f8dc08fa 100644 --- a/adr/2024-11-2-assertion-format.md +++ b/adr/2024-11-2-assertion-format.md @@ -23,7 +23,7 @@ In 2020-12, we further refined this by offering two separate vocabularies, one t However, the fact remains that our users consistently report (via questions in Slack, GitHub, and StackOverflow) that they expect format to validate. (The most recent case I can think of was only last week, in .Net's effort to build a short-term solution for schema generation from types.) -Due to this consistency in user expectations have decided to: +Due to this consistency in user expectations, we have decided to: 1. make format an assertion keyword and strictly, 2. enforce it by moving the appropriate tests into the required section of the Test Suite. From 16a751563d83d1edca59375f159f3472fcbf4153 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Wed, 13 Nov 2024 13:39:44 +1300 Subject: [PATCH 5/8] Update specs/jsonschema-validation.md Co-authored-by: Jason Desrosiers --- specs/jsonschema-validation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/specs/jsonschema-validation.md b/specs/jsonschema-validation.md index 7ddee49e..36fe5d8d 100644 --- a/specs/jsonschema-validation.md +++ b/specs/jsonschema-validation.md @@ -302,7 +302,7 @@ type, each distinct value will generally only validate a given set of instance types. If the type of the instance to validate is not in this set, validation for this keyword SHOULD succeed. 
All format values defined in this section apply to strings, but a format value can be specified to apply to any instance types -defined in the data model defined in the [core JSON Schema](#json-schema)[^1]. +defined in the data model defined in the [core JSON Schema](#json-schema) specification[^1]. [^1]: Note that the `type` keyword in this specification defines an "integer" type which is not part of the data model. Therefore a format attribute can be From 1b0e4bb38fb5ed86ec8df654059fe4da63a746f1 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Wed, 13 Nov 2024 13:41:40 +1300 Subject: [PATCH 6/8] Update specs/jsonschema-validation.md Co-authored-by: Jason Desrosiers --- specs/jsonschema-validation.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/specs/jsonschema-validation.md b/specs/jsonschema-validation.md index 36fe5d8d..e76106a2 100644 --- a/specs/jsonschema-validation.md +++ b/specs/jsonschema-validation.md @@ -341,11 +341,11 @@ syntactic checking; implementations SHOULD NOT attempt to send an email, connect to a URL, or otherwise check the existence of an entity identified by a format instance. -#### Custom format attributes +#### Custom format values -Implementations MAY support custom format attributes. Save for agreement between +Implementations MAY support custom format values. Save for agreement between parties, schema authors SHALL NOT expect a peer implementation to support such -custom format attributes. +custom format values. 
### Defined Formats From ff8f074d7f782a9a9e2fce0e6a8e707c4644c853 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Wed, 13 Nov 2024 14:45:55 +1300 Subject: [PATCH 7/8] update adr with results of vote --- adr/2024-11-2-assertion-format.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/adr/2024-11-2-assertion-format.md b/adr/2024-11-2-assertion-format.md index f8dc08fa..22a36010 100644 --- a/adr/2024-11-2-assertion-format.md +++ b/adr/2024-11-2-assertion-format.md @@ -2,10 +2,13 @@ * Status: proposed -* Deciders: @gregsdennis @jdesrosiers @jviotti @mwadams @karenetheridge @awwright @benjam @relequestual +* Deciders: @gregsdennis @jdesrosiers @julian @jviotti @mwadams @karenetheridge @relequestual * Date: 2024-11-02 * Technical Story: https://github.com/json-schema-org/json-schema-spec/issues/1520 * Voting issue: https://github.com/json-schema-org/TSC/issues/19 + For - @gregsdennis @jdesrosiers @jviotti @mwadams @karenetheridge + Neutral - @relequestual + Against - @julian ## Context and Problem Statement @@ -25,8 +28,8 @@ However, the fact remains that our users consistently report (via questions in S Due to this consistency in user expectations, we have decided to: -1. make format an assertion keyword and strictly, -2. enforce it by moving the appropriate tests into the required section of the Test Suite. +1. make format an assertion keyword, and +2. strictly enforce it by moving the appropriate tests into the required section of the Test Suite and building them more completely. ## Decision Drivers @@ -65,7 +68,8 @@ The TSC has decided via vote (see voting issue above) that we should change `for ### Negative Consequences -* This is a breaking change, which means that we will likely have to re-educate our users. +* This is a breaking change, which means that we will likely have to re-educate the users who correctly treat it as an annotation. 
+* Older schemas which do not specify a version (`$schema`) may change their validation outcome. * The burden on implementations will be greater since format validation was previously optional. ## Links From 8e33d21e95c05505fe1d1c77b6593fc43a32a226 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Thu, 14 Nov 2024 09:06:13 +1300 Subject: [PATCH 8/8] Update specs/jsonschema-validation.md Co-authored-by: Jason Desrosiers --- specs/jsonschema-validation.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/specs/jsonschema-validation.md b/specs/jsonschema-validation.md index e76106a2..be9f15b4 100644 --- a/specs/jsonschema-validation.md +++ b/specs/jsonschema-validation.md @@ -312,8 +312,8 @@ be explicitly defined to always pass if the number is not an integer, which produces essentially the same behavior as only applying to integers. Implementations SHOULD provide assertion behavior for the format values defined -by this document[^2] and MUST refuse to process any schema which contains an -unsupported format value. +by this document[^2] and MUST refuse to process any schema which contains a +format value it doesn't support. [^2]: Assertion behavior is called out very explicitly because it is a departure from previous iterations of this specification. Previously, `format` was an