From 71dd8164bd124da0d1967ab9d56ad71033dda8e9 Mon Sep 17 00:00:00 2001
From: Greg Dennis
Date: Tue, 1 Oct 2024 10:44:29 +1300
Subject: [PATCH 01/18] replace 'IRI' ABNF symbol references with plain language

---
 jsonschema-core.md       | 32 ++++++++++++++++----------------
 jsonschema-validation.md |  6 +++---
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/jsonschema-core.md b/jsonschema-core.md
index 7ea5cc15..c7faeec6 100644
--- a/jsonschema-core.md
+++ b/jsonschema-core.md
@@ -452,8 +452,8 @@ The lexical scope of a keyword is determined by the nested JSON data structure
 of objects and arrays. The largest such scope is an entire schema document. The
 smallest scope is a single schema object with no subschemas.
 
-Keywords MAY be defined with a partial value, such as a IRI-reference, which
-must be resolved against another value, such as another IRI-reference or a full
+Keywords MAY be defined with a partial value, such as a IRI reference, which
+must be resolved against another value, such as another IRI reference or a full
 IRI, which is found through the lexical structure of the JSON document. The
 `$id`, `$ref`, and `$dynamicRef` core keywords, and the "base" JSON Hyper-Schema
 keyword, are examples of this sort of behavior.
@@ -542,7 +542,7 @@ Identifiers define IRIs for a schema, or affect how such IRIs are resolved in
 keywords, most notably `$id`.
 
 Canonical schema IRIs MUST NOT change while processing an instance, but keywords
-that affect IRI-reference resolution MAY have behavior that is only fully
+that affect IRI reference resolution MAY have behavior that is only fully
 determined at runtime.
 
 While custom identifier keywords are possible, extension designers should take
@@ -898,8 +898,8 @@ To differentiate between schemas in a vast ecosystem, schemas are identified by
 [IRI](#rfc3987), and can embed references to other schemas by specifying their
 IRI.
 
-Several keywords can accept a relative [IRI-reference](#rfc3987), or a value
-used to construct a relative IRI-reference. For these keywords, it is necessary
+Several keywords can accept a relative [IRI reference](#rfc3987), or a value
+used to construct a relative IRI reference. For these keywords, it is necessary
 to establish a base IRI in order to resolve the reference.
 
 #### The `$id` Keyword {#id-keyword}
@@ -912,10 +912,10 @@ the case of a network-addressable URL, a schema need not be downloadable from
 its canonical IRI.
 
 If present, the value for this keyword MUST be a string, and MUST represent a
-valid [IRI-reference](#rfc3987). This IRI-reference SHOULD be normalized, and
-MUST resolve to an [absolute-IRI](#rfc3987) (without a fragment).
+valid [IRI reference](#rfc3987). This IRI reference SHOULD be normalized, and
+MUST resolve to an [absolute IRI](#rfc3987) (without a fragment).
 
-The resulting absolute-IRI serves as the base IRI for relative IRI-references in
+The resulting absolute IRI serves as the base IRI for relative IRI references in
 keywords within the schema resource, in accordance with [RFC 3987 section
 6.5](#rfc3987) and [RFC 3986 section 5.1.1](#rfc3986) regarding base IRIs
 embedded in content.
@@ -924,7 +924,7 @@ The presence of `$id` in a subschema indicates that the subschema constitutes a
 distinct schema resource within a single schema document. Furthermore, in
 accordance with [RFC 3987 section 6.5](#rfc3987) and [RFC 3986 section
 5.1.2](#rfc3986) regarding encapsulating entities, if an `$id` in a subschema is
-a relative IRI-reference, the base IRI for resolving that reference is the IRI
+a relative IRI reference, the base IRI for resolving that reference is the IRI
 of the parent schema resource. Note that an `$id` consisting of an empty IRI or
 of the empty fragment only will result in the embedded resource having the same
 IRI as the encapsulating resource, which SHOULD be considered an error per
@@ -937,7 +937,7 @@ given in the [previous section.](initial-base)
 ##### Identifying the root schema
 
 The root schema of a JSON Schema document SHOULD contain an `$id` keyword with
-an [absolute-IRI](#rfc3987) (containing a scheme, but no fragment).
+an [absolute IRI](#rfc3987) (containing a scheme, but no fragment).
 
 #### Defining location-independent identifiers {#anchors}
 
@@ -971,7 +971,7 @@ If present, the value of these keywords MUST be a string and MUST conform to the
 plain name fragment identifier syntax defined in {{fragments}}.[^4]
 
 [^4]: Note that the anchor string does not include the "#" character, as it is
-not a IRI-reference. An `$anchor`: "foo" becomes the fragment `#foo` when used
+not a IRI reference. An `$anchor`: "foo" becomes the fragment `#foo` when used
 in a IRI. See below for full examples.
 
 #### Duplicate schema identifiers {#duplicate-iris}
@@ -1005,7 +1005,7 @@ identified schema. Its results are the results of the referenced schema.[^5]
 [^5]: Note that this definition of how the results are determined means that
 other keywords can appear alongside of `$ref` in the same schema object.
 
-The value of the `$ref` keyword MUST be a string which is a IRI-Reference.
+The value of the `$ref` keyword MUST be a string which is a IRI reference.
 Resolved against the current IRI base, it produces the IRI of the schema to
 apply. This resolution is safe to perform on schema load, as the process of
 evaluating an instance cannot change how the reference resolves.
@@ -1022,7 +1022,7 @@ reference themselves). The extension point is defined with `$dynamicAnchor` and
 only exhibits runtime dynamic behavior when referenced with `$dynamicRef`.
 
 The value of the `$dynamicRef` property MUST be a string which is a
-IRI-Reference that contains a valid [plain name fragment](#anchors). Resolved
+IRI reference that contains a valid [plain name fragment](#anchors). Resolved
 against the current IRI base, it indicates the schema resource used as the
 starting point for runtime resolution. This initial resolution is safe to
 perform on schema load.
@@ -2284,9 +2284,9 @@ simplify coding so that various invocations of JSON Schema libraries do not have
 to keep track of and load a large number of resources.
 
 This transformation can be safely and reversibly done as long as all static
-references (e.g. `$ref`) use IRI-references that resolve to IRIs using the
+references (e.g. `$ref`) use IRI references that resolve to IRIs using the
 canonical resource IRI as the base, and all schema resources have an
-absolute-IRI as the `$id` in their root schema.
+absolute IRI as the `$id` in their root schema.
 
 With these conditions met, each external resource can be copied under `$defs`,
 without breaking any references among the resources' schema objects, and without
@@ -2470,7 +2470,7 @@ to the document.
 - Clarify that detecting duplicate IRIs for different schemas SHOULD raise an
   error
 - Consolidate and clarify the syntax and rationale for plain-name fragments
-- "$id" MUST be an absolute-IRI, without any fragment, even an empty one
+- "$id" MUST be an absolute IRI, without any fragment, even an empty one
 - Note that an empty string "$id" results in duplicate IRIs for different
   schemas
 - Define empty schemas as empty (no longer allowing unrecognized keywords)
diff --git a/jsonschema-validation.md b/jsonschema-validation.md
index 2251d9a3..2768cb12 100644
--- a/jsonschema-validation.md
+++ b/jsonschema-validation.md
@@ -444,7 +444,7 @@ representation of an IP address as follows:
   [RFC3986](#rfc3986).
 - *iri:* A string instance is valid against this attribute if it is a valid
   IRI, according to [RFC3987](#rfc3987).
-- *iri-reference:* A string instance is valid against this attribute if it is a
+- *IRI reference:* A string instance is valid against this attribute if it is a
   valid IRI Reference (either an IRI or a relative-reference), according to
   [RFC3987](#rfc3987).
 - *uuid:* A string instance is valid against this attribute if it is a valid
@@ -563,7 +563,7 @@ The value of this property MUST be a valid JSON schema. It SHOULD be ignored if
 location IRI included as part of the annotation will ensure that it is correctly
 processed as a subschema. Using the extracted annotation value directly is only
 safe if the schema is an embedded resource with both `$schema` and an
-absolute-IRI `$id`.
+absolute IRI `$id`.
 
 ### Example
 
@@ -952,7 +952,7 @@ schema form to the core spec
   - Restored "regex" format (removal was unintentional)
   - Added "date" and "time" formats, and reserved additional RFC 3339 format
     names
-  - I18N formats: "iri", "iri-reference", "idn-hostname", "idn-email"
+  - I18N formats: "iri", "IRI reference", "idn-hostname", "idn-email"
   - Clarify that "json-pointer" format means string encoding, not URI fragment
   - Fixed typo that inverted the meaning of `minimum` and `exclusiveMinimum`
   - Move format syntax references into Normative References

From 16c475a81a80059ced3b848aacdc47c842453d0e Mon Sep 17 00:00:00 2001
From: Greg Dennis
Date: Tue, 1 Oct 2024 11:21:03 +1300
Subject: [PATCH 02/18] resolves #1349 - add explicit pointer to IRI normalization process

---
 jsonschema-core.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/jsonschema-core.md b/jsonschema-core.md
index c7faeec6..b0018bf5 100644
--- a/jsonschema-core.md
+++ b/jsonschema-core.md
@@ -912,8 +912,9 @@ the case of a network-addressable URL, a schema need not be downloadable from
 its canonical IRI.
 
 If present, the value for this keyword MUST be a string, and MUST represent a
-valid [IRI reference](#rfc3987). This IRI reference SHOULD be normalized, and
-MUST resolve to an [absolute IRI](#rfc3987) (without a fragment).
+valid [IRI reference](#rfc3987). This IRI reference SHOULD be normalized per RFC
+3987, section 5.3, and MUST resolve to an [absolute IRI](#rfc3987) (without a
+fragment).
 
 The resulting absolute IRI serves as the base IRI for relative IRI references in
 keywords within the schema resource, in accordance with [RFC 3987 section

From 64fbf863e0530bd59fdeb595bb9f3a8752696060 Mon Sep 17 00:00:00 2001
From: Greg Dennis
Date: Tue, 1 Oct 2024 13:19:06 +1300
Subject: [PATCH 03/18] pointers across resource boundary is undefined

---
 jsonschema-core.md | 56 +++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 25 deletions(-)

diff --git a/jsonschema-core.md b/jsonschema-core.md
index b0018bf5..a2754809 100644
--- a/jsonschema-core.md
+++ b/jsonschema-core.md
@@ -933,7 +933,7 @@ IRI as the encapsulating resource, which SHOULD be considered an error per
 
 If no parent schema object explicitly identifies itself as a resource with
 `$id`, the base IRI is that of the entire document, as established by the steps
-given in the [previous section.](initial-base)
+given in {{initial-base}}.
 
 ##### Identifying the root schema
 
@@ -1191,15 +1191,9 @@ automatically.
 
 When an implementation encounters the reference to "other.json", it resolves
 this to `https://example.net/other.json`, which is not defined in this document.
-If a schema with that identifier has otherwise been supplied to the
-implementation, it can also be used automatically.[^7]
-
-[^7]: What should implementations do when the referenced schema is not known?
-Are there circumstances in which automatic network dereferencing is allowed? A
-same origin policy? A user-configurable option? In the case of an evolving API
-described by Hyper-Schema, it is expected that new schemas will be added to the
-system dynamically, so placing an absolute requirement of pre-loading schema
-documents is not feasible.
+If an implementation has been configured to resolve that identifier to a schema
+via pre-loading or other means, it can be used automatically; otherwise, the
+behavior described in {{failed-refs}} MUST be used.
 
 #### JSON Pointer fragments and embedded schema resources {#embedded}
 
@@ -1272,10 +1266,10 @@ the `$id` of the embedded or referenced resource unless it is specifically
 desired to identify the object containing the `$ref` in the second
 (non-embedded) arrangement.
 
-An implementation MAY choose not to support addressing schema resource contents
-by IRIs using a base other than the resource's canonical IRI, plus a JSON
-Pointer fragment relative to that base. Therefore, schema authors SHOULD NOT
-rely on such IRIs, as using them may reduce interoperability.[^8]
+Due to the potential break in functionality described above, the behavior for
+using JSON Pointer fragments that point to or cross a resource boundary is
+undefined. Schema authors SHOULD NOT rely on such IRIs, as using them may
+reduce interoperability.
 
 [^8]: This is to avoid requiring implementations to keep track of a whole stack
 of possible base IRIs and JSON Pointer fragments for each, given that all but
@@ -1408,7 +1402,7 @@ behave correctly under implementations that attempt to use any reference target
 as a schema. However, this behavior is implementation-specific and MUST NOT be
 relied upon for interoperability.
 
-#### Failure to resolve references
+#### Failure to resolve references {#failed-refs}
 
 If for any reason a reference cannot be resolved, the evaluation MUST halt and
 return an indeterminant result. Specifically, it MUST NOT return a passing or
@@ -2231,32 +2225,22 @@ listed IRI in accordance with {{fragments}} and {{embedded}} above.
 `#/$defs/B`: canonical (and base) `IRI: https://example.com/other.json`
 - canonical resource IRI plus pointer fragment:
   `https://example.com/other.json#`
-- base IRI of enclosing (root.json) resource plus fragment:
-  `https://example.com/root.json#/$defs/B`
 
 `#/$defs/B/$defs/X`: base IRI: `https://example.com/other.json`
 - canonical resource IRI plus plain fragment:
   `https://example.com/other.json#bar`
 - canonical resource IRI plus pointer fragment:
   `https://example.com/other.json#/$defs/X`
-- base IRI of enclosing (root.json) resource plus fragment:
-  `https://example.com/root.json#/$defs/B/$defs/X`
 
 `#/$defs/B/$defs/Y`: canonical (and base) IRI:
 `https://example.com/t/inner.json`
 - canonical IRI plus plain fragment: `https://example.com/t/inner.json#bar`
 - canonical IRI plus pointer fragment: `https://example.com/t/inner.json#`
-- base IRI of enclosing (other.json) resource plus fragment:
-  `https://example.com/other.json#/$defs/Y`
-- base IRI of enclosing (root.json) resource plus fragment:
-  `https://example.com/root.json#/$defs/B/$defs/Y`
 
 `#/$defs/C`: canonical (and base) IRI:
 `urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f`
 - canonical IRI plus pointer fragment:
   `urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f#`
-- base IRI of enclosing (root.json) resource plus fragment:
-  `https://example.com/root.json#/$defs/C`
 
 Note: The fragment part of the IRI does not make it canonical or non-canonical,
 rather, the base IRI used (as part of the full IRI with any fragment) is what
@@ -2266,6 +2250,28 @@ determines the canonical nature of the resulting full IRI.[^18]
 and direct you to read the CREF located in the [JSON Pointer fragments and
 embedded schema resources](#embedded) section for further comments.
 
+While the following IRIs do correctly indicate specific schemas, per the reasons outlined in {{embedded}}, they are to be avoided:
+
+`#/$defs/B`: canonical (and base) `IRI: https://example.com/other.json`
+- base IRI of enclosing (root.json) resource plus fragment:
+  `https://example.com/root.json#/$defs/B`
+
+`#/$defs/B/$defs/X`: base IRI: `https://example.com/other.json`
+- base IRI of enclosing (root.json) resource plus fragment:
+  `https://example.com/root.json#/$defs/B/$defs/X`
+
+`#/$defs/B/$defs/Y`: canonical (and base) IRI:
+`https://example.com/t/inner.json`
+- base IRI of enclosing (other.json) resource plus fragment:
+  `https://example.com/other.json#/$defs/Y`
+- base IRI of enclosing (root.json) resource plus fragment:
+  `https://example.com/root.json#/$defs/B/$defs/Y`
+
+`#/$defs/C`: canonical (and base) IRI:
+`urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f`
+- base IRI of enclosing (root.json) resource plus fragment:
+  `https://example.com/root.json#/$defs/C`
+
 ## [Appendix] Manipulating schema documents and references
 
 Various tools have been created to rearrange schema documents based on how and

From a38117ac8609e7cad9e0ab7480490b9d401dbc54 Mon Sep 17 00:00:00 2001
From: Greg Dennis
Date: Tue, 1 Oct 2024 13:27:56 +1300
Subject: [PATCH 04/18] remove paragraphs about ref-ing into unknown keywords

---
 jsonschema-core.md | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/jsonschema-core.md b/jsonschema-core.md
index a2754809..f1b2f2e7 100644
--- a/jsonschema-core.md
+++ b/jsonschema-core.md
@@ -1313,7 +1313,7 @@ When the Schema Resource referenced by a by-reference applicator is bundled, it
 is RECOMMENDED that the Schema Resource be located as a value of a `$defs`
 object at the containing schema's root. The key of the `$defs` for the now
 embedded Schema Resource MAY be the `$id` of the bundled schema or some other
-form of application defined unique identifer (such as a UUID). This key is not
+form of application defined unique identifier (such as a UUID). This key is not
 intended to be referenced in JSON Schema, but may be used by an application to
 aid the bundling process.
 
@@ -1381,21 +1381,6 @@ applicator keywords or with location-reserving keywords such as
 be `$defs` and the standard applicators from this document or
 implementation-specific custom keywords.
 
-Multi-level structures of unknown keywords are capable of introducing nested
-subschemas, which would be subject to the processing rules for `$id`. Therefore,
-having a reference target in such an unrecognized structure cannot be reliably
-implemented, and the resulting behavior is undefined. Similarly, a reference
-target under a known keyword, for which the value is known not to be a schema,
-results in undefined behavior in order to avoid burdening implementations with
-the need to detect such targets.[^10]
-
-[^10]: These scenarios are analogous to fetching a schema over HTTP but
-receiving a response with a Content-Type other than `application/schema+json`.
-An implementation can certainly try to interpret it as a schema, but the origin
-server offered no guarantee that it actually is any such thing. Therefore,
-interpreting it as such has security implication and may produce unpredictable
-results.
-
 Note that single-level custom keywords with identical syntax and semantics to
 `$defs` do not allow for any intervening `$id` keywords, and therefore will
 behave correctly under implementations that attempt to use any reference target

From e9aed4ae88268b41e425dd9733d52167326513fe Mon Sep 17 00:00:00 2001
From: Greg Dennis
Date: Tue, 1 Oct 2024 13:33:11 +1300
Subject: [PATCH 05/18] fix line wrapping

---
 jsonschema-core.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/jsonschema-core.md b/jsonschema-core.md
index f1b2f2e7..47602947 100644
--- a/jsonschema-core.md
+++ b/jsonschema-core.md
@@ -2235,7 +2235,8 @@ determines the canonical nature of the resulting full IRI.[^18]
 and direct you to read the CREF located in the [JSON Pointer fragments and
 embedded schema resources](#embedded) section for further comments.
 
-While the following IRIs do correctly indicate specific schemas, per the reasons outlined in {{embedded}}, they are to be avoided:
+While the following IRIs do correctly indicate specific schemas, per the reasons
+outlined in {{embedded}}, they are to be avoided:
 
 `#/$defs/B`: canonical (and base) `IRI: https://example.com/other.json`
 - base IRI of enclosing (root.json) resource plus fragment:

From 1d61c8737d5d00bf6f333d38dde3462710ecce65 Mon Sep 17 00:00:00 2001
From: Greg Dennis
Date: Wed, 2 Oct 2024 21:04:10 +1300
Subject: [PATCH 06/18] add comment ref back; update appendix format per PR discussion

---
 jsonschema-core.md | 42 ++++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/jsonschema-core.md b/jsonschema-core.md
index 47602947..51e7a7df 100644
--- a/jsonschema-core.md
+++ b/jsonschema-core.md
@@ -1269,7 +1269,7 @@ desired to identify the object containing the `$ref` in the second
 Due to the potential break in functionality described above, the behavior for
 using JSON Pointer fragments that point to or cross a resource boundary is
 undefined. Schema authors SHOULD NOT rely on such IRIs, as using them may
-reduce interoperability.
+reduce interoperability.[^8]
 
 [^8]: This is to avoid requiring implementations to keep track of a whole stack
 of possible base IRIs and JSON Pointer fragments for each, given that all but
@@ -2194,35 +2194,42 @@ name fragment identifiers.
 }
 ```
 
-The schemas at the following IRI-encoded [JSON Pointers](#rfc6901) (relative to
-the root schema) have the following base IRIs, and are identifiable by any
-listed IRI in accordance with {{fragments}} and {{embedded}} above.
+The schemas at the following locations (indicated by plain
+[JSON Pointers](#rfc6901) relative to the root document) have the following base
+IRIs, and are identifiable by any listed IRI in accordance with {{fragments}}
+and {{embedded}} above.
 
-`#` (document root): canonical (and base) IRI: `https://example.com/root.json`
+Document root:
+- canonical (and base) IRI: `https://example.com/root.json`
 - canonical resource IRI plus pointer fragment: `https://example.com/root.json#`
 
-`#/$defs/A`: base IRI: `https://example.com/root.json`
+Document location `/$defs/A`:
+- base IRI: `https://example.com/root.json`
 - canonical resource IRI plus plain fragment:
   `https://example.com/root.json#foo`
 - canonical resource IRI plus pointer fragment:
   `https://example.com/root.json#/$defs/A`
 
-`#/$defs/B`: canonical (and base) `IRI: https://example.com/other.json`
+Document location `/$defs/B`:
+- canonical (and base) `IRI: https://example.com/other.json`
 - canonical resource IRI plus pointer fragment:
   `https://example.com/other.json#`
 
-`#/$defs/B/$defs/X`: base IRI: `https://example.com/other.json`
+Document location `/$defs/B/$defs/X`:
+- base IRI: `https://example.com/other.json`
 - canonical resource IRI plus plain fragment:
   `https://example.com/other.json#bar`
 - canonical resource IRI plus pointer fragment:
   `https://example.com/other.json#/$defs/X`
 
-`#/$defs/B/$defs/Y`: canonical (and base) IRI:
+Document location `/$defs/B/$defs/Y`:
+- canonical (and base) IRI:
 `https://example.com/t/inner.json`
 - canonical IRI plus plain fragment: `https://example.com/t/inner.json#bar`
 - canonical IRI plus pointer fragment: `https://example.com/t/inner.json#`
 
-`#/$defs/C`: canonical (and base) IRI:
+Document location `/$defs/C`:
+- canonical (and base) IRI:
 `urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f`
 - canonical IRI plus pointer fragment:
   `urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f#`
@@ -2232,28 +2239,31 @@ rather, the base IRI used (as part of the full IRI with any fragment) is what
 determines the canonical nature of the resulting full IRI.[^18]
 
 [^18]: Multiple "canonical" IRIs? We Acknowledge this is potentially confusing,
-and direct you to read the CREF located in the [JSON Pointer fragments and
-embedded schema resources](#embedded) section for further comments.
+and direct you to read the CREF located in {{#embedded}} for further comments.
 
 While the following IRIs do correctly indicate specific schemas, per the reasons
 outlined in {{embedded}}, they are to be avoided:
 
-`#/$defs/B`: canonical (and base) `IRI: https://example.com/other.json`
+Document location `/$defs/B`:
+- canonical (and base) `IRI: https://example.com/other.json`
 - base IRI of enclosing (root.json) resource plus fragment:
   `https://example.com/root.json#/$defs/B`
 
-`#/$defs/B/$defs/X`: base IRI: `https://example.com/other.json`
+Document location `/$defs/B/$defs/X`:
+- base IRI: `https://example.com/other.json`
 - base IRI of enclosing (root.json) resource plus fragment:
   `https://example.com/root.json#/$defs/B/$defs/X`
 
-`#/$defs/B/$defs/Y`: canonical (and base) IRI:
+Document location `/$defs/B/$defs/Y`:
+- canonical (and base) IRI:
 `https://example.com/t/inner.json`
 - base IRI of enclosing (other.json) resource plus fragment:
   `https://example.com/other.json#/$defs/Y`
 - base IRI of enclosing (root.json) resource plus fragment:
   `https://example.com/root.json#/$defs/B/$defs/Y`
 
-`#/$defs/C`: canonical (and base) IRI:
+Document location `/$defs/C`:
+- canonical (and base) IRI:
 `urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f`
 - base IRI of enclosing (root.json) resource plus fragment:
   `https://example.com/root.json#/$defs/C`

From 4392b2598af53bcfaabe0fe19f1d466fafd698a4 Mon Sep 17 00:00:00 2001
From: Greg Dennis
Date: Sat, 12 Oct 2024 08:26:32 +1300
Subject: [PATCH 07/18] reorganize the $id section and remove redundancies

---
 jsonschema-core.md | 50 +++++++++++++++++++++++++---------------------
 1 file changed, 27 insertions(+), 23 deletions(-)

diff --git a/jsonschema-core.md b/jsonschema-core.md
index 51e7a7df..13803955 100644
--- a/jsonschema-core.md
+++ b/jsonschema-core.md
@@ -895,8 +895,10 @@ by other parties.
 ### Base IRI, Anchors, and Dereferencing
 
 To differentiate between schemas in a vast ecosystem, schemas are identified by
-[IRI](#rfc3987), and can embed references to other schemas by specifying their
-IRI.
+[absolute IRIs](#rfc3987) (without fragments), and can embed references to other
+schemas by specifying their IRI. When comparing IRIs, implementations SHOULD
+interpret them using the normalization procedures defined in
+[RFC 3987](#rfc3987), section 5.3.
 
 Several keywords can accept a relative [IRI reference](#rfc3987), or a value
 used to construct a relative IRI reference. For these keywords, it is necessary
@@ -904,32 +906,23 @@ to establish a base IRI in order to resolve the reference.
 
 #### The `$id` Keyword {#id-keyword}
 
-The `$id` keyword identifies a schema resource with its [canonical](#rfc6596)
-IRI.
+The `$id` keyword identifies a schema resource. The value for this keyword MUST
+be a string, and MUST represent a valid [IRI reference](#rfc3987) (without a
+fragment).
+
+When the value of this keyword is resolved against the current base IRI, the
+resulting absolute IRI then serves as the identifier for the schema resource and
+as a base IRI for relative IRI references in keywords within that schema
+resource, in accordance with [RFC 3987 section 6.5](#rfc3987) and
+[RFC 3986 section 5.1.1](#rfc3986) regarding base IRIs embedded in content.
 
 Note that this IRI is an identifier and not necessarily a network locator. In
 the case of a network-addressable URL, a schema need not be downloadable from
 its canonical IRI.
 
-If present, the value for this keyword MUST be a string, and MUST represent a
-valid [IRI reference](#rfc3987). This IRI reference SHOULD be normalized per RFC
-3987, section 5.3, and MUST resolve to an [absolute IRI](#rfc3987) (without a
-fragment).
-
-The resulting absolute IRI serves as the base IRI for relative IRI references in
-keywords within the schema resource, in accordance with [RFC 3987 section
-6.5](#rfc3987) and [RFC 3986 section 5.1.1](#rfc3986) regarding base IRIs
-embedded in content.
-
-The presence of `$id` in a subschema indicates that the subschema constitutes a
-distinct schema resource within a single schema document. Furthermore, in
-accordance with [RFC 3987 section 6.5](#rfc3987) and [RFC 3986 section
-5.1.2](#rfc3986) regarding encapsulating entities, if an `$id` in a subschema is
-a relative IRI reference, the base IRI for resolving that reference is the IRI
-of the parent schema resource. Note that an `$id` consisting of an empty IRI or
-of the empty fragment only will result in the embedded resource having the same
-IRI as the encapsulating resource, which SHOULD be considered an error per
-{{duplicate-iris}}.
+Also note that an `$id` consisting of an empty IRI only will result in the
+embedded resource having the same IRI as the encapsulating resource, which
+SHOULD be considered an error per {{duplicate-iris}}.
 
 If no parent schema object explicitly identifies itself as a resource with
 `$id`, the base IRI is that of the entire document, as established by the steps
@@ -1387,6 +1380,17 @@ behave correctly under implementations that attempt to use any reference target
 as a schema. However, this behavior is implementation-specific and MUST NOT be
 relied upon for interoperability.
 
+A reference target under a keyword for which the value is known not to be a
+schema results in undefined behavior in order to avoid burdening implementations
+with the need to detect such targets.[^10]
+
+[^10]: These scenarios are analogous to fetching a schema over HTTP but
+receiving a response with a Content-Type other than `application/schema+json`.
+An implementation can certainly try to interpret it as a schema, but the origin
+server offered no guarantee that it actually is any such thing. Therefore,
+interpreting it as such has security implication and may produce unpredictable
+results.
+
 #### Failure to resolve references {#failed-refs}
 
 If for any reason a reference cannot be resolved, the evaluation MUST halt and

From 43c8889e599345cd718c62a863d421f84e09fcd6 Mon Sep 17 00:00:00 2001
From: Greg Dennis
Date: Thu, 3 Oct 2024 08:24:33 +1300
Subject: [PATCH 08/18] Apply suggestions from code review

Co-authored-by: Jason Desrosiers
---
 jsonschema-validation.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/jsonschema-validation.md b/jsonschema-validation.md
index 2768cb12..4a5ca2d1 100644
--- a/jsonschema-validation.md
+++ b/jsonschema-validation.md
@@ -444,7 +444,7 @@ representation of an IP address as follows:
   [RFC3986](#rfc3986).
 - *iri:* A string instance is valid against this attribute if it is a valid
   IRI, according to [RFC3987](#rfc3987).
-- *IRI reference:* A string instance is valid against this attribute if it is a
+- *iri-reference:* A string instance is valid against this attribute if it is a
   valid IRI Reference (either an IRI or a relative-reference), according to
   [RFC3987](#rfc3987).
 - *uuid:* A string instance is valid against this attribute if it is a valid
@@ -952,7 +952,7 @@ schema form to the core spec
   - Restored "regex" format (removal was unintentional)
   - Added "date" and "time" formats, and reserved additional RFC 3339 format
     names
-  - I18N formats: "iri", "IRI reference", "idn-hostname", "idn-email"
+  - I18N formats: "iri", "iri-reference", "idn-hostname", "idn-email"
   - Clarify that "json-pointer" format means string encoding, not URI fragment
   - Fixed typo that inverted the meaning of `minimum` and `exclusiveMinimum`
   - Move format syntax references into Normative References

From 39c0e186ed728eb325d7c414335f9bfb1c253bec Mon Sep 17 00:00:00 2001
From: Greg Dennis
Date: Fri, 4 Oct 2024 09:56:02 +1300
Subject: [PATCH 09/18] Update jsonschema-core.md

---
 jsonschema-core.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/jsonschema-core.md b/jsonschema-core.md
index 13803955..793159a1 100644
--- a/jsonschema-core.md
+++ b/jsonschema-core.md
@@ -2246,7 +2246,7 @@ determines the canonical nature of the resulting full IRI.[^18]
 and direct you to read the CREF located in {{#embedded}} for further comments.
 
 While the following IRIs do correctly indicate specific schemas, per the reasons
-outlined in {{embedded}}, they are to be avoided:
+outlined in {{embedded}}, they are to be avoided as they may not work in all implementations:
 
 Document location `/$defs/B`:
 - canonical (and base) `IRI: https://example.com/other.json`

From 43c260ca8c79c87374e695241fcc04b9d7bfa893 Mon Sep 17 00:00:00 2001
From: Greg Dennis
Date: Sat, 12 Oct 2024 08:45:07 +1300
Subject: [PATCH 10/18] some more clarification on when implementations should normalize IRIs

---
 jsonschema-core.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/jsonschema-core.md b/jsonschema-core.md
index 793159a1..1f7adca7 100644
--- a/jsonschema-core.md
+++ b/jsonschema-core.md
@@ -895,9 +895,9 @@ by other parties.
 ### Base IRI, Anchors, and Dereferencing
 
 To differentiate between schemas in a vast ecosystem, schemas are identified by
-[absolute IRIs](#rfc3987) (without fragments), and can embed references to other
-schemas by specifying their IRI. When comparing IRIs, implementations SHOULD
-interpret them using the normalization procedures defined in
+[absolute IRIs](#rfc3987) (without fragments) and can embed references to other
+schemas by specifying their respective IRIs. When comparing IRIs,
+implementations SHOULD first follow the IRI normalization procedures defined in
 [RFC 3987](#rfc3987), section 5.3.
 
 Several keywords can accept a relative [IRI reference](#rfc3987), or a value

From 878517d2052b419fdb2b053e4613f6592c7ac9a9 Mon Sep 17 00:00:00 2001
From: Greg Dennis
Date: Sat, 12 Oct 2024 08:46:32 +1300
Subject: [PATCH 11/18] more clarity

---
 jsonschema-core.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/jsonschema-core.md b/jsonschema-core.md
index 1f7adca7..cf855640 100644
--- a/jsonschema-core.md
+++ b/jsonschema-core.md
@@ -896,9 +896,9 @@ by other parties.
 
To differentiate between schemas in a vast ecosystem, schemas are identified by [absolute IRIs](#rfc3987) (without fragments) and can embed references to other -schemas by specifying their respective IRIs. When comparing IRIs, -implementations SHOULD first follow the IRI normalization procedures defined in -[RFC 3987](#rfc3987), section 5.3. +schemas by specifying their respective IRIs. When comparing IRIs for the +purposes of resource identification, implementations SHOULD first follow the IRI +normalization procedures defined in [RFC 3987](#rfc3987), section 5.3. Several keywords can accept a relative [IRI reference](#rfc3987), or a value used to construct a relative IRI reference. For these keywords, it is necessary From 636aa373d6ab8a5bbda3ffb8ee31a6bf8cc3ad7d Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Mon, 14 Oct 2024 07:47:08 +1300 Subject: [PATCH 12/18] add that the base IRI is also a base for nested resources --- jsonschema-core.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/jsonschema-core.md b/jsonschema-core.md index cf855640..3a7ad536 100644 --- a/jsonschema-core.md +++ b/jsonschema-core.md @@ -913,8 +913,9 @@ fragment). When the value of this keyword is resolved against the current base IRI, the resulting absolute IRI then serves as the identifier for the schema resource and as a base IRI for relative IRI references in keywords within that schema -resource, in accordance with [RFC 3987 section 6.5](#rfc3987) and -[RFC 3986 section 5.1.1](#rfc3986) regarding base IRIs embedded in content. +resource and for nested schema resources, in accordance with [RFC 3987 section +6.5](#rfc3987) and [RFC 3986 section 5.1.1](#rfc3986) regarding base IRIs +embedded in content. Note that this IRI is an identifier and not necessarily a network locator. 
In the case of a network-addressable URL, a schema need not be downloadable from From 8407415d21339caa51938b2a2a391f48f7ca2851 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Tue, 22 Oct 2024 07:58:31 +1300 Subject: [PATCH 13/18] Apply suggestions from code review Co-authored-by: Jason Desrosiers --- jsonschema-core.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/jsonschema-core.md b/jsonschema-core.md index 3a7ad536..2dd06fb7 100644 --- a/jsonschema-core.md +++ b/jsonschema-core.md @@ -907,8 +907,8 @@ to establish a base IRI in order to resolve the reference. #### The `$id` Keyword {#id-keyword} The `$id` keyword identifies a schema resource. The value for this keyword MUST -be a string, and MUST represent a valid [IRI reference](#rfc3987) (without a -fragment). +be a string, and MUST represent a valid [IRI reference](#rfc3987) without a +fragment. When the value of this keyword is resolved against the current base IRI, the resulting absolute IRI then serves as the identifier for the schema resource and @@ -1262,7 +1262,7 @@ desired to identify the object containing the `$ref` in the second Due to the potential break in functionality described above, the behavior for using JSON Pointer fragments that point to or cross a resource boundary is -undefined. Schema authors SHOULD NOT rely on such IRIs, as using them may +undefined. 
Schema authors SHOULD NOT rely on such IRIs, as using them may reduce interoperability.[^8] [^8]: This is to avoid requiring implementations to keep track of a whole stack From cd52a3e3a5fc5de39ec3cb709173defc5f97614c Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Tue, 22 Oct 2024 18:43:18 +1300 Subject: [PATCH 14/18] fix incorrect anchor reference --- jsonschema-core.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/jsonschema-core.md b/jsonschema-core.md index 2dd06fb7..ce3c365c 100644 --- a/jsonschema-core.md +++ b/jsonschema-core.md @@ -2244,7 +2244,7 @@ rather, the base IRI used (as part of the full IRI with any fragment) is what determines the canonical nature of the resulting full IRI.[^18] [^18]: Multiple "canonical" IRIs? We Acknowledge this is potentially confusing, -and direct you to read the CREF located in {{#embedded}} for further comments. +and direct you to read the CREF located in {{embedded}} for further comments. While the following IRIs do correctly indicate specific schemas, per the reasons outlined in {{embedded}}, they are to be avoided as they may not work in all implementations: From 9a8ae1470629904f0f1d9f6be2c42e8400ea50ca Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Wed, 23 Oct 2024 15:21:43 +1300 Subject: [PATCH 15/18] update general summary about identifiers --- jsonschema-core.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/jsonschema-core.md b/jsonschema-core.md index 2dd06fb7..57fda6aa 100644 --- a/jsonschema-core.md +++ b/jsonschema-core.md @@ -894,11 +894,11 @@ by other parties. ### Base IRI, Anchors, and Dereferencing -To differentiate between schemas in a vast ecosystem, schemas are identified by -[absolute IRIs](#rfc3987) (without fragments) and can embed references to other -schemas by specifying their respective IRIs. 
When comparing IRIs for the -purposes of resource identification, implementations SHOULD first follow the IRI -normalization procedures defined in [RFC 3987](#rfc3987), section 5.3. +To differentiate between schemas in a vast ecosystem, schema resources are +identified by [absolute IRIs](#rfc3987) (without fragments). These identifiers +are used to created references between schema resources. When comparing IRIs for +the purposes of resource identification, implementations SHOULD first follow the +IRI normalization procedures defined in [RFC 3987](#rfc3987), section 5.3. Several keywords can accept a relative [IRI reference](#rfc3987), or a value used to construct a relative IRI reference. For these keywords, it is necessary From 4d6fdb50e078620c6fb962f6c5444e936d0349b6 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Thu, 24 Oct 2024 07:43:58 +1300 Subject: [PATCH 16/18] Update jsonschema-core.md Co-authored-by: Jason Desrosiers --- jsonschema-core.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/jsonschema-core.md b/jsonschema-core.md index e11c6dd9..660f5ccf 100644 --- a/jsonschema-core.md +++ b/jsonschema-core.md @@ -895,8 +895,8 @@ by other parties. ### Base IRI, Anchors, and Dereferencing To differentiate between schemas in a vast ecosystem, schema resources are -identified by [absolute IRIs](#rfc3987) (without fragments). These identifiers -are used to created references between schema resources. When comparing IRIs for +identified by [absolute IRIs](#rfc3987) (without fragments). These identifiers +are used to create references between schema resources. When comparing IRIs for the purposes of resource identification, implementations SHOULD first follow the IRI normalization procedures defined in [RFC 3987](#rfc3987), section 5.3. 
From 7f04f62d7ba534c5e9c28f5e301d32463bdcbfa1 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Fri, 25 Oct 2024 20:25:45 +1300 Subject: [PATCH 17/18] further edits for $id and referencing unknown location --- jsonschema-core.md | 31 +++++++++++++------------------ 1 file changed, 13 insertions(+), 18 deletions(-) diff --git a/jsonschema-core.md b/jsonschema-core.md index 660f5ccf..a5cfa268 100644 --- a/jsonschema-core.md +++ b/jsonschema-core.md @@ -906,14 +906,14 @@ to establish a base IRI in order to resolve the reference. #### The `$id` Keyword {#id-keyword} -The `$id` keyword identifies a schema resource. The value for this keyword MUST -be a string, and MUST represent a valid [IRI reference](#rfc3987) without a -fragment. +An `$id` keyword in a schema or subschema identifies that schema or subschema as +a distinct schema resource. The value for this keyword MUST be a string, and +MUST represent a valid [IRI reference](#rfc3987) without a fragment. When the value of this keyword is resolved against the current base IRI, the resulting absolute IRI then serves as the identifier for the schema resource and as a base IRI for relative IRI references in keywords within that schema -resource and for nested schema resources, in accordance with [RFC 3987 section +resource and for embedded schema resources, in accordance with [RFC 3987 section 6.5](#rfc3987) and [RFC 3986 section 5.1.1](#rfc3986) regarding base IRIs embedded in content. @@ -1370,20 +1370,15 @@ recursive nesting like this; the behavior is undefined. #### References to Possible Non-Schemas {#non-schemas} Subschema objects (or booleans) are recognized by their use with known -applicator keywords or with location-reserving keywords such as -[`$defs`](#defs) that take one or more subschemas as a value. These keywords may -be `$defs` and the standard applicators from this document or -implementation-specific custom keywords. 
- -Note that single-level custom keywords with identical syntax and semantics to -`$defs` do not allow for any intervening `$id` keywords, and therefore will -behave correctly under implementations that attempt to use any reference target -as a schema. However, this behavior is implementation-specific and MUST NOT be -relied upon for interoperability. - -A reference target under a keyword for which the value is known not to be a -schema results in undefined behavior in order to avoid burdening implementations -with the need to detect such targets.[^10] +applicator keywords or with location-reserving keywords, such as +[`$defs`](#defs), that take one or more subschemas as a value. These keywords +include the standard applicators from this document or implementation-specific +custom keywords. + +A reference target under a keyword for which the value is not explicitly known +to be a schema results in undefined behavior. Implementations MAY support +references to these locations, however such behavior is not considered +interoperable and should not be relied upon.[^10] [^10]: These scenarios are analogous to fetching a schema over HTTP but receiving a response with a Content-Type other than `application/schema+json`. From b64d297fb1fbd61bfbb29bec379dfcbe5a55b49d Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Tue, 29 Oct 2024 21:19:55 +1300 Subject: [PATCH 18/18] add reference to 3986 sec. 5.1.2 --- jsonschema-core.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/jsonschema-core.md b/jsonschema-core.md index a5cfa268..d9b6c2ef 100644 --- a/jsonschema-core.md +++ b/jsonschema-core.md @@ -915,7 +915,7 @@ resulting absolute IRI then serves as the identifier for the schema resource and as a base IRI for relative IRI references in keywords within that schema resource and for embedded schema resources, in accordance with [RFC 3987 section 6.5](#rfc3987) and [RFC 3986 section 5.1.1](#rfc3986) regarding base IRIs -embedded in content. 
+embedded in content and RFC 3986 section 5.1.2 regarding encapsulating entities. Note that this IRI is an identifier and not necessarily a network locator. In the case of a network-addressable URL, a schema need not be downloadable from