Modify dset/attr builders based on sidecar JSON #677

Draft
wants to merge 27 commits into
base: dev
Changes from 21 commits
9e4ba60
Add first at reading sidecar modifications
rly Nov 11, 2021
1f53919
Pretty-print json
rly Nov 11, 2021
dafc650
Update to work if json is not present
rly Nov 11, 2021
de5fefe
Refactor BuilderUpdater functionality to sep class
rly Nov 11, 2021
3f1f8f2
Merge branch 'dev' into sidecar_mods
rly Nov 30, 2021
036fa1e
Handle changing sub-dataset attr, add sidecar fields
rly Nov 30, 2021
b4b5419
Use semantic versioning in version label
rly Nov 30, 2021
151c69d
Add jsonschema for sidecar json
rly Dec 1, 2021
32d1397
Add validation to read
rly Dec 1, 2021
933ef40
Update to use new schema. More tests needed
rly Dec 7, 2021
393e5b3
Update tests (more to do)
rly Dec 8, 2021
2fda06d
Add description, author, and contact to sidecar JSON, fix tests
rly Dec 8, 2021
28c6893
Merge branch 'dev' into sidecar_mods
rly Jan 25, 2022
6da168d
Merge branch 'dev' into sidecar_mods
rly Apr 11, 2022
618ab1c
Merge branch 'dev' of https://github.com/hdmf-dev/hdmf into sidecar_mods
rly Apr 11, 2022
393ffdf
Merge branch 'sidecar_mods' of https://github.com/hdmf-dev/hdmf into …
rly Apr 11, 2022
729e989
Update documentation, refactor, and add test cases
rly Apr 12, 2022
ecd244d
Update
rly Apr 12, 2022
168f4a9
Add link to sidecar json schema
rly Apr 12, 2022
1c57573
Add examples to doc
rly Apr 12, 2022
62ed248
Update sidecar.rst
rly Apr 12, 2022
7078ca1
Merge branch 'dev' into sidecar_mods
rly Apr 21, 2022
9faf7a2
Update sidecar.rst
rly Apr 21, 2022
827d61d
Update docs/source/sidecar.rst
rly Apr 21, 2022
ef22dc5
Update sidecar.rst
rly Apr 21, 2022
2bb7185
Merge branch 'dev' into sidecar_mods
rly Aug 31, 2022
fee5245
Merge branch 'dev' into sidecar_mods
rly Nov 29, 2022
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -45,6 +45,7 @@ If you use HDMF in your research, please use the following citation:
building_api
export
validation
sidecar

.. toctree::
:hidden:
154 changes: 154 additions & 0 deletions docs/source/sidecar.rst
@@ -0,0 +1,154 @@
.. _modifying_with_sidecar:

Modifying an HDMF File with a Sidecar JSON File
===============================================

Users may want to update part of an HDMF file without rewriting the entire file.
Contributor

Suggested change
Users may want to update part of an HDMF file without rewriting the entire file.
Users may want to update part of an HDMF file without rewriting the entire file.

I think it would be useful to elaborate a little bit on this to clarify the intent and scope of the sidecar file, i.e., this is for small updates and corrections only.

To do so, HDMF supports the use of a "sidecar" JSON file that lives adjacent to the HDMF file on disk and
specifies modifications to the HDMF file. Only a limited set of modifications are supported; for example, users can
delete a dataset or attribute but cannot create a new dataset or attribute.
Contributor

Suggested change
delete a dataset or attribute but cannot create a new dataset or attribute.
hide a dataset or attribute so that it will not be read by HDMF but cannot create a new dataset or attribute.

I think delete is misleading since we are not actually deleting any data from a file but the JSON file can only indicate that the dataset/attribute should be ignored on read (maybe hide or invalid would be more precise).

Does delete also apply to groups?

Contributor Author

Good point. I'll make the change. For now, I have not allowed hiding of groups because the use case is unclear. But it is technically not very different from hiding of datasets.

Contributor

I think a main use case for hiding groups would be instances of a data_type, e.g., to hide a TimeSeries that for some reason contains bad data. If it's trivial, then I think hiding groups is something we could allow, but if it adds a lot of complexity then I would hold off until a specific need arises.

When HDMF reads an HDMF file, if the corresponding sidecar JSON file exists, it is
automatically read and the modifications that it specifies are automatically applied.

.. note::

   To ignore the corresponding sidecar JSON file when reading an HDMF file, pass ``load_sidecar=False`` to
   ``HDMFIO.read()`` on the ``HDMFIO`` object used to read the HDMF file.

Allowed modifications
---------------------

Only the following modifications to an HDMF file are supported in the sidecar JSON file:

- Replace the values of a dataset or attribute with a scalar or 1-D array
- Delete a dataset or attribute

.. note::

   Replacing the values of a dataset or attribute with a very large 1-D array using the sidecar JSON file may not
   be efficient and is discouraged. Users should instead consider rewriting the HDMF file with the
   updated values.

Specification for the sidecar JSON file
---------------------------------------

The sidecar JSON file can be validated using the ``sidecar.schema.json`` JSON schema file
located at the root of the HDMF repository.
Contributor

Are sidecar files automatically validated by the validator as well?

The sidecar JSON file must contain the following top-level keys:

- ``"description"``: A free-form string describing the modifications specified in this file.
- ``"author"``: A list of free-form strings containing the names of the people who created this file.
- ``"contact"``: A list of email addresses for the people who created this file. Each author listed in the "author" key
  *should* have a corresponding email address.
- ``"operations"``: A list of operations to perform on the data in the file, as specified below.
- ``"schema_version"``: The version of the sidecar JSON schema that the file conforms to, e.g., "0.1.0".
  View the current version of the sidecar JSON schema here:
  `sidecar.schema.json <https://github.com/hdmf-dev/hdmf/blob/dev/sidecar.schema.json>`_

Here is an example sidecar JSON file:

.. code:: javascript

    {
      "description": "Summary of changes",
      "author": [
        "The NWB Team"
      ],
      "contact": [
        "[email protected]"
      ],
      "operations": [
        {
          "type": "replace",
          "description": "change foo1/my_data data from [1, 2, 3] to [4, 5] (int8)",
          "object_id": "e0449bb5-2b53-48c1-b04e-85a9a4631655",
          "relative_path": "my_data",
          "value": [
            4,
            5
          ],
          "dtype": "int8"
        },
        {
          "type": "delete",
          "description": "delete foo1/foo_holder/my_sub_data/attr6",
          "object_id": "993fef27-680c-457a-af4d-b1d2725fcca9",
          "relative_path": "foo_holder/my_sub_data/attr6"
        }
      ],
      "schema_version": "0.1.0"
    }
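
As a quick sanity check before handing a sidecar file to HDMF, the required top-level keys listed above can be verified with the Python standard library alone. This is only a minimal sketch: the helper name ``missing_top_level_keys`` is hypothetical and is not part of HDMF, and it does not replace validation against ``sidecar.schema.json``.

```python
import json

# Top-level keys that the documentation above lists as required.
REQUIRED_TOP_LEVEL_KEYS = {"description", "author", "contact", "operations", "schema_version"}


def missing_top_level_keys(sidecar_text):
    """Return the sorted required top-level keys that are absent from a sidecar JSON string.

    Hypothetical helper for illustration only; it assumes the JSON document is an object.
    """
    sidecar = json.loads(sidecar_text)
    return sorted(REQUIRED_TOP_LEVEL_KEYS - set(sidecar))


# A deliberately incomplete sidecar document.
incomplete = '{"description": "Summary of changes", "author": ["The NWB Team"], "operations": []}'
print(missing_top_level_keys(incomplete))  # → ['contact', 'schema_version']
```

An empty list means that all required top-level keys are present; it says nothing about whether their values are well-formed.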

Specification for operations
----------------------------

All operations are required to have the following keys:

- ``"type"``: The type of modification to perform. Only "replace" and "delete" are supported currently.
- ``"description"``: A description of the specified modification.
- ``"object_id"``: The object ID (UUID) of the data type that is closest in the file hierarchy to the
  field being modified.
- ``"relative_path"``: The relative path from the data type with the given object ID to the field being modified.
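
The required keys can be checked per operation in a few lines of Python. The sketch below is illustrative only: ``check_operation`` is a hypothetical helper, not an HDMF function, and the UUID-4 pattern is copied from the ``object_id`` pattern in the ``sidecar.schema.json`` file added in this pull request.

```python
import re

# Keys that every operation must have, per the list above.
REQUIRED_OPERATION_KEYS = ("type", "description", "object_id", "relative_path")
SUPPORTED_TYPES = ("replace", "delete")
# Hyphenated UUID-4 pattern, taken from the object_id pattern in sidecar.schema.json.
UUID4_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)


def check_operation(operation):
    """Return a list of human-readable problems found in one operation dict (hypothetical helper)."""
    problems = [f"missing key: {key}" for key in REQUIRED_OPERATION_KEYS if key not in operation]
    if "type" in operation and operation["type"] not in SUPPORTED_TYPES:
        problems.append(f"unsupported type: {operation['type']}")
    object_id = operation.get("object_id")
    if object_id is not None and not (isinstance(object_id, str) and UUID4_PATTERN.match(object_id)):
        problems.append("object_id is not a hyphenated UUID-4 string")
    return problems


# The second operation from the example sidecar JSON file.
delete_attr6 = {
    "type": "delete",
    "description": "delete foo1/foo_holder/my_sub_data/attr6",
    "object_id": "993fef27-680c-457a-af4d-b1d2725fcca9",
    "relative_path": "foo_holder/my_sub_data/attr6",
}
print(check_operation(delete_attr6))  # → []
```

This checks only the shape of an operation. Whether the object ID and relative path resolve to a real field can only be determined against the HDMF file itself.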

Operations can result in invalid files, i.e., files that do not conform to the specification. It is strongly
recommended that the file is validated against the schema after loading the sidecar JSON. In some cases, the
file cannot be read because the file is invalid.
Contributor

Suggested change
Operations can result in invalid files, i.e., files that do not conform to the specification. It is strongly
recommended that the file is validated against the schema after loading the sidecar JSON. In some cases, the
file cannot be read because the file is invalid.
.. warning::

   Modifying a file via a sidecar file can result in a file that is no longer compliant with the format
   specification of the file. For example, a sidecar operation may ``delete`` a required dataset, resulting
   in an invalid file that, in the worst case, may no longer be readable because required arguments are missing.
   It is strongly recommended that the file is validated against the schema after loading the sidecar JSON.

rly marked this conversation as resolved.

Replacing values of a dataset/attribute with a scalar or 1-D array
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Specify ``"type": "replace"`` to replace the values of a dataset/attribute from the associated HDMF file
as specified by the ``object_id`` and ``relative_path``.

The operation specification must have the following keys:

- ``"value"``: The new value for the dataset/attribute. Only scalars and 1-D arrays can be
  specified as a replacement value.

The operation specification may also have the following keys:

- ``"dtype"``: String representing the dtype of the new value. If this key is not present, then the dtype of the
  existing value for the dataset/attribute is used. Allowed dtypes are listed in the
  `HDMF schema language docs for dtype <https://hdmf-schema-language.readthedocs.io/en/latest/description.html#dtype>`_.

In the example sidecar JSON file above, the first operation specifies that the value of dataset "my_data" in
group "foo1", which has the specified object ID, should be replaced with the 1-D array [4, 5] (dtype: int8).
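
The replace semantics described above, including the fallback to the existing dtype when ``"dtype"`` is omitted, can be sketched on a plain dictionary. This is an assumption-laden illustration and not how HDMF applies the operation internally: the ``{"value": ..., "dtype": ...}`` representation, the ``apply_replace`` helper, and the original ``int64`` dtype are all hypothetical.

```python
def apply_replace(field, operation):
    """Return a new field dict with the value replaced (hypothetical helper, not HDMF code).

    `field` is an assumed in-memory stand-in for a dataset or attribute:
    {"value": ..., "dtype": ...}.
    """
    value = operation["value"]
    # Only scalars and 1-D arrays are allowed, so reject nested lists.
    if isinstance(value, list) and any(isinstance(item, list) for item in value):
        raise ValueError("only scalars and 1-D arrays can be used as replacement values")
    return {
        "value": value,
        # Fall back to the existing dtype when the operation does not specify one.
        "dtype": operation.get("dtype", field["dtype"]),
    }


# Stand-in for dataset "my_data"; the original dtype here is an assumption.
my_data = {"value": [1, 2, 3], "dtype": "int64"}

print(apply_replace(my_data, {"value": [4, 5], "dtype": "int8"}))
# → {'value': [4, 5], 'dtype': 'int8'}
print(apply_replace(my_data, {"value": [4, 5]}))
# → {'value': [4, 5], 'dtype': 'int64'}
```

Returning a new dictionary rather than mutating ``my_data`` mirrors the idea that the sidecar changes what is read, not what is stored in the HDMF file.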

.. note::

   Replacing the values of datasets or attributes with object references or a compound data type is not yet supported.

Deleting a dataset/attribute
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Specify ``"type": "delete"`` to delete (ignore) a dataset/attribute from the associated HDMF file
as specified by the ``object_id`` and ``relative_path``.

The operation specification does not use any additional keys.

In the example sidecar JSON file above, the second operation specifies that attribute "attr6"
at relative path "foo_holder/my_sub_data/attr6" from group "foo1", which has the specified object ID,
should be deleted.
If "attr6" is a required attribute, this is likely to result in an invalid file that cannot be read by HDMF.
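
To make the role of ``relative_path`` concrete, the sketch below walks a nested dictionary that stands in for the contents of group "foo1" and drops the last path component from the view that is returned. The nested-dictionary model, the attribute values, and the ``apply_delete`` helper are hypothetical and are not HDMF's implementation.

```python
import copy


def apply_delete(tree, relative_path):
    """Return a copy of `tree` without the field at `relative_path` (hypothetical helper)."""
    result = copy.deepcopy(tree)  # the original stays untouched, like the HDMF file on disk
    *parents, name = relative_path.split("/")
    node = result
    for part in parents:
        node = node[part]  # descend one level per path component
    del node[name]
    return result


# Assumed stand-in for the contents of group "foo1"; attribute values are made up.
foo1 = {"foo_holder": {"my_sub_data": {"attr5": 5, "attr6": 6}}}

visible = apply_delete(foo1, "foo_holder/my_sub_data/attr6")
print(visible)  # → {'foo_holder': {'my_sub_data': {'attr5': 5}}}
print(foo1)     # → {'foo_holder': {'my_sub_data': {'attr5': 5, 'attr6': 6}}}
```

The second line of output shows that the source structure still contains "attr6": in this sketch, as in the sidecar approach, the deletion affects only what is seen after the operations are applied.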

Future changes
--------------

The HDMF team is considering supporting additional operations and expanding support for current operations
specified in the sidecar JSON file, such as:

- Add rows to a ``DynamicTable`` (column-based)
- Add rows to a ``Table`` (row-based)
- Add a new group
- Add a new dataset
- Add a new attribute
- Add a new link
- Replace a dataset or attribute with object references
- Replace a dataset or attribute with a compound data type
- Replace selected slices of a dataset or attribute
- Delete a group
- Delete a link

Please provide feedback on which operations are useful to you for HDMF to support in this
`issue ticket <https://github.com/hdmf-dev/hdmf/issues/676>`_.
194 changes: 194 additions & 0 deletions sidecar.schema.json
@@ -0,0 +1,194 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "sidecar.schema.json",
"title": "Schema for the sidecar JSON file",
"description": "A schema for validating HDMF sidecar JSON files",
"version": "0.1.0",
"type": "object",
"additionalProperties": false,
"required": [
"description",
"author",
"contact",
"operations",
"schema_version"
],
"properties": {
"description": {
"description": "A free-form string describing the modifications specified in this file.",
"type": "string"
},
"author": {
"description": "A list of free-form strings containing the names of the people who created this file.",
"type": "array",
"items": {"type": "string"}
},
"contact": {
"description": "A list of email addresses for the people who created this file. Each author listed in the 'author' key *should* have a corresponding email address.",
"type": "array",
"items": {
"type": "string",
"pattern": "^.*@.*$"
}
},
"operations": {
"description": "A list of operations to perform on the data in the file.",
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"type",
"description",
"object_id",
"relative_path"
],
"properties": {
"type": {
"description": "The type of modification to perform.",
"type": "string",
"enum": ["replace", "delete"]
},
"description": {
"description": "A description of the specified modification.",
"type": "string"
},
"object_id": {
"description": "The object ID (UUID) of the data type that is closest in the file hierarchy to the field being modified. Must be in the UUID-4 format with hyphens.",
"type": "string",
"pattern": "^[0-9a-f]{8}\\-[0-9a-f]{4}\\-4[0-9a-f]{3}\\-[89ab][0-9a-f]{3}\\-[0-9a-f]{12}$"
},
"relative_path": {
"description": "The relative path from the data type with the given object ID to the field being modified.",
"type": "string"
},
"element_type": {
"anyOf": [
{
"type": "string",
"enum": [
"group",
"dataset",
"attribute"
]
}
]
},
"value": {
"description": "The new value for the dataset/attribute.",
"type": ["array", "string", "number", "boolean", "null"]
},
"dtype": {"$ref": "#/definitions/dtype"}
},
"allOf": [
{
"description": "if type==replace, then value is required.",
"if": {
"properties": { "type": { "const": "replace" } }
},
"then": {
"required": [ "value" ]
}
},
{
"description": "if type==delete, then value and dtype are not allowed.",
"if": {
"properties": { "type": { "const": "delete" } }
},
"then": {
"properties": {
"value": false,
"dtype": false
}
}
},
{
"description": "if type==create, then element_type is required.",
"if": {
"properties": { "type": { "const": "create" } }
},
"then": {
"required": [ "element_type" ]
}
}
]
}
},
"schema_version": {
"description": "The version of the sidecar JSON schema that the file conforms to. Must conform to Semantic Versioning v2.0.",
"type": "string",
"pattern": "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$"
}
},
"definitions": {
"dtype": {
"anyOf": [
{"$ref": "#/definitions/flat_dtype"},
{"$ref": "#/definitions/compound_dtype"}
]
},
"flat_dtype": {
"description": "String describing the data type of the dataset or attribute.",
"anyOf": [
{
"type": "string",
"enum": [
"float",
"float32",
"double",
"float64",
"long",
"int64",
"int",
"int32",
"int16",
"int8",
"uint",
"uint32",
"uint16",
"uint8",
"uint64",
"text",
"utf",
"utf8",
"utf-8",
"ascii",
"bool",
"isodatetime"
]
},
{"$ref": "#/definitions/ref_dtype"}
]
},
"ref_dtype": {
"type": "object",
"required": ["target_type", "reftype"],
"properties": {
"target_type": {
"description": "Describes the data_type of the target that the reference points to",
"type": "string"
},
"reftype": {
"description": "Describes the kind of reference",
"type": "string",
"enum": ["ref", "reference", "object", "region"]
}
}
},
"compound_dtype": {
"type": "array",
"items": {
"type": "object",
"required": ["name", "doc", "dtype"],
"properties": {
"name": {"type": "string"},
"doc": {"type": "string"},
"dtype": {"$ref": "#/definitions/flat_dtype"}
}
}
}
}
}
1 change: 1 addition & 0 deletions src/hdmf/backends/__init__.py
@@ -1 +1,2 @@
from . import hdf5
from .builderupdater import SidecarValidationError