You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Discussions with @maxrjones made me realise that we're thinking about testing wrong. In particular, with #124 in place we will have well-defined expected properties that we can and should test against. Let me explain.
@maxrjones asked whether implementing loading via a ManifestStore/FunctionalStore as described by @ayushnag in #124 would be a problematic mixing of concerns between creating virtual references and also loading data. But actually it's the opposite: this is exactly what we should be doing.
The reason is that if we cannot correctly load data via a ManifestStore, we cannot correctly load it from an IcechunkStore either. Loading from an in-memory ManifestArray via zarr-python reading a ManifestStore and loading from an in-memory/on-disk/S3 IcechunkStore via zarr-python performs exactly the same set of decoding steps. Contrast that to how we currently load variables via a separate xarray backend (e.g. h5netcdf) - that does it's own decoding logic that isn't encoded in the zarr model.
To check that we are able to correctly load variables, the property to test is
i.e., we check our creation of manifests by loading from it and comparing against loading using a pre-existing xarray backend for that filetype. @dcherian told me once it's a good thing to have multiple different ways to load data from the same file format, as it allows cross-checking. This property just encodes that. We could write a similar property for any other virtualizable file format that has an xarray backend, and each test like this should live in virtualizarr/tests/test_readers/test_hdf.py etc.
Similarly we could also load from an in-memory icechunk store instead, which would make the property:
This now looks a lot like our existing round-trip tests, inclusive of #376 (FYI @jsignell).
But there is no fundamental difference between how the data is loaded in these two cases: it's still zarr-python getting and decoding the data according to the ManifestArrays our readers created. So I now think what we should do is use the ManifestStore idea (the first property) as much as possible internally, as it's the most lightweight and doesn't depend on the optional icechunk dependency. Then we also additionally test that the latter property holds for round-tripping via icechunk.
The text was updated successfully, but these errors were encountered:
The other reason to do this is that then it's obvious where to parameterize these tests with hypothesis strategies - i.e. for all netCDF files created by a strategy, the above property should hold. This is #347.
Discussions with @maxrjones made me realise that we're thinking about testing wrong. In particular, with #124 in place we will have well-defined expected properties that we can and should test against. Let me explain.
@maxrjones asked whether implementing loading via a
ManifestStore
/FunctionalStore
as described by @ayushnag in #124 would be a problematic mixing of concerns between creating virtual references and also loading data. But actually it's the opposite: this is exactly what we should be doing.The reason is that if we cannot correctly load data via a
ManifestStore
, we cannot correctly load it from anIcechunkStore
either. Loading from an in-memoryManifestArray
via zarr-python reading aManifestStore
and loading from an in-memory/on-disk/S3IcechunkStore
via zarr-python performs exactly the same set of decoding steps. Contrast that to how we currently load variables via a separate xarray backend (e.g. h5netcdf) - that does it's own decoding logic that isn't encoded in the zarr model.To check that we are able to correctly load variables, the property to test is
i.e., we check our creation of manifests by loading from it and comparing against loading using a pre-existing xarray backend for that filetype. @dcherian told me once it's a good thing to have multiple different ways to load data from the same file format, as it allows cross-checking. This property just encodes that. We could write a similar property for any other virtualizable file format that has an xarray backend, and each test like this should live in
virtualizarr/tests/test_readers/test_hdf.py
etc.Similarly we could also load from an in-memory icechunk store instead, which would make the property:
This now looks a lot like our existing round-trip tests, inclusive of #376 (FYI @jsignell).
But there is no fundamental difference between how the data is loaded in these two cases: it's still zarr-python getting and decoding the data according to the
ManifestArrays
our readers created. So I now think what we should do is use theManifestStore
idea (the first property) as much as possible internally, as it's the most lightweight and doesn't depend on the optional icechunk dependency. Then we also additionally test that the latter property holds for round-tripping via icechunk.The text was updated successfully, but these errors were encountered: