Refactor tests around expected properties #394

Open
TomNicholas opened this issue Jan 28, 2025 · 1 comment
TomNicholas commented Jan 28, 2025

Discussions with @maxrjones made me realise that we're thinking about testing wrong. In particular, with #124 in place we will have well-defined expected properties that we can and should test against. Let me explain.

@maxrjones asked whether implementing loading via a ManifestStore/FunctionalStore as described by @ayushnag in #124 would be a problematic mixing of concerns between creating virtual references and also loading data. But actually it's the opposite: this is exactly what we should be doing.

The reason is that if we cannot correctly load data via a ManifestStore, we cannot correctly load it from an IcechunkStore either. Loading from in-memory ManifestArrays via zarr-python reading a ManifestStore and loading from an in-memory/on-disk/S3 IcechunkStore via zarr-python perform exactly the same set of decoding steps. Contrast that with how we currently load variables via a separate xarray backend (e.g. h5netcdf), which does its own decoding logic that isn't encoded in the zarr model.

To check that we are able to correctly load variables, the property to test is

import xarray as xr
import xarray.testing as xrt
from virtualizarr import open_virtual_dataset

loaded_ds_from_manifests = open_virtual_dataset('file.nc', backend='netcdf', loadable_variables='all')

loaded_ds_from_backend = xr.open_dataset('file.nc', engine='h5netcdf').load()
xrt.assert_identical(loaded_ds_from_manifests, loaded_ds_from_backend)

i.e., we check our creation of manifests by loading from them and comparing against loading via a pre-existing xarray backend for that filetype. @dcherian told me once that it's a good thing to have multiple different ways to load data from the same file format, as it allows cross-checking. This property just encodes that. We could write a similar property for any other virtualizable file format that has an xarray backend, and each such test should live in virtualizarr/tests/test_readers/test_hdf.py etc.
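To make the cross-checking pattern concrete, here is a hedged, in-memory illustration using only xarray and numpy. The two "loaders" here are hypothetical stand-ins, not virtualizarr code; in the real test they would be open_virtual_dataset(..., loadable_variables='all') and xr.open_dataset(..., engine='h5netcdf').

```python
import numpy as np
import xarray as xr
import xarray.testing as xrt

def assert_loaders_agree(load_a, load_b, source):
    """Check that two independent loaders decode the same source identically."""
    xrt.assert_identical(load_a(source), load_b(source))

# Two hypothetical "loaders" that decode the same source by different code paths.
source = {"air": [0.0, 1.0, 2.0]}

def loader_a(src):
    return xr.Dataset({k: ("time", np.asarray(v)) for k, v in src.items()})

def loader_b(src):
    return xr.Dataset({"air": ("time", np.array(src["air"]))})

assert_loaders_agree(loader_a, loader_b, source)
```

The point of the helper is that it is agnostic to where the two datasets came from, which is exactly what lets one property cover both the ManifestStore and the icechunk round-trip cases.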

Similarly we could also load from an in-memory icechunk store instead, which would make the property:

import xarray as xr
import xarray.testing as xrt
from virtualizarr import open_virtual_dataset

vds = open_virtual_dataset('file.nc', backend='netcdf')
vds.virtualize.to_icechunk(icechunkstore)
loaded_ds_from_icechunk = xr.open_zarr(icechunkstore).load()

loaded_ds_from_backend = xr.open_dataset('file.nc', engine='h5netcdf').load()
xrt.assert_identical(loaded_ds_from_icechunk, loaded_ds_from_backend)

This now looks a lot like our existing round-trip tests, inclusive of #376 (FYI @jsignell).

But there is no fundamental difference between how the data is loaded in these two cases: it's still zarr-python getting and decoding the data according to the ManifestArrays our readers created. So I now think we should use the ManifestStore idea (the first property) as much as possible internally, as it's the most lightweight approach and doesn't depend on the optional icechunk dependency. Then we additionally test that the latter property holds for round-tripping via icechunk.

TomNicholas (Member Author) commented:

The other reason to do this is that then it's obvious where to parameterize these tests with hypothesis strategies - i.e. for all netCDF files created by a strategy, the above property should hold. This is #347.
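A hedged sketch of what that parameterization could look like with hypothesis. The to_dict/from_dict round-trip below is only a stand-in for the real "write a netCDF file, open it via a ManifestStore" round-trip, which depends on the API proposed in #124; the strategy and dataset construction are illustrative, not virtualizarr code.

```python
import numpy as np
import xarray as xr
import xarray.testing as xrt
import hypothesis.extra.numpy as npst
from hypothesis import given, settings

@given(arr=npst.arrays(dtype=np.float64, shape=npst.array_shapes(max_dims=2, max_side=3)))
@settings(max_examples=10, deadline=None)
def test_roundtrip_property(arr):
    # Build a dataset from the generated array, round-trip it, and assert the
    # loading property holds for every generated input.
    dims = tuple(f"dim_{i}" for i in range(arr.ndim))
    original = xr.Dataset({"var": (dims, arr)})
    roundtripped = xr.Dataset.from_dict(original.to_dict())
    xrt.assert_identical(roundtripped, original)

# Calling the decorated function runs the property over all generated examples.
test_roundtrip_property()
```

Once helpers exist for writing strategy-generated datasets to actual netCDF files, the body of the property becomes exactly the first snippet above.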
