"source" encoding for datasets opened from fsspec objects (#8923)
* draft for setting `source` from pre-opened `fsspec` file objects

* refactor to only import `fsspec` if we're actually going to check

Could use `getattr(filename_or_obj, "path", filename_or_obj)` to avoid
`isinstance` checks.

* replace with a simple `getattr` on `"path"`

* add a test

* whats-new entry

* open the file as a context manager
keewis authored Jun 30, 2024
1 parent 42ed6d3 commit caed274
Showing 3 changed files with 22 additions and 2 deletions.
2 changes: 2 additions & 0 deletions doc/whats-new.rst
@@ -24,6 +24,8 @@ New Features
 ~~~~~~~~~~~~
 - Allow chunking for arrays with duplicated dimension names (:issue:`8759`, :pull:`9099`).
   By `Martin Raspaud <https://github.com/mraspaud>`_.
+- Extract the source url from fsspec objects (:issue:`9142`, :pull:`8923`).
+  By `Justus Magin <https://github.com/keewis>`_.

 Breaking changes
 ~~~~~~~~~~~~~~~~
7 changes: 5 additions & 2 deletions xarray/backends/api.py
@@ -382,8 +382,11 @@ def _dataset_from_backend_dataset(
     ds.set_close(backend_ds._close)

     # Ensure source filename always stored in dataset object
-    if "source" not in ds.encoding and isinstance(filename_or_obj, (str, os.PathLike)):
-        ds.encoding["source"] = _normalize_path(filename_or_obj)
+    if "source" not in ds.encoding:
+        path = getattr(filename_or_obj, "path", filename_or_obj)
+
+        if isinstance(path, (str, os.PathLike)):
+            ds.encoding["source"] = _normalize_path(path)

     return ds
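
The `getattr` fallback in this hunk can be sketched in isolation as follows. This is a minimal illustration, not xarray's code: `extract_source` and `FakeOpenFile` are hypothetical names, and `os.fspath` stands in for xarray's private `_normalize_path`, whose behavior is not shown in this commit.

```python
import io
import os


def extract_source(filename_or_obj):
    """Return a string source for the input, or None if none can be found.

    Hypothetical helper mirroring the logic in the hunk: prefer a `path`
    attribute when the object has one, otherwise use the object itself.
    """
    path = getattr(filename_or_obj, "path", filename_or_obj)

    # Only strings and os.PathLike objects count as a usable source.
    if isinstance(path, (str, os.PathLike)):
        # Assumption: os.fspath replaces xarray's _normalize_path here.
        return os.fspath(path)
    return None


class FakeOpenFile:
    """Hypothetical stand-in for a pre-opened file object exposing `.path`."""

    def __init__(self, path):
        self.path = path


print(extract_source("data.nc"))
print(extract_source(FakeOpenFile("/tmp/data.nc")))
print(extract_source(io.BytesIO(b"")))
```

A plain string passes through unchanged, an object with a `path` attribute yields that path, and an in-memory buffer with no `path` yields `None`, so no `source` would be recorded for it. Using `getattr` with a default avoids importing `fsspec` just to run an `isinstance` check, as the commit message notes.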

15 changes: 15 additions & 0 deletions xarray/tests/test_backends.py
@@ -5151,6 +5151,21 @@ def test_source_encoding_always_present_with_pathlib() -> None:
             assert ds.encoding["source"] == tmp


+@requires_h5netcdf
+@requires_fsspec
+def test_source_encoding_always_present_with_fsspec() -> None:
+    import fsspec
+
+    rnddata = np.random.randn(10)
+    original = Dataset({"foo": ("x", rnddata)})
+    with create_tmp_file() as tmp:
+        original.to_netcdf(tmp)
+
+        fs = fsspec.filesystem("file")
+        with fs.open(tmp) as f, open_dataset(f) as ds:
+            assert ds.encoding["source"] == tmp
+
+
 def _assert_no_dates_out_of_range_warning(record):
     undesired_message = "dates out of range"
     for warning in record:
