Unnecessarily Large Data Request #74

Open
sgdecker opened this issue Mar 11, 2022 · 3 comments

@sgdecker

I'm not sure if this is a bug report, feature request, or user error. I'm trying to access a giant dataset from the NCAR RDA in a smart way (only downloading what's necessary for the calculation), but a large data request is made anyway that exceeds the server's 500 MB limit.

Here's my code:

import numpy as np
import xarray as xr
from dask.diagnostics import ProgressBar
import intake


wrf_url = ('https://rda.ucar.edu/thredds/catalog/files/g/ds612.0/'
           'PGW3D/2006/catalog.xml')
catalog_u = intake.open_thredds_merged(wrf_url, path=['*_U_2006060*'])
catalog_v = intake.open_thredds_merged(wrf_url, path=['*_V_2006060*'])

ds_u = catalog_u.to_dask()
ds_u['U'] = ds_u.U.chunk("auto")
ds_v = catalog_v.to_dask()
ds_v['V'] = ds_v.V.chunk("auto")
ds = xr.merge((ds_u, ds_v))


def unstagger(ds, var, coord, new_coord):
    var1 = ds[var].isel({coord: slice(None, -1)})
    var2 = ds[var].isel({coord: slice(1, None)})
    return ((var1 + var2) / 2).rename({coord: new_coord})


with ProgressBar():
    ds['U_unstaggered'] = unstagger(ds, 'U', 'west_east_stag', 'west_east')
    ds['V_unstaggered'] = unstagger(ds, 'V', 'south_north_stag', 'south_north')
    ds['speed'] = np.hypot(ds.U_unstaggered, ds.V_unstaggered)
    ds.speed.isel(bottom_top=10).sel(Time='2006-06-07T18:00').plot()

This fails with

Traceback (most recent call last):
  File "/home/decker/classes/met325/rda_plot.py", line 29, in <module>
    ds.speed.isel(bottom_top=10).sel(Time='2006-06-07T18:00').plot()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/plot/plot.py", line 862, in __call__
    return plot(self._da, **kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/plot/plot.py", line 293, in plot
    darray = darray.squeeze().compute()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/dataarray.py", line 951, in compute
    return new.load(**kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/dataarray.py", line 925, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/dataset.py", line 862, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/base.py", line 571, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/threaded.py", line 79, in get
    results = get_async(
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/local.py", line 507, in get_async
    raise_exception(exc, tb)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/local.py", line 315, in reraise
    raise exc
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/local.py", line 220, in execute_task
    result = _execute_task(task, data)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/array/core.py", line 116, in getter
    c = np.asarray(c)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 357, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 521, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 422, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/conventions.py", line 62, in __getitem__
    return np.asarray(self.array[key], dtype=self.dtype)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 422, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/pydap_.py", line 39, in __getitem__
    return indexing.explicit_indexing_adapter(
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 711, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/pydap_.py", line 47, in _getitem
    result = robust_getitem(array, key, catch=ValueError)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/common.py", line 64, in robust_getitem
    return array[key]
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/pydap/model.py", line 323, in __getitem__
    out.data = self._get_data_index(index)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/pydap/model.py", line 353, in _get_data_index
    return self._data[index]
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/pydap/handlers/dap.py", line 170, in __getitem__
    raise_for_status(r)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/pydap/net.py", line 38, in raise_for_status
    raise HTTPError(
webob.exc.HTTPError: 403 403

because the data request is too large.

Folks at NCAR tell me the request comes across as

rda.ucar.edu/thredds/dodsC/files/g/ds612.0/PGW3D/2006/wrf3d_d01_PGW_U_20060607.nc.dods?U%5B0:1:7%5D%5B0:1:49%5D%5B0:1:1014%5D%5B0:1:1359%5D

essentially pulling an entire variable.
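To put a number on that request: each bracket in the URL is an inclusive start:stride:stop range, so the element count can be worked out directly. This is a rough sketch, assuming 4-byte values (the dtype of U is not stated in this thread); `hyperslab_shape` is a hypothetical helper, not part of any library.

```python
import math
import re
from urllib.parse import unquote


def hyperslab_shape(constraint):
    """Return the shape selected by a DAP constraint such as 'U[0:1:7][0:1:49]'.

    Each bracket is read as start:stride:stop with an inclusive stop.
    """
    decoded = unquote(constraint)  # turns %5B / %5D back into [ and ]
    shape = []
    for start, stride, stop in re.findall(r"\[(\d+):(\d+):(\d+)\]", decoded):
        shape.append((int(stop) - int(start)) // int(stride) + 1)
    return tuple(shape)


# Constraint expression from the request NCAR reported.
constraint = "U%5B0:1:7%5D%5B0:1:49%5D%5B0:1:1014%5D%5B0:1:1359%5D"
shape = hyperslab_shape(constraint)

BYTES_PER_VALUE = 4  # assumption: U is stored as 32-bit floats
request_bytes = math.prod(shape) * BYTES_PER_VALUE

print(shape)
print(f"{request_bytes / 1e6:.0f} MB")
```

Under that assumption the request is roughly 2.2 GB, well over the 500 MB limit, whereas one time step at one level (1015 x 1360 values) would be about 5.5 MB.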

Is what I'm trying to do supposed to work?

I can use siphon directly w/o issue:

import numpy as np
import matplotlib.pyplot as plt
from siphon.catalog import TDSCatalog

catUrl = ('https://rda.ucar.edu/thredds/catalog/files/g/ds612.0/'
          'PGW3D/2006/catalog.xml')
catalog = TDSCatalog(catUrl)
U_file = 'wrf3d_d01_PGW_U_20060718.nc'
V_file = 'wrf3d_d01_PGW_V_20060718.nc'
ds = catalog.datasets[U_file]
dataset = ds.remote_access()
u = dataset.variables['U']
ds = catalog.datasets[V_file]
dataset = ds.remote_access()
v = dataset.variables['V']
speed = np.hypot(u[1, 10, 0:1014, 0:1359], v[1, 10, 0:1014, 0:1359])
plt.imshow(speed)
plt.show()

but in that case I don't have all the xarray niceties w/o extra work.
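For comparison, the indexing in the siphon script asks for a single time and level. The sketch below renders a NumPy-style index in the same start:stride:stop notation as the URL NCAR reported. It only illustrates the size difference and is not the exact string siphon or netCDF4 sends; `to_dap_constraint` is a hypothetical helper, and the variable shape is an assumption taken from that URL.

```python
def to_dap_constraint(name, key, shape):
    """Render a tuple of ints/slices as 'name[start:stride:stop]...' (inclusive stop).

    Assumes non-negative integer indices and non-empty slices with a positive step.
    """
    parts = []
    for k, size in zip(key, shape):
        if isinstance(k, slice):
            start, stop, step = k.indices(size)
            # Last index actually selected by the slice.
            last = start + ((stop - start - 1) // step) * step
        else:
            start, step, last = k, 1, k
        parts.append(f"[{start}:{step}:{last}]")
    return name + "".join(parts)


U_SHAPE = (8, 50, 1015, 1360)  # assumption: shape implied by the reported request
key = (1, 10, slice(0, 1014), slice(0, 1359))  # same index as u[1, 10, 0:1014, 0:1359]
constraint = to_dap_constraint("U", key, U_SHAPE)
print(constraint)
```

That selection is 1014 x 1359 values, about 5.5 MB at an assumed 4 bytes each, which would explain why the siphon version stays under the server limit.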

@martindurant
Member

As you can see, xarray is calling pydap, which then does the fetch. I don't suppose there's anything that we can do about how that happens at the intake level. You might want to cross-post on xarray or pydap, though. Perhaps someone might chime in here on how siphon does this differently.

Since the target file is, eventually, just a netCDF, you could pass its URL or an open fsspec file directly to xarray (presumably with engine "h5netcdf"), or kerchunk it.

@dopplershift
Collaborator

Since it's DAP, you can also try to make sure xarray is using the netcdf4 engine, since netCDF-C is usually compiled with DAP support.
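A minimal sketch of that suggestion, bypassing intake and opening one OPeNDAP endpoint directly. The URL pattern is taken from the request NCAR reported, with an `https://` scheme assumed; the helper names are hypothetical, the dimension names come from the script in the original post, and whether the RDA server accepts the resulting requests has not been verified here.

```python
# Assumption: same path as in the reported request, served over https.
BASE = "https://rda.ucar.edu/thredds/dodsC/files/g/ds612.0/PGW3D/2006/"


def opendap_url(var, yyyymmdd):
    """Build the OPeNDAP URL for one variable and day, following the reported pattern."""
    return f"{BASE}wrf3d_d01_PGW_{var}_{yyyymmdd}.nc"


def open_level(var, yyyymmdd, time_index, level):
    """Lazily open one file with the netcdf4 engine and select one time and level."""
    import xarray as xr  # imported here so opendap_url works without xarray installed

    ds = xr.open_dataset(opendap_url(var, yyyymmdd), engine="netcdf4")
    # Index before doing any arithmetic, so only this 2-D slice should need fetching.
    return ds[var].isel(Time=time_index, bottom_top=level)


print(opendap_url("U", "20060607"))
```

Something like `open_level("U", "20060607", time_index=1, level=10)` would then return a lazily indexed DataArray, and the data should only be requested once it is plotted or otherwise loaded.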

@sgdecker
Author

I tried the following variations:

Variation 1

catalog_u = intake.open_thredds_merged(wrf_url, driver='netcdf',
                                       path=['*_U_2006060*'])
catalog_v = intake.open_thredds_merged(wrf_url, driver='netcdf',
                                       path=['*_V_2006060*'])

Output

Traceback (most recent call last):
  File "/home/decker/classes/met325/rda_plot.py", line 14, in <module>
    ds_u = catalog_u.to_dask()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 69, in to_dask
    return self.read_chunked()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 44, in read_chunked
    self._load_metadata()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake/source/base.py", line 236, in _load_metadata
    self._schema = self._get_schema()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 18, in _get_schema
    self._open_dataset()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_thredds/source.py", line 82, in _open_dataset
    cat = ThreddsCatalog(self.urlpath, driver=self.driver)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_thredds/cat.py", line 30, in __init__
    super().__init__(**kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake/catalog/base.py", line 110, in __init__
    self.force_reload()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake/catalog/base.py", line 168, in force_reload
    self._load()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_thredds/cat.py", line 81, in _load
    {
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_thredds/cat.py", line 87, in <dictcomp>
    {'urlpath': access_urls(ds, self), 'chunks': {}},
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_thredds/cat.py", line 75, in access_urls
    url = ds.access_urls[driver_for_access_urls]
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/siphon/catalog.py", line 221, in __getitem__
    return super(CaseInsensitiveDict, self).__getitem__(CaseInsensitiveStr(key))
KeyError: 'HTTPServer'

Variation 2

catalog_u = intake.open_thredds_merged(wrf_url,
                                       xarray_kwargs={'engine': 'netcdf4'},
                                       path=['*_U_2006060*'])
catalog_v = intake.open_thredds_merged(wrf_url,
                                       xarray_kwargs={'engine': 'netcdf4'},
                                       path=['*_V_2006060*'])

Output

Dataset(s):   0%|                                        | 0/9 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/decker/classes/met325/rda_plot.py", line 16, in <module>
    ds_u = catalog_u.to_dask()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 69, in to_dask
    return self.read_chunked()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 44, in read_chunked
    self._load_metadata()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake/source/base.py", line 236, in _load_metadata
    self._schema = self._get_schema()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 18, in _get_schema
    self._open_dataset()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_thredds/source.py", line 90, in _open_dataset
    data = [
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_thredds/source.py", line 91, in <listcomp>
    ds(xarray_kwargs=self.xarray_kwargs).to_dask()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 69, in to_dask
    return self.read_chunked()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 44, in read_chunked
    self._load_metadata()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake/source/base.py", line 236, in _load_metadata
    self._schema = self._get_schema()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 18, in _get_schema
    self._open_dataset()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/opendap.py", line 98, in _open_dataset
    self._ds = xr.open_dataset(store, chunks=self.chunks, **self._kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/api.py", line 495, in open_dataset
    backend_ds = backend.open_dataset(
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 550, in open_dataset
    store = NetCDF4DataStore.open(
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 352, in open
    raise ValueError(
ValueError: can only read bytes or file-like objects with engine='scipy' or 'h5netcdf'
Dataset(s):   0%|                                        | 0/9 [00:01<?, ?it/s]

Variation 3

catalog_u = intake.open_thredds_merged(wrf_url,
                                       xarray_kwargs={'engine': 'h5netcdf'},
                                       path=['*_U_2006060*'])
catalog_v = intake.open_thredds_merged(wrf_url,
                                       xarray_kwargs={'engine': 'h5netcdf'},
                                       path=['*_V_2006060*'])

Output

Dataset(s):   0%|                                        | 0/9 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 199, in _acquire_with_cache_info
    file = self._cache[self._key]
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/lru_cache.py", line 53, in __getitem__
    value = self._cache[key]
KeyError: [<class 'h5netcdf.core.File'>, (<xarray.backends.pydap_.PydapDataStore object at 0x7f374deb7160>,), 'r', (('decode_vlen_strings', True), ('invalid_netcdf', None))]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/decker/classes/met325/rda_plot.py", line 16, in <module>
    ds_u = catalog_u.to_dask()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 69, in to_dask
    return self.read_chunked()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 44, in read_chunked
    self._load_metadata()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake/source/base.py", line 236, in _load_metadata
    self._schema = self._get_schema()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 18, in _get_schema
    self._open_dataset()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_thredds/source.py", line 90, in _open_dataset
    data = [
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_thredds/source.py", line 91, in <listcomp>
    ds(xarray_kwargs=self.xarray_kwargs).to_dask()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 69, in to_dask
    return self.read_chunked()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 44, in read_chunked
    self._load_metadata()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake/source/base.py", line 236, in _load_metadata
    self._schema = self._get_schema()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/base.py", line 18, in _get_schema
    self._open_dataset()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/intake_xarray/opendap.py", line 98, in _open_dataset
    self._ds = xr.open_dataset(store, chunks=self.chunks, **self._kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/api.py", line 495, in open_dataset
    backend_ds = backend.open_dataset(
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 374, in open_dataset
    store = H5NetCDFStore.open(
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 178, in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 123, in __init__
    self._filename = find_root_and_group(self.ds)[0].filename
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 189, in ds
    return self._acquire()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 181, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 187, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 205, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/h5netcdf/core.py", line 951, in __init__
    self._h5file = h5py.File(
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/h5py/_hl/files.py", line 486, in __init__
    name = filename_encode(name)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/h5py/_hl/compat.py", line 19, in filename_encode
    filename = fspath(filename)
TypeError: expected str, bytes or os.PathLike object, not PydapDataStore
Dataset(s):   0%|                                        | 0/9 [00:01<?, ?it/s]

I'll have to look into some of the other methods that were suggested, but this is not turning out to be as straightforward as I had hoped.
