You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If NWP provider returns xarray.Dataset the following error occurs:
AttributeError: 'str' object has no attribute 'assign_attrs'
This exception is thrown by __iter__ of MapperIterDataPipe(datapipe=DictDatasetIterDataPipe, fn=combine_to_single_dataset, input_col=None, output_col=None)
This is caused by this bit in ocf_datapipes/utils/utils.py
To Reproduce
Steps to reproduce the behavior:
Generally will be thrown by any xr.Dataset NWP, so either loading ICON or changing another provider to return xr.Dataset should reproduce. Steps I used to get this error from ECMWF:
Go to ocf_datapipes/load/nwp/providers/ecmwf.py
Comment out lines 25-26
Substitute with ifs = nwp
run PVNet/save_batches.py
Expected behaviour
Should be returning xr.DataArray when iterating over it, but returns str because iteration over xr.Dataset iterates over variable names instead of their contents.
There might be a different way around this, especially if some data sources cannot be manipulated to be output as a xr.DataArray, but there may be merit to committing to a more standardised loader output.
The text was updated successfully, but these errors were encountered:
On reflection, the thing that's crashing generally seems a bit odd; my best guess it was supposed to support multiple pv sources? (In this case, should we hide it under an if statement by key or something? Because it's doing a lot of unnecessary stuff when it is used for anything else as far as I can tell)
Overall by the time you get to that, iteration for key, datasets in dataset_dict.items(): will iterate through sources as in key="wind", datasets=wind_data and for nwps it looks like key="nwp-{source}", datasets={source_data}. At this point source_data is normally a xr.DataArray with variables stored as a channel dimension, but technically there is nothing preventing it from being a xr.Dataset with variables still being weather variables. However, the next loop for dataset in datasets: would either loop over variable names in a xr.Dataset, which is the source of the issue above, or just return the dataArray itself for xr.DataArray -- anyway, in short, unless I'm reading this very wrong, this entire bit does nothing but assign attributes to the data in most cases (all nwps, wind, pvsite data), and looks kind of convoluted for what it is.
The next for loop loops over all the same things, and also does nothing to the actual structure other than prepend source to all names (useful, but again, very convoluted for what it is)
I guess my question is, what is the use case for this if it exists anymore at all? Because someone clearly went to a lot of trouble to do it this way and make absolutely sure that it combines xr.Datasets, but it just seems to do a lot of unnecessary operations most of the time now.
Basically I think it operates under the assumption that every source supplies a dictionary of datasets/dataArrays, which is currently false most of the time (probably only true for gsp data? I might be wrong here as have experience only with pv_site and windnet)
Describe the bug
If NWP provider returns
xarray.Dataset
the following error occurs:This is caused by this bit in
ocf_datapipes/utils/utils.py
To Reproduce
Steps to reproduce the behavior:
Generally will be thrown by any
xr.Dataset
NWP, so either loading ICON or changing another provider to returnxr.Dataset
should reproduce. Steps I used to get this error from ECMWF:ifs = nwp
Expected behaviour
Should be returning
xr.DataArray
when iterating over it, but returnsstr
because iteration overxr.Dataset
iterates over variable names instead of their contents.There might be a different way around this, especially if some data sources cannot be manipulated to be output as a
xr.DataArray
, but there may be merit to committing to a more standardised loader output.The text was updated successfully, but these errors were encountered: