Add input spatial coarsening #92

Sukh-P · 2024-12-17T17:46:13Z

In ocf_datapipes we used a coarsening function as part of the sites datapipes (pv & wind). It was useful when we wanted to reduce the amount of data from certain inputs, e.g. a high spatial resolution NWP. It would be good to move that function across here and add it to the Site Dataset.
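For context, a minimal sketch of what a coarsening step of this kind might look like, assuming a 2D field on a regular grid and coarsening by block mean. The function name `coarsen_mean` and the `factor` parameter are hypothetical and this is not the ocf_datapipes implementation; on an xarray object the equivalent operation would be `DataArray.coarsen(...).mean()`.

```python
import numpy as np


def coarsen_mean(field: np.ndarray, factor: int) -> np.ndarray:
    """Coarsen a 2D (y, x) field by averaging non-overlapping blocks.

    Hypothetical helper for illustration only. Trailing rows/columns that
    do not fill a complete factor x factor block are trimmed.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    ny, nx = field.shape
    ny_trim, nx_trim = ny - ny % factor, nx - nx % factor
    # Split each spatial axis into (number of blocks, block size).
    blocks = field[:ny_trim, :nx_trim].reshape(
        ny_trim // factor, factor, nx_trim // factor, factor
    )
    # Average over the two block-size axes.
    return blocks.mean(axis=(1, 3))


fine = np.arange(24, dtype=float).reshape(4, 6)
coarse = coarsen_mean(fine, factor=2)
print(coarse.shape)  # (2, 3)
```

Each output pixel is the mean of a 2x2 block of the input, so the data volume drops by a factor of four here.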


alirashidAR · 2024-12-19T23:24:19Z

@Sukh-P, I would like to work on this. It will help me gain a better understanding of how Xarray works.

dfulu · 2025-01-02T13:13:05Z

@Sukh-P I think we should use this as an opportunity to do some due diligence as part of porting this function across. Although I could be convinced that this should be put in a separate issue so it doesn't block us.

Some of the NWP data we get in production is at a different spatial resolution to what we have for training on. The function we aim to port across in this issue assumes that the difference between the training and production data we get from the NWP providers can be mapped by a spatial mean. As far as I know, we have never checked this.

I think it is quite possible that the values we get are point samples rather than spatial means - especially for NWPs which don't use finite elements (i.e. pixels) for the computation but which solve a spectral form of the equations (i.e. decompose into Legendre polynomials first). ECMWF uses the spectral form.

I'm not sure how much of a difference this could make to the NWP values, but I think it is worth investigating. Taking a spatial mean reduces the extreme values in NWP outputs, especially when the extreme values occur in single pixels. For example, this could be precipitation, which tends to be log-distributed, or wind speeds over complex topography.
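To make the point concrete, here is a small sketch on synthetic data (not real NWP output) comparing a block mean with a point sample when an extreme value sits in a single pixel. The field values and the choice of which pixel to sample are assumptions for illustration.

```python
import numpy as np

# Synthetic 4x4 field with one extreme single-pixel value.
fine = np.zeros((4, 4))
fine[1, 1] = 16.0

factor = 2

# Option A: spatial mean over each non-overlapping 2x2 block.
block_mean = fine.reshape(2, factor, 2, factor).mean(axis=(1, 3))

# Option B: point sample, taking one pixel per block (here the
# bottom-right pixel of each block, which happens to hit the extreme).
point_sample = fine[1::factor, 1::factor]

print(fine.max())          # 16.0
print(block_mean.max())    # 4.0
print(point_sample.max())  # 16.0
```

The block mean dilutes the peak by the block area, whereas a point sample either keeps the peak at full magnitude or misses it entirely, so the two conventions can give quite different tails.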

Like I said, I'm not sure exactly what we are being given by the NWP providers, but I think we should do the work to check the compatibility which has not been done before. I suggest that we should check the distribution of values between the training and production data, particularly for variables that aren't normally distributed or highly spatially correlated - like precipitation and wind speed - and particularly focusing on the tails. We might find that this coarsening is fine, but I think it would be best practice to actually look into it rather than to assume it works.
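One way such a check might be sketched is to compare upper quantiles of the two datasets. The example below uses synthetic lognormal data as a stand-in for a skewed variable; the helper `tail_quantiles`, the quantile levels, and the stand-in arrays are all assumptions, and real training and production NWP data would replace them.

```python
import numpy as np


def tail_quantiles(values, qs=(0.5, 0.9, 0.99, 0.999)):
    """Return the requested quantiles of a flattened array (hypothetical helper)."""
    return np.quantile(np.asarray(values).ravel(), qs)


rng = np.random.default_rng(0)

# Stand-in for a skewed variable such as precipitation on a fine grid.
fine = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 200))

# Stand-in for coarsened training data: 4x4 block means of the fine field.
coarsened = fine.reshape(50, 4, 50, 4).mean(axis=(1, 3))

# Stand-in for coarse production data, if the provider delivered
# point samples rather than spatial means.
point_sampled = fine[::4, ::4]

for name, data in [("coarsened", coarsened), ("point sampled", point_sampled)]:
    print(name, np.round(tail_quantiles(data), 2))
```

If the upper quantiles of the coarsened training data and the production data diverge noticeably, that would suggest the spatial-mean assumption does not hold for that variable.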

I realise this might block us, so I'd be happy if this checking were moved to a separate issue, but I think it should not be forgotten.

Sukh-P · 2025-01-06T18:15:37Z

That's a really interesting & great point @dfulu. I definitely agree it is worth doing this work and analysis to see if this assumption of coarsening by taking the spatial mean holds for those variables.

I think I am leaning towards a separate issue, since this would block us on using datasampler for sites for now. It gets even more complicated: for example, in the india-forecast-app there is now some regridding we do from the spatial resolution of the live data to the training resolution. So a proper look at how the NWP data is impacted by this coarsening definitely should be done soon. Let me know if you are happy with this being a separate issue.

@alirashidAR sorry for the delay, depending on what we decide I can assign this to you if we plan to do this soon! Thanks

dfulu · 2025-01-06T18:21:55Z

Yeh I agree that it should be a separate issue. I'm happy to port this

Sukh-P · 2025-01-06T18:29:48Z

Also, is this relevant to figuring out what ECMWF is giving us in terms of NWP data? https://confluence.ecmwf.int/display/FUG/Section+3.2+Grid+Point+Values+and+Observations It seems to suggest the values are spatial means, but I am not sure if this includes all variables and the forecasts we consume specifically.

alirashidAR · 2025-01-09T11:57:38Z

Hi @Sukh-P ,
Just to cross-check: the function should be implemented here https://github.com/openclimatefix/ocf-data-sampler/blob/ad6d73826365807f29f72185b894f44132333246/ocf_data_sampler/load/site.py, right?

Sukh-P · 2025-01-09T12:35:59Z

Hey @alirashidAR, thanks for picking this up. Since this is a utility/common function, maybe best to stick it here: https://github.com/openclimatefix/ocf-data-sampler/blob/main/ocf_data_sampler/load/utils.py

Sukh-P added enhancement New feature or request and removed enhancement New feature or request labels Dec 17, 2024

Sukh-P assigned alirashidAR Jan 6, 2025

This was referenced Jan 10, 2025

feat:added corasen function #113

Closed

feat:added corasen function #114

Closed

feat: added spatial coarsening for input #115

Open
