Add input spatial coarsening #92

Open
Sukh-P opened this issue Dec 17, 2024 · 7 comments

@Sukh-P
Member

Sukh-P commented Dec 17, 2024

In ocf_datapipes we used a coarsening function as part of the sites datapipes (PV & wind). It was useful when we wanted to reduce the amount of data from certain inputs, e.g. a high spatial resolution NWP. It would be good to move that function across here and add it to the Site Dataset.
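For readers unfamiliar with the idea, here is a minimal sketch of spatial coarsening by block mean using xarray's built-in `coarsen`. It is not the ocf_datapipes function itself: the helper name `coarsen_spatial`, the `factor` parameter, and the `latitude`/`longitude` dimension names are assumptions made for illustration.

```python
import numpy as np
import xarray as xr


def coarsen_spatial(data, factor=2, x_dim="longitude", y_dim="latitude"):
    """Block-average over both spatial dimensions (hypothetical helper)."""
    # boundary="trim" drops edge cells that do not fill a complete block
    return data.coarsen({x_dim: factor, y_dim: factor}, boundary="trim").mean()


# Tiny 4x4 grid standing in for one NWP variable at one time step
da = xr.DataArray(
    np.arange(16, dtype=float).reshape(4, 4),
    dims=("latitude", "longitude"),
    coords={
        "latitude": [0.0, 0.1, 0.2, 0.3],
        "longitude": [10.0, 10.1, 10.2, 10.3],
    },
)

coarse = coarsen_spatial(da, factor=2)
print(coarse.shape)   # 2x2 grid after averaging each 2x2 block
print(coarse.values)
```

Each output cell is the mean of a 2x2 block of input cells, and by default xarray also averages the coordinate values within each block.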

@Sukh-P Sukh-P added enhancement New feature or request and removed enhancement New feature or request labels Dec 17, 2024
@alirashidAR
Contributor

@Sukh-P, I would like to work on this. It will help me gain a better understanding of how Xarray works.

@dfulu
Member

dfulu commented Jan 2, 2025

@Sukh-P I think we should use this as an opportunity to do some due diligence as part of porting this function across, although I could be convinced that this should go in a separate issue so it doesn't block us.

Some of the NWP data we get in production is at a different spatial resolution to what we have for training on. The function we aim to port across in this issue assumes that the difference between the training and production data we get from the NWP providers can be mapped by a spatial mean. As far as I know, we have never checked this.

I think it is quite possible that the values we get are point samples rather than spatial means, especially for NWPs which don't use finite elements (i.e. pixels) for the computation but which solve a spectral form of the equations (i.e. decompose into Legendre polynomials first). ECMWF use the spectral form.

I'm not sure how much of a difference this could make to the NWP values, but I think it is worth investigating. Taking a spatial mean reduces the extreme values in NWP outputs, especially when the extreme values occur in single pixels. For example, this could be precipitation, which tends to be log-distributed, or wind speeds over complex topography.
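To make the damping effect concrete, here is a rough sketch comparing a block mean against point sampling on a synthetic field. The log-normal noise is only a stand-in for a heavy-tailed variable such as precipitation. It is spatially uncorrelated, so the effect shown is likely stronger than it would be for real NWP fields. The grid size and coarsening factor are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic heavy-tailed field on a 64x64 fine grid (placeholder data)
fine = rng.lognormal(mean=0.0, sigma=1.0, size=(64, 64))
factor = 4
n = 64 // factor

# Spatial mean: average each factor x factor block
block_mean = fine.reshape(n, factor, n, factor).mean(axis=(1, 3))

# Point sample: keep a single grid point from each block
point_sample = fine[::factor, ::factor]

for name, arr in [("fine", fine), ("block mean", block_mean), ("point sample", point_sample)]:
    print(f"{name:>12}: max={arr.max():.2f}  std={arr.std():.2f}")
```

The block mean preserves the overall average but compresses the spread and the maximum, while the point sample keeps a spread similar to the fine grid. That difference is what would show up if training data were spatial means and production data were point values, or vice versa.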

Like I said, I'm not sure exactly what we are being given by the NWP providers, but I think we should do the work to check the compatibility which has not been done before. I suggest that we should check the distribution of values between the training and production data, particularly for variables that aren't normally distributed or highly spatially correlated - like precipitation and wind speed - and particularly focusing on the tails. We might find that this coarsening is fine, but I think it would be best practice to actually look into it rather than to assume it works.
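One possible way to run that check is to compare upper quantiles of the same variable in the coarsened training data and in the production data. The sketch below uses only NumPy. The function names `tail_quantiles` and `compare_tails`, the chosen quantile levels, and the synthetic input arrays are all assumptions for illustration, not an agreed method.

```python
import numpy as np


def tail_quantiles(values, qs=(0.5, 0.9, 0.99, 0.999)):
    """Return {quantile level: value} for the flattened, NaN-free data."""
    flat = np.asarray(values, dtype=float).ravel()
    flat = flat[~np.isnan(flat)]
    return dict(zip(qs, np.quantile(flat, qs)))


def compare_tails(train, prod, qs=(0.5, 0.9, 0.99, 0.999)):
    """Pair up training and production quantiles at each level."""
    t = tail_quantiles(train, qs)
    p = tail_quantiles(prod, qs)
    return {q: (t[q], p[q]) for q in qs}


# Placeholder arrays; in practice these would be one NWP variable
# (e.g. precipitation) taken from the training and production datasets
train = np.arange(101, dtype=float)
prod = 2.0 * train

for q, (t_val, p_val) in compare_tails(train, prod).items():
    print(f"q={q}: train={t_val:.2f}  prod={p_val:.2f}")
```

Large gaps at the highest quantile levels, with close agreement near the median, would suggest the two sources differ mainly in how they represent extremes.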

I realise this might block us, so I'd be happy if this checking were moved to a separate issue, but I think it should not be forgotten.

@Sukh-P
Member Author

Sukh-P commented Jan 6, 2025

That's a really interesting & great point @dfulu. I definitely agree it is worth doing this work and analysis to see if this assumption of coarsening by taking the spatial mean holds for those variables.

I think I am leaning towards a separate issue, since this would block us on using datasampler for sites for now. It gets even more complicated: for example, in the india-forecast-app there is now some regridding we do from the spatial resolution of the live data to the training resolution. So a proper look at how the NWP data is impacted by this coarsening definitely should be done soon. Let me know if you are happy with this being a separate issue.

@alirashidAR sorry for the delay, depending on what we decide I can assign this to you if we plan to do this soon! Thanks

@dfulu
Member

dfulu commented Jan 6, 2025

Yeh I agree that it should be a separate issue. I'm happy to port this

@Sukh-P
Member Author

Sukh-P commented Jan 6, 2025

Also, is this relevant to figuring out what ECMWF is giving us in terms of NWP data? https://confluence.ecmwf.int/display/FUG/Section+3.2+Grid+Point+Values+and+Observations It seems to suggest the values are spatial means, but I am not sure if this includes all variables and the forecasts we consume specifically.

@alirashidAR
Contributor

alirashidAR commented Jan 9, 2025

Hi @Sukh-P ,
Just to cross-check: the function should be implemented here https://github.com/openclimatefix/ocf-data-sampler/blob/ad6d73826365807f29f72185b894f44132333246/ocf_data_sampler/load/site.py right?

@Sukh-P
Member Author

Sukh-P commented Jan 9, 2025

Hey @alirashidAR, thanks for picking this up. Since this is a utility/common function, maybe best to stick it here: https://github.com/openclimatefix/ocf-data-sampler/blob/main/ocf_data_sampler/load/utils.py
