Add input spatial coarsening #92
Comments
@Sukh-P, I would like to work on this. It will help me gain a better understanding of how Xarray works.
@Sukh-P I think we should use this as an opportunity to do some due diligence as part of porting this function across, although I could be convinced that this should go in a separate issue so it doesn't block us.

Some of the NWP data we get in production is at a different spatial resolution from the data we train on. The function we aim to port in this issue assumes that the difference between the training and production data we get from the NWP providers can be mapped by a spatial mean. As far as I know, we have never checked this. I think it is quite possible that the values we get are point samples rather than spatial means, especially for NWPs which don't use finite elements (i.e. pixels) for the computation but which solve a spectral form of the equations (i.e. decompose into Legendre polynomials first). ECMWF use the spectral form.

I'm not sure how much of a difference this could make to the NWP values, but I think it is worth investigating. Taking a spatial mean reduces the extreme values in NWP outputs, especially when the extreme values occur in single pixels. Examples are precipitation, which tends to be log-distributed, and wind speeds over complex topography.

Like I said, I'm not sure exactly what we are being given by the NWP providers, but I think we should do the compatibility check that has not been done before. I suggest we compare the distribution of values between the training and production data, particularly for variables that aren't normally distributed or highly spatially correlated, like precipitation and wind speed, and with a particular focus on the tails. We might find that this coarsening is fine, but I think it would be best practice to actually look into it rather than assume it works. I realise this might block us, so I'd be happy for this check to be moved to a separate issue, but I think it should not be forgotten.
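To make the concern above concrete, here is a minimal sketch on synthetic data, not on any real NWP output. It compares a block (spatial) mean with point sampling on a skewed field and looks at the upper tail. The lognormal field, the grid size and the coarsening `factor` are all assumptions chosen for illustration. The field is also spatially uncorrelated, so the effect is likely exaggerated compared with real precipitation or wind fields.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, spatially uncorrelated, skewed field standing in for a variable
# such as precipitation on a fine grid (an assumption, not real NWP data).
n = 200
fine = rng.lognormal(mean=0.0, sigma=1.0, size=(n, n))

factor = 4  # hypothetical ratio between the fine and coarse grid spacing

# Option A: spatial mean over non-overlapping factor x factor blocks.
block_mean = fine.reshape(n // factor, factor, n // factor, factor).mean(axis=(1, 3))

# Option B: point sample, keeping every factor-th grid point.
point_sample = fine[::factor, ::factor]

# Compare the median and the upper tail of the two coarse fields.
for name, values in [("block mean", block_mean), ("point sample", point_sample)]:
    q50, q99 = np.quantile(values, [0.5, 0.99])
    print(f"{name:>12}: median={q50:.2f}  99th pct={q99:.2f}  max={values.max():.2f}")
```

Both coarse fields have the same shape, but the block mean pulls in the upper tail while the point sample keeps the tail of the fine field. The same quantile comparison could be run on the training and production data for each variable.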
That's a really interesting and great point, @dfulu. I definitely agree it is worth doing this analysis to see whether the assumption of coarsening by taking the spatial mean holds for those variables. I am leaning towards a separate issue, since this would otherwise block us from using datasampler for sites for now. It begins to get even more complicated for example in the
@alirashidAR sorry for the delay. Depending on what we decide, I can assign this to you if we plan to do this soon! Thanks
Yeah, I agree that it should be a separate issue. I'm happy to port this.
Also, is this page relevant to figuring out what ECMWF is giving us in terms of NWP data? https://confluence.ecmwf.int/display/FUG/Section+3.2+Grid+Point+Values+and+Observations It seems to suggest the values are spatial means, but I am not sure whether this includes all variables, and specifically the forecasts we consume.
Hi @Sukh-P,
Hey @alirashidAR, thanks for picking this up. Since this is a utility/common function, it is probably best to put it here: https://github.com/openclimatefix/ocf-data-sampler/blob/main/ocf_data_sampler/load/utils.py
In `ocf_datapipes` we used a coarsening function as part of the sites datapipes (PV and wind). It was useful when we wanted to reduce the amount of data from certain inputs, e.g. a high spatial resolution NWP. It would be good to move that function across here and add it to the Site Dataset.
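For orientation, a spatial-mean coarsening step of the kind described could be sketched with xarray's built-in `coarsen`. This is not the `ocf_datapipes` implementation: the function name `coarsen_spatial`, its parameters, and the `x`/`y` dimension names are hypothetical and chosen only for this example.

```python
import numpy as np
import xarray as xr


def coarsen_spatial(da, x_dim="x", y_dim="y", factor=2):
    """Coarsen a DataArray by taking the mean over factor x factor spatial blocks.

    Hypothetical helper for illustration only. boundary="trim" drops any
    trailing grid points that do not fill a complete block.
    """
    return da.coarsen({x_dim: factor, y_dim: factor}, boundary="trim").mean()


# Small toy grid: 4 rows (y) by 6 columns (x).
da = xr.DataArray(
    np.arange(24, dtype=float).reshape(4, 6),
    dims=("y", "x"),
    coords={"y": np.arange(4), "x": np.arange(6)},
)

coarse = coarsen_spatial(da, factor=2)
print(coarse.sizes)   # 2 rows by 3 columns after coarsening
print(coarse.values)
```

By default `coarsen` also averages the coordinate values, so each coarse pixel is labelled with the centre of the block it came from. Whether a mean is the right reduction for every variable is the open question raised in the comments.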