chore: Add test for checking physical limits and zeroes in NWP data #… #340

glitch401 · 2024-07-03T20:28:29Z

Pull Request

Description

This pull request addresses issues identified in #335 and #337 by implementing checks for zeros and physical limits in NWP data processing. The changes ensure that the OpenNWP class correctly raises a ValueError when encountering NWP data arrays containing zeros (addressing #335) and when NWP data values are outside specified physical limits (addressing #337). These enhancements are crucial for maintaining data integrity and reliability in our processing pipeline.

Fixes #337, #335

How Has This Been Tested?

The modifications have been validated through comprehensive unit tests. Specifically, tests were added to verify that a ValueError is raised both when zeros are present in the data array and when data values fall outside of physical limits. These tests were conducted using sample Zarr datasets designed to mimic real-world scenarios where such issues might arise.

Yes

A sanity check was performed by visually inspecting the processed data to ensure that the new checks effectively identify and handle data with zeros and data outside physical limits.

Yes

Checklist:

My code follows OCF's coding style guidelines
I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
I have checked my code and corrected any misspellings

glitch401 · 2024-07-04T14:22:14Z

@peterdudfield are there any other suggestions for this PR?

ocf_datapipes/load/nwp/nwp.py

peterdudfield · 2024-07-04T15:49:55Z

> @peterdudfield are there any other suggestions for this PR?

Thanks so much! I've left a few comments, and after those I think it should be ready.

ocf_datapipes/load/nwp/nwp.py

glitch401 · 2024-07-11T17:57:08Z

Hey @peterdudfield, apologies for the delay in resolving this.
I've made some changes to solve #336.

.gitignore

peterdudfield · 2024-07-12T11:31:02Z

@AUdaltsova (and maybe @Sukh-P) do you mind looking at this?
I think this code is ready to merge, but I know you two are using the code, so I don't want to break anything while you're running stuff.

AUdaltsova · 2024-07-12T12:59:06Z

ocf_datapipes/load/nwp/nwp.py

+    def check_if_zeros(self, nwp: Union[xr.DataArray, xr.Dataset]):
+        """Checks if the NWP data contains zeros"""
+        if isinstance(nwp, xr.DataArray):
+            if (nwp.values == 0).any():


This looks like it will not be performed lazily (that is, `.values` will load the whole DataArray into memory to check the values), which we really want to avoid here because the arrays we operate on at this point are often massive.

I've made some changes to accommodate lazy loading. I've leveraged Dask arrays, since they're often used with xarray for this purpose.

That's a good way to go about it! There is a danger that some of our data might not fit anyway, but since it can be turned on and off that's fine.
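As a minimal sketch of the lazy approach discussed in this thread (not the code from the PR): comparing the xarray object itself, rather than `nwp.values`, keeps the reduction inside Dask when the data is Dask-backed, so only a single boolean is materialised. The function name `contains_zeros` is hypothetical.

```python
from typing import Union

import numpy as np
import xarray as xr


def contains_zeros(nwp: Union[xr.DataArray, xr.Dataset]) -> bool:
    """Return True if any element equals zero, without calling `.values` on the full array."""
    # For Dask-backed data this only builds a task graph; nothing is loaded yet.
    flagged = (nwp == 0).any()
    if isinstance(flagged, xr.Dataset):
        # A Dataset yields one boolean per variable; reduce them to a single boolean.
        flagged = flagged.to_array().any()
    # compute() evaluates the graph over the chunks and returns a 0-d result.
    return bool(flagged.compute())


# Small in-memory example. With real data the array would typically be opened
# lazily (for instance from a Zarr store) so that it is Dask-backed.
clean = xr.DataArray(np.ones((2, 3)), dims=("y", "x"))
dirty = clean.copy()
dirty[0, 1] = 0.0
print(contains_zeros(clean), contains_zeros(dirty))
```

The same function works on NumPy-backed objects, which keeps unit tests simple, but the memory benefit only appears when the input is chunked.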

I was wondering if it's worth exploring implementing this check downstream, somewhere after spatial and temporal crop and before normalisation, so that it operates on samples instead? And then maybe skip ones with too many zeroes/NaNs/out-of-physical-bounds values and give a UserWarning/log info of how many were skipped as a proxy for understanding how much of the data is corrupted. Thoughts @Sukh-P @peterdudfield?

I like that idea: there's less chance of running into memory issues by loading a chunk. I guess the only drawback would be doing processing on some data you are going to chuck anyway, but in this case that processing gets it down to a more manageable size.
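The downstream idea could look roughly like the sketch below: validate each cropped sample, drop those with too many bad values, and report how many were skipped. The names `bad_fraction`, `filter_samples` and `max_bad_fraction`, as well as the limits used, are hypothetical and not part of ocf_datapipes.

```python
import warnings
from typing import Iterable, Iterator, Tuple

import numpy as np


def bad_fraction(sample: np.ndarray, limits: Tuple[float, float]) -> float:
    """Fraction of values that are NaN, exactly zero, or outside `limits`."""
    low, high = limits
    with np.errstate(invalid="ignore"):
        # NaN compares False against any bound, so it is flagged explicitly.
        bad = np.isnan(sample) | (sample == 0) | (sample < low) | (sample > high)
    return float(bad.mean())


def filter_samples(
    samples: Iterable[np.ndarray],
    limits: Tuple[float, float],
    max_bad_fraction: float = 0.1,
) -> Iterator[np.ndarray]:
    """Yield acceptable samples; warn once at the end with the skip count."""
    skipped = total = 0
    for sample in samples:
        total += 1
        if bad_fraction(sample, limits) > max_bad_fraction:
            skipped += 1
            continue
        yield sample
    if skipped:
        warnings.warn(f"Skipped {skipped} of {total} samples", UserWarning)


# Illustrative values only: a temperature-like field with made-up limits.
samples = [
    np.full((2, 2), 280.0),                       # clean
    np.zeros((2, 2)),                             # all zeros
    np.array([[280.0, np.nan], [281.0, 282.0]]),  # one NaN out of four values
]
kept = list(filter_samples(samples, limits=(200.0, 350.0), max_bad_fraction=0.3))
print(len(kept))
```

Because each sample is already cropped, the check runs on small in-memory arrays, and the skip count gives a rough measure of how much of the data is corrupted.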

@AUdaltsova I'm trying to understand if this is acceptable w.r.t the scope of this PR? 🤔

@glitch401 might be! @peterdudfield happy to merge this then? :)

AUdaltsova · 2024-07-12T13:16:09Z

ocf_datapipes/batch/merge_numpy_examples_to_batch.py

@@ -90,7 +90,7 @@ def stack_np_examples_into_batch(dict_list: Sequence[NumpyBatch]) -> NumpyBatch:

                nwp_batch[nwp_source] = nwp_source_batch

-            batch[BatchKey.nwp] = nwp_batch
+            batch[BatchKey.nwp] = check_for_nans(nwp_batch)



I might be wrong, but I was under the impression that we currently allow NaNs to be present in batches, which then get filled with zeroes during training? @peterdudfield is this a GSP thing?

Could we move this to when the NWP gets opened? And have an option to check it or not?
I think that would make it safer and clearer what's going on.

We could have a different issue that checks for NaNs in the batches, but we need to think about how we turn that on and off, etc.
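One way to read this request is an opt-in flag at the load stage, sketched below. The function `validate_on_open` and its `check_for_nans` parameter are hypothetical; the sketch takes an already opened DataArray so that it stays self-contained.

```python
import numpy as np
import xarray as xr


def validate_on_open(nwp: xr.DataArray, check_for_nans: bool = False) -> xr.DataArray:
    """Hypothetical load-stage hook: optionally fail fast if the data contains NaNs."""
    if check_for_nans:
        # isnull().any() stays lazy for Dask-backed arrays until bool() forces it.
        if bool(nwp.isnull().any()):
            raise ValueError("NWP data contains NaNs")
    return nwp


data = xr.DataArray(np.array([1.0, np.nan]), dims=("x",))

# The check is off by default, so existing pipelines are unaffected.
validate_on_open(data)

try:
    validate_on_open(data, check_for_nans=True)
except ValueError as err:
    print(err)
```

Keeping the default at `False` means users who are already running the code do not see a behaviour change unless they enable the check.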

@glitch401 would you mind moving this to when the NWP is opened, with an option to do this or not?

do you mean, when data element NWP is opened?

Like below, in the load stage

gotcha, will append changes

I was hoping this would be removed and moved to below.

This comment is still open

AUdaltsova · 2024-07-12T13:18:07Z

ocf_datapipes/load/nwp/nwp.py

+            "VIS008": (0, 1000),  # Visible channel
+            "WV_062": (0, 1000),  # Water vapor channel
+            "WV_073": (0, 1000),  # Water vapor channel
+        }
        logger.info(f"Using {provider.lower()}")


very much just a suggestion, but it would be nice to have some control over which variables receive the checks. Intuitively, that should probably be possible by just passing a list of keys to be checked instead of True to check_for_zeroes/check_physical_limits
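A minimal sketch of that suggestion, assuming the option accepts either a boolean or a list of variable keys. The helper `variables_to_check` is hypothetical; the variable names are taken from the diff in this PR.

```python
from typing import List, Sequence, Union

CheckOption = Union[bool, str, Sequence[str]]


def variables_to_check(option: CheckOption, available: Sequence[str]) -> List[str]:
    """Resolve a check option into the list of variable keys to validate."""
    if option is True:
        return list(available)  # check everything
    if option is False:
        return []  # check nothing
    if isinstance(option, str):
        option = [option]  # a single key passed as a plain string
    unknown = [key for key in option if key not in available]
    if unknown:
        raise KeyError(f"Unknown variables requested for checking: {unknown}")
    return list(option)


available = ["lcc", "tcc", "sde"]
print(variables_to_check(True, available))
print(variables_to_check(["lcc"], available))
print(variables_to_check(False, available))
```

With this shape, `check_for_zeroes=True` keeps its current meaning while `check_for_zeroes=["lcc"]` would restrict the check to selected variables.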

ocf_datapipes/load/nwp/nwp.py

AUdaltsova · 2024-07-12T13:24:46Z

Thank you so much for doing this! Really really appreciate that. The only important note that I have is the laziness thing. I've left some nitpicky suggestions, but please do treat them more as thoughts than actual requests

@peterdudfield re: breaking something, I'll not be affected by any updates, I'm very much locked into my version :) Thanks for asking!

Sukh-P · 2024-07-12T14:24:35Z

ocf_datapipes/load/nwp/nwp.py

+            "lcc": (0, 100),  # Low cloud cover, %
+            "tcc": (0, 100),  # Total cloud cover, %
+            "sde": (0, 1000),  # Snowfall depth, meters
+            "sr": (0, 10),  # Surface roughness, meters


So this is getting right into OCF's data model here, but @devsjc has helped me understand that we have an internal naming convention that deviates slightly from some NWP providers, e.g. sr actually maps to dsrp for us, not surface roughness. Obviously there would be no way to know that as a contributor, so apologies for that. @peterdudfield FYI

Good point! Can supply keys and then pull corresponding ranges out of consts maybe?

you are right @Sukh-P , @peterdudfield did help me understand the conventions
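Supplying keys and pulling the corresponding ranges out of constants could be sketched as follows. `NWP_LIMITS` and `check_physical_limits` are hypothetical names, the `channel` dimension is an assumption about the data layout, and the two ranges shown are the cloud-cover limits quoted in the diff above.

```python
from typing import Dict, Iterable, Tuple

import numpy as np
import xarray as xr

# Hypothetical constants. Keys must follow the internal naming convention,
# which may differ from a provider's own variable names.
NWP_LIMITS: Dict[str, Tuple[float, float]] = {
    "lcc": (0, 100),  # Low cloud cover, %
    "tcc": (0, 100),  # Total cloud cover, %
}


def check_physical_limits(nwp: xr.DataArray, keys: Iterable[str]) -> None:
    """Raise ValueError if any requested channel has values outside its limits."""
    for key in keys:
        low, high = NWP_LIMITS[key]
        channel = nwp.sel(channel=key)
        # The comparison and reduction stay lazy until bool() forces evaluation.
        if bool(((channel < low) | (channel > high)).any()):
            raise ValueError(
                f"NWP data for '{key}' is outside physical limits ({low}, {high})"
            )


# Toy data: "tcc" contains 120, which exceeds the upper limit of 100.
nwp = xr.DataArray(
    np.array([[10.0, 50.0], [20.0, 120.0]]),
    dims=("channel", "x"),
    coords={"channel": ["lcc", "tcc"]},
)
check_physical_limits(nwp, ["lcc"])  # passes silently
try:
    check_physical_limits(nwp, ["tcc"])
except ValueError as err:
    print(err)
```

Keeping the ranges in one constants mapping means a naming mismatch such as the `sr` case only has to be corrected in one place.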

glitch401 · 2024-07-16T18:41:53Z

@peterdudfield @AUdaltsova how does it look for now?

ocf_datapipes/load/nwp/nwp.py

ocf_datapipes/load/nwp/nwp.py

glitch401 · 2024-08-08T13:57:53Z

any updates @peterdudfield

glitch401

updated changes

glitch401 · 2024-08-15T03:02:15Z

tests/batch/test_merge_numpy_examples_to_batch.py

@@ -40,49 +40,13 @@ def _single_batch_sample(fill_value):
    return sample


-def _single_batch_sample_nan(fill_value):


removed all the functions related to checking nans

ocf_datapipes/load/nwp/nwp.py

glitch401 · 2024-08-30T15:21:55Z

@peterdudfield any updates?

glitch401 and others added 2 commits July 3, 2024 15:20

chore: Add test for checking physical limits and zeroes in NWP data openclimatefix#335 and openclimatefix#337

3ee287c

[pre-commit.ci] auto fixes from pre-commit.com hooks

1e2df80

for more information, see https://pre-commit.ci

glitch401 marked this pull request as ready for review July 4, 2024 13:59

glitch401 marked this pull request as draft July 4, 2024 13:59

glitch401 and others added 2 commits July 4, 2024 09:19

changes to generate test data on the go. remove unnecessary zarr files and attributes

8105b91

[pre-commit.ci] auto fixes from pre-commit.com hooks

1eafe49

peterdudfield reviewed Jul 4, 2024

ocf_datapipes/load/nwp/nwp.py (outdated, resolved)

peterdudfield reviewed Jul 4, 2024

ocf_datapipes/load/nwp/nwp.py (outdated, resolved)

peterdudfield reviewed Jul 4, 2024

ocf_datapipes/load/nwp/nwp.py (outdated, resolved)

glitch401 and others added 5 commits July 4, 2024 11:08

Fix ValueError message for NWP data containing zeros and outside physical limits

d5bc6cf

[pre-commit.ci] auto fixes from pre-commit.com hooks

d8cfa9d

Fix ValueError message coding style

5e68173

update physical limits in according to pvnet_uk_region/data_config.yaml

466b710

[pre-commit.ci] auto fixes from pre-commit.com hooks

692500c

glitch401 commented Jul 5, 2024

ocf_datapipes/load/nwp/nwp.py (outdated, resolved)

ocf_datapipes/load/nwp/nwp.py (resolved)

ocf_datapipes/load/nwp/nwp.py (resolved)

peterdudfield reviewed Jul 5, 2024

ocf_datapipes/load/nwp/nwp.py (outdated, resolved)

Update temperature physical limits in OpenNWPIterDataPipe

0667bab

glitch401 marked this pull request as ready for review July 5, 2024 11:20

glitch401 and others added 2 commits July 11, 2024 12:55

Fix NaN check in stack_np_examples_into_batch function

246d898

[pre-commit.ci] auto fixes from pre-commit.com hooks

55627eb

peterdudfield closed this Jul 12, 2024

peterdudfield reopened this Jul 12, 2024

peterdudfield reviewed Jul 12, 2024

.gitignore (outdated, resolved)

AUdaltsova reviewed Jul 12, 2024

ocf_datapipes/load/nwp/nwp.py (resolved)

Sukh-P reviewed Jul 12, 2024

glitch401 and others added 2 commits July 16, 2024 13:30

changes made to adapt for lazy loading

7ba254d

[pre-commit.ci] auto fixes from pre-commit.com hooks

c6ee33d

peterdudfield reviewed Jul 24, 2024

ocf_datapipes/load/nwp/nwp.py (outdated, resolved)

glitch401 and others added 2 commits July 24, 2024 10:13

moved limits to a constant file

d0c4f6f

[pre-commit.ci] auto fixes from pre-commit.com hooks

19050c7

glitch401 commented Jul 24, 2024

ocf_datapipes/load/nwp/nwp.py (resolved)

glitch401 and others added 2 commits August 14, 2024 21:56

Refactor test_merge_numpy_examples_to_batch.py and test_load_nwp.py to check for NaNs when NWP is loaded

3fe89fc

[pre-commit.ci] auto fixes from pre-commit.com hooks

ace0259

glitch401 commented Aug 15, 2024
