Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check for zeros in loader #335

Open
peterdudfield opened this issue Jun 27, 2024 · 19 comments
Open

check for zeros in loader #335

peterdudfield opened this issue Jun 27, 2024 · 19 comments
Labels
good first issue Good for newcomers

Comments

@peterdudfield
Copy link
Contributor

Detailed Description

Check for zeros when loading the data. This should only be used when running inference

Context

  • good to check for lots of zeros and fail hard

Possible Implementation

@glitch401
Copy link

I'll take this up as well, I think it's correlated to #337

@glitch401
Copy link

Apologies if this sounds trivial, but about the implementation, are we planning to log it or raise a ValueError?

@peterdudfield
Copy link
Contributor Author

Raise an error please

@glitch401
Copy link

I was trying to trigger positive and negative test cases, by cloning and replacing nwp data. but there seems to be something at large with the data that i'm missing to figure out.

Reference:

  • I've created a new test data for testing zeroes using:
for i in store:
   zarr.copy(store[i], store_clone, name=i)
store_clone['UKV'][0, 0, 0]=0  
  • while reading this new dataset I'm getting a key error:
KeyError: ".zmetadata\nThis exception is thrown by __iter__ of OpenNWPIterDataPipe(check_for_zeros=False, check_physical_limits=False, zarr_path='tests/data/nwp_data/test_with_zeros.zarr')"

can you help me point to the right direction, please?

@peterdudfield
Copy link
Contributor Author

That error can sometimes happen if the file you load isnt a zarr. Can you check it manually that is had the .zmetadata in tests/data/nwp_data/test_with_zeros.zarr

@glitch401
Copy link

Yep, that worked :D
Apparently the way I was coloning the zarr, didnt carry over the metadata at .zmetadata

Also, in reference to #337, can you help me understand the physical limits for nwp.variable UKV?

@glitch401
Copy link

to solve the above, I've manually copied .zmetadata and .zarray
which works as i'm manually testing the test functions at test\load\nwp\test_load_nwp.py:

def test_check_for_zeros():
    #positive test case
    nwp_datapipe = OpenNWP(
        zarr_path="tests/data/nwp_data/test_with_zeros_n_limits.zarrr",
        check_for_zeros=True)
    with pytest.raises(ValueError): # checks for Error raised if NWP DataArray contains zeros
        metadata = next(iter(nwp_datapipe))

    #negative test case
    nwp_datapipe = OpenNWP(
        zarr_path="tests/data/nwp_data/test.zarr",
        check_for_zeros=True)
    metadata = next(iter(nwp_datapipe))
    assert metadata is not None

there is something odd happening as I'm trying to run pytest on ocf_datapipeline. the files get automatically deleted, which ends up throwing the same error :/

KeyError: ".zmetadata\nThis exception is thrown by __iter__ of OpenNWPIterDataPipe(check_for_zeros=False, check_physical_limits=False, zarr_path='tests/data/nwp_data/test_with_zeros.zarr')"

@peterdudfield
Copy link
Contributor Author

peterdudfield commented Jul 3, 2024

Which files get delete?

Also do you mean zarr_path="tests/data/nwp_data/test_with_zeros_n_limits.zarr" instead of zarr_path="tests/data/nwp_data/test_with_zeros_n_limits.zarrr"?

@glitch401
Copy link

tests/data/nwp_data/test_with_zeros_n_limits.zarr/.zmetadata gets deleted as I'm running pytest

oh yes, my bad, it was a silly typo

@peterdudfield
Copy link
Contributor Author

did the typo fix it? Or is it still getting deleted?

@peterdudfield
Copy link
Contributor Author

did the typo fix it? Or is it still getting deleted?

Can you work out where the file gets deleted?

@glitch401
Copy link

The typo dosent exist in the real code.
I'm trying to figure the same. There seeem to be some correlation with pytest

@peterdudfield
Copy link
Contributor Author

How odd, can you push code to a draft PR and I might be able to have a look at it?

@glitch401
Copy link

sure thing I'll do that

@glitch401
Copy link

If it helps, I've used this code to produce test_with_zeros_n_limits.zarr:

import zarr
import shutil
import numpy as np

original_store_path = 'tests/data/nwp_data/test.zarr'
original_store = zarr.open(original_store_path, mode='r')

new_store_path = 'tests/data/nwp_data/test_with_zeros_n_limits.zarr'
shutil.rmtree(new_store_path, ignore_errors=True)
with zarr.open(new_store_path, mode='w') as new_store:
    for item in original_store:
        zarr.copy(original_store[item], new_store, name=item)
    
    new_store['UKV'][0, 0, 0,0]=0
    new_store['UKV'][0, 0, 0,1]=np.random.uniform(240, 350, size=(548,))

@peterdudfield
Copy link
Contributor Author

This script doesn't get run with pytest does it?

It should just be run once yea?

@glitch401
Copy link

The script is supposed to run once, to generate the data. Do you think we'd be better off if it's run as part of test_load_nwp.py?

@peterdudfield
Copy link
Contributor Author

Perhaps its worth a try. And that would mean less test data is committed to the repo

@glitch401
Copy link

alrighty, I'll make the necessary changes to the draft PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants