Check for zeros in the data #159
Comments
Yes, I would add it to the filter here: nwp-consumer/src/nwp_consumer/internal/service/consumer.py Lines 410 to 424 in 372359c
Hi, I'm someone who is just starting in the field of open source development.
You can definitely contribute. Are you familiar with Python and xarray?
I'm familiar with Python and common libraries like pandas, but not with xarray.
Thanks, good. @GAMinsect, you might need to learn a bit of xarray. You're welcome to give it a go.
After 2 weeks of working on it in my free time, here's my implementation.

def _dataQualityFilter(ds: xr.Dataset) -> bool:
    """Filter out data that is not of sufficient quality."""
    if ds == xr.Dataset():
        return False
    zeroCount = 0
    elementCount = 0
    # Carry out a basic data quality check
    for data_var in ds.data_vars:
        if ds[f"{data_var}"].isnull().any():
            log.warn(
                event=f"Dataset has NaNs in variable {data_var}",
                initTime=str(ds.coords["init_time"].values[0])[:16],
                variable=data_var,
            )
        data = ds[data_var].data
        elementCount += data.size
        zeroCount += (data == 0).sum()
    if zeroCount / elementCount > 0.2:
        raise ValueError("In your dataset more than 20% of your data are 0's")
    return True
@peterdudfield is the code fine?
I'll let @devsjc review, if that's OK.
@devsjc how's the review going?
Hi @GAMinsect, thanks for your work looking into this, and apologies for the slow response. I've just merged in a quite comprehensive restructuring of the project in order to improve the speed of the application, which has resulted in the The code looks good though, so your work wasn't wasted - it will still be used in the logic, just not in the place I originally suggested!
@devsjc Thank you for the reply (and sorry for my late one). I guess this issue can be closed now, right?
Detailed Description
It would be great to have a check in place for zeros in the data. A large number of zeros normally indicates an error.
Context
Possible Implementation
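One way the check could look, as a minimal sketch only and not the code that ended up in nwp-consumer. The function names `zero_fraction` and `has_too_many_zeros` and the parameter `max_zero_fraction` are hypothetical; the 20% default is taken from the implementation posted in the comments. It assumes the data arrives as an `xarray.Dataset` and that NaNs should not be counted as zeros.

```python
import xarray as xr


def zero_fraction(ds: xr.Dataset) -> float:
    """Return the fraction of elements equal to zero across all data variables.

    Hypothetical helper for illustration; not part of nwp-consumer.
    """
    zero_count = 0
    element_count = 0
    for name in ds.data_vars:
        values = ds[name]
        element_count += values.size
        # Comparing with 0 yields a boolean array. NaN compares unequal to 0,
        # so missing values are not counted as zeros.
        zero_count += int((values == 0).sum())
    # A dataset with no elements has no zeros; avoid dividing by zero.
    return zero_count / element_count if element_count else 0.0


def has_too_many_zeros(ds: xr.Dataset, max_zero_fraction: float = 0.2) -> bool:
    """Return True if the share of zeros exceeds max_zero_fraction (assumed 20%)."""
    return zero_fraction(ds) > max_zero_fraction
```

Returning a boolean rather than raising lets the caller decide whether a zero-heavy dataset should be dropped, logged, or treated as a hard error. A single threshold over the whole dataset may be too coarse for variables that are legitimately zero much of the time (solar radiation at night, for instance), so a per-variable threshold might be a better fit in practice.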