returning 'nan' when too many variables requested? #620

derekpickell · 2024-10-14T23:13:45Z

Hi there,

I'm playing around with a basic read of locally downloaded .h5 files:

import icepyx as ipx

reader = ipx.Read(data_source=file_list)
reader.vars.append(beam_list=['gt2l', 'gt2r'], var_list=['h_li', "latitude", "longitude", "h_li_sigma", "sigma_geo_h", "bsnow_h", "cloud_flg_asr", "atl06_quality_summary"])
ds = reader.load()

It seems when I add just one more variable to the list, e.g., 'h_rms_misfit', the number of 'nans' in the returned 'ds' xarray increases for no apparent reason, sometimes for all variables.

icepyx v1.3.0

Thank you!

JessicaS11 · 2024-10-18T17:13:29Z

Hello @derekpickell! I just wanted to acknowledge I saw this post (thanks for reaching out!) and am wondering why this is unexpected behavior? It could be that adding h_rms_misfit is increasing one of the dataset dimensions, which would tend to increase the number of nans as Xarray pads out the data to take this new shape.
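To illustrate the padding behavior described above, here is a minimal sketch using plain xarray rather than icepyx. The variable names and values are toy data chosen for illustration, not real ATL06 values, and it assumes an outer join, which is requested explicitly here.

```python
import xarray as xr

# Two toy variables on the same dimension but with different coordinate values.
# Names and numbers are made up for illustration; they are not ATL06 data.
h_li = xr.DataArray(
    [10.0, 11.0, 12.0],
    dims="delta_time",
    coords={"delta_time": [0, 1, 2]},
    name="h_li",
)
h_rms_misfit = xr.DataArray(
    [0.1, 0.2, 0.3, 0.4, 0.5],
    dims="delta_time",
    coords={"delta_time": [0, 1, 2, 3, 4]},
    name="h_rms_misfit",
)

# An outer join keeps the union of the coordinate values, so h_li is padded
# with NaN at the two positions where it has no data.
ds = xr.merge([h_li, h_rms_misfit], join="outer")

nan_before = int(h_li.isnull().sum())
nan_after = int(ds["h_li"].isnull().sum())
print(nan_before, nan_after)
```

In this toy case the original values of h_li are still present after the merge; only the padded positions are NaN.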

A few questions that will make it easier for me to diagnose if there's an issue:

Does the behavior seem to be tied to the h_rms_misfit variable specifically, or any number of variables >8?
How are you counting the number of nans?
What data product are you using?
Can you share either the search you used to download the granules or some of the granule IDs where you're noticing this happening?

derekpickell · 2024-10-18T17:31:49Z

Hi @JessicaS11,

Thank you for the response! To answer your questions:

As far as I can tell, it’s any number of variables > 8. I experimented with the cloud and blowing snow flags as well.
To count nans, I search the field using np.nanmax() and np.nanmin(), and sometimes no number is returned, indicating the field is all nan. I suspect this isn’t a padding issue, since these data are returned when I request fewer variables.
I am using ATL06, manually downloaded from the NSIDC data access tool using a 30 km bounding box centered near Summit, GL from 2018 to present.
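For reference, a more direct way to count nans than inspecting np.nanmax() output might look like the sketch below. The helper name nan_summary is hypothetical and the input lists are toy values, not data from these granules.

```python
import warnings

import numpy as np


def nan_summary(values):
    """Return the total count, NaN count, and an all-NaN flag for one field."""
    arr = np.asarray(values, dtype=float)
    n_nan = int(np.isnan(arr).sum())
    return {"n_total": arr.size, "n_nan": n_nan, "all_nan": n_nan == arr.size}


partly_missing = [1.0, np.nan, 3.0]
all_missing = [np.nan, np.nan, np.nan]

print(nan_summary(partly_missing))
print(nan_summary(all_missing))

# np.nanmax on an all-NaN input returns nan and emits a RuntimeWarning,
# which is silenced here so only the value is printed.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    all_nan_max = np.nanmax(all_missing)
print(all_nan_max)
```

Counting with np.isnan separates "a few missing values" from "the whole field is missing", which the min/max check alone cannot distinguish.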

JessicaS11 · 2024-10-22T19:19:26Z

Thanks for these answers. I've dug in a bit more and now suspect that the issue is not the number of variables you're playing with, but which variables. The note on which ones you've experimented with was a clue. h_rms_misfit, bsnow_h, and cloud_flg_asr are all more deeply nested variables than (for instance) h_li: if you look at the variable paths, they have either geophysical or fit_statistics after the land_ice_segments layer. If you look at the resulting dataset for a single file after reading in two versus three of the above specific variables, the coordinates attached to the variable are different. What's happening behind the scenes is essentially that icepyx does all of the individual group reads with xarray and then tries to cleverly merge the per-group DataArrays together into one dataset. As you've noted, this doesn't always work! Handling (generically) the multiple layers of nesting is an ongoing challenge in icepyx, so thanks for reporting this case we missed.
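To make the nesting concrete, here is a small sketch that sorts variable paths by their parent group. It is plain Python, not icepyx code; the paths follow the ATL06 layout described above but are written out by hand as an assumption, and group_by_parent is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative ATL06-style variable paths for one beam (assumed layout).
paths = [
    "gt2l/land_ice_segments/h_li",
    "gt2l/land_ice_segments/latitude",
    "gt2l/land_ice_segments/fit_statistics/h_rms_misfit",
    "gt2l/land_ice_segments/geophysical/bsnow_h",
    "gt2l/land_ice_segments/geophysical/cloud_flg_asr",
]


def group_by_parent(var_paths):
    """Map each parent group path to the variable names stored directly in it."""
    groups = defaultdict(list)
    for path in var_paths:
        parent, _, name = path.rpartition("/")
        groups[parent].append(name)
    return dict(groups)


groups = group_by_parent(paths)
for parent, names in groups.items():
    depth = parent.count("/") + 1  # number of group levels above the variable
    print(depth, parent, names)
```

Each distinct parent group corresponds to a separate group read, so requesting variables from fit_statistics or geophysical adds groups that then have to be merged with the land_ice_segments variables.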

I think I've isolated where in the code the issue is happening (lines 816-822 or so in the read module, though it could also be in one of the functions called therein), but I haven't yet figured out what the solution might be (any suggestions welcome!). I'll continue to work on resolving this as time allows, but any assistance would be greatly appreciated.

JessicaS11 · 2024-10-25T14:02:21Z

Hello @derekpickell! I have good news and bad news. The good news is that the bug I identified, where not all dimensions were being applied to the more deeply nested variables of interest, is fixed via #623. The bad news is that I don't think this was actually the problem you noted.

When I dug in further, I found a granule that only has nan values for some variables. However, it seems like only bsnow_h fits into this category, not cloud_flg_asr or h_rms_misfit. If I'm not mistaken, in some situations the blowing snow algorithm is unable to confidently quantify blowing snow, which would result in no blowing snow values. @mikala-nsidc (ICESat-2 support specialist at NSIDC) or @tsutterley (one of the ATL06 product leads), can you confirm that in some cases no bsnow_h (and thus all nans) is expected behavior for ATL06 granules?
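For anyone wanting to check their own granules, a sketch of listing the variables that are entirely nan in a loaded dataset could look like this. The dataset here is a toy stand-in built by hand, not the output of reader.load(), and all_nan_variables is a hypothetical helper.

```python
import numpy as np
import xarray as xr

# Toy stand-in for a loaded dataset; values are made up for illustration.
ds = xr.Dataset(
    {
        "h_li": ("delta_time", [10.0, np.nan, 12.0]),
        "bsnow_h": ("delta_time", [np.nan, np.nan, np.nan]),
    },
    coords={"delta_time": [0, 1, 2]},
)


def all_nan_variables(dataset):
    """Return the names of data variables that contain no valid values."""
    return [
        name
        for name, da in dataset.data_vars.items()
        if bool(da.isnull().all())
    ]


empty_vars = all_nan_variables(ds)
print(empty_vars)
```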

derekpickell · 2024-10-25T18:41:18Z

@JessicaS11 wow amazing thank you. It looks like everything 'makes sense' with the data I am looking at: few nans here and there, but no large gaps where I wouldn't expect them.

JessicaS11 · 2024-10-29T14:19:00Z

@derekpickell Excellent! I'm going to close this issue as resolved, but feel free to comment again if need be. Would you be able/willing to do a PR review for #623?

JessicaS11 mentioned this issue Oct 25, 2024

properly apply all dimensions to deeply nested variables #623

Merged

JessicaS11 closed this as completed Oct 29, 2024
