NetCDF and N-Dimensional Variable Support #26
This is awesome @jpswinski! I played around a bit with reading 2D files, trying out an ATL23 product, an ATL20 product, and a NetCDF file of ocean temperature. Both ATL products read well. In both of those products the coordinate variables didn't read in, but that was because the coordinates were listed in a group above the group being read (coordinates had the path With the netcdf file I did get an error This is an absolutely amazing update!! 🎉
Summary
This PR provides a number of improvements to h5coro:
Multi-Processing Mode
Experimental support has been added for running h5coro in multi-process mode, as opposed to the default multi-threaded mode. When multiprocessing is enabled (by passing multiProcess=True to the constructor of the h5coro object), the chunk data for each variable is decoded in a separate process instead of a separate thread. On systems with multiple cores, this makes a significant difference in performance. For example, on an 8-core system, opening an xarray dataset on the ICESat-2 heights group took ~200 seconds in multi-threaded mode and 30 seconds in multi-processing mode.
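To illustrate why the switch helps, here is a minimal standard-library sketch of the general pattern: one worker per variable, run either in a thread pool or a process pool. It is not h5coro's implementation; decode_variable, decode_all, and the multi_process argument are hypothetical names made up for this example, and the pure-Python byte decoding only stands in for real chunk decoding.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def decode_variable(raw_chunks):
    """Hypothetical CPU-bound stand-in for chunk decoding.

    Interprets every chunk as a run of little-endian unsigned 16-bit
    integers and returns all values of the variable as one list.
    """
    values = []
    for chunk in raw_chunks:
        for i in range(0, len(chunk), 2):
            values.append(int.from_bytes(chunk[i:i + 2], "little"))
    return values


def decode_all(variables, multi_process=False):
    """Decode each variable in its own worker.

    With multi_process=False the workers are threads, which share one
    interpreter lock for pure-Python work; with multi_process=True they
    are separate processes that can use separate cores.
    """
    executor_cls = ProcessPoolExecutor if multi_process else ThreadPoolExecutor
    names = list(variables)
    with executor_cls() as pool:
        decoded = pool.map(decode_variable, (variables[n] for n in names))
        # Consume the results while the pool is still open.
        return dict(zip(names, decoded))
```

Calling decode_all(variables, multi_process=True) returns the same dictionary as the threaded call. On platforms that start worker processes with spawn, that call should sit under an if __name__ == "__main__": guard, and the arguments and results must be picklable.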
There are a couple of caveats:
Column Conversions
The xarray backend supports running every value in a column through a conversion function, but profiling showed that this step takes an extremely long time. It is used to convert the ICESat-2 delta times into datetimes, and this PR turns it off by default.
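As a rough sketch of what an opt-in conversion looks like, the example below returns raw values unless the caller asks for a conversion, so the per-value cost is only paid on request. read_column and delta_time_to_datetime are hypothetical helpers, not part of h5coro or its xarray backend, and the 2018-01-01 epoch is an assumption that should be checked against the product documentation.

```python
from datetime import datetime, timedelta

# Assumption: delta times are seconds elapsed since 2018-01-01 00:00:00.
ASSUMED_EPOCH = datetime(2018, 1, 1)


def delta_time_to_datetime(seconds):
    """Convert one delta time (seconds since the assumed epoch) to a datetime."""
    return ASSUMED_EPOCH + timedelta(seconds=seconds)


def read_column(raw_values, conversion=None):
    """Hypothetical helper: return a column, converting only on request.

    With conversion=None (the default) the values are returned as read,
    with no per-value function call. Passing a conversion function runs
    every value through it, which is the slow path described above.
    """
    if conversion is None:
        return list(raw_values)
    return [conversion(value) for value in raw_values]
```

A caller that needs datetimes would use read_column(delta_times, delta_time_to_datetime); everyone else gets the raw seconds and skips the per-value calls.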