Opening HDF5 files in Lua takes a long time #112
Comments
@d11 I would appreciate it if you could give me a clue about what I should look into to fix this. Opening the HDF5 files is unbelievably slow. A Python script that does exactly the same thing opens all of my files in less than 0.1 seconds.
@d11 Does this package make any assumptions about how people create their datasets? I don't do any chunking or anything similar on the Python side when I'm creating the dataset ...
Hi, I'm not sure about this, I'm afraid. This package does traverse the file when opening it, to determine the whole structure up front. Perhaps even that is too slow in your case. It may be that the proper scalable approach is for this to be lazier, but that was not necessary in our usage. In general you should know that torch-hdf5 is not as mature as h5py; while in principle HDF5 itself works fine for large datasets, this library has mainly been used for transferring smaller amounts of data between languages / programs in a convenient way. If you need to get around this, I'd start by disabling the _loadObject call that is triggered when opening the file. The library won't work without it, but you can see whether opening becomes fast, which would confirm the idea that the traversal of the file is the cause of the slowness. Actually using the library without doing this traversal up front, however, might require more invasive changes.
@d11 Thank you for the information. Do you think I should comment out these lines?
Right, that would skip the initial traversal of the dataset that I mentioned. The library will not work without it, but it might at least confirm the cause of the problem.
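The traversal hypothesis can also be checked from the Python side. The following is a minimal sketch, assuming h5py is installed; `time_open_and_traverse` is a hypothetical helper name, not part of h5py or torch-hdf5. It times a plain h5py open (which does not walk the file) separately from a full walk over every group and dataset, which roughly mimics what an eager open has to do. If the walk takes far longer than the open, that supports the idea that up-front traversal is the cost.

```python
import time

import h5py


def time_open_and_traverse(path):
    """Return (open_seconds, traverse_seconds, object_count) for an HDF5 file.

    Hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    f = h5py.File(path, "r")  # h5py does not visit the file's objects here
    open_seconds = time.perf_counter() - start

    count = 0

    def visit(name, obj):
        # Called once per group or dataset below the root.
        # Returning None tells h5py to keep walking.
        nonlocal count
        count += 1

    start = time.perf_counter()
    f.visititems(visit)  # walk the whole hierarchy, as an eager open would
    traverse_seconds = time.perf_counter() - start
    f.close()
    return open_seconds, traverse_seconds, count
```

Calling `time_open_and_traverse("data.h5")` on one of the slow files (the file name here is a placeholder) gives the two timings and the number of objects visited. A file with very many groups or datasets would be expected to show a large traversal time even when the open itself is fast.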
@d11 I think there is more to fix. Not only is opening the file very slow, but reading data is also very slow. Do you have any idea why reading data might be so slow as well?
I don't really have any guesses about that, sorry. Perhaps the dataspace used by torch-hdf5 is not suitable for your access patterns. https://support.hdfgroup.org/HDF5/doc/UG/HDF5_Users_Guide-Responsive%20HTML5/HDF5_Users_Guide/Dataspaces/HDF5_Dataspaces_and_Partial_I_O.htm?rhtocid=7.2#TOC_7_4_Dataspaces_and_Databc-6 describes how this can work in HDF5. If this is crucial for your performance you will probably need to use a lower level interface than torch-hdf5.
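As an illustration of the partial I/O idea described at that link, here is a minimal h5py sketch, assuming a dataset whose first axis indexes rows; `iter_row_blocks` and its parameters are hypothetical names chosen for this example. It reads a dataset in consecutive row blocks instead of loading it whole, which is one way to compare read patterns on the Python side. It does not show how torch-hdf5 selects its dataspace.

```python
import h5py


def iter_row_blocks(path, dataset_name, block_rows):
    """Yield consecutive blocks of rows from one dataset.

    Hypothetical helper for illustration only.
    """
    with h5py.File(path, "r") as f:
        dset = f[dataset_name]
        for start in range(0, dset.shape[0], block_rows):
            # Slicing an h5py dataset reads only the selected rows from disk
            # and returns them as a NumPy array.
            yield dset[start:start + block_rows]
```

Whether block-wise reads are fast depends on how the dataset is laid out on disk (contiguous or chunked) and on the block size, so timing a few block sizes on the real files is more informative than any fixed choice.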
I have created some HDF5 files using the h5py package in Python and load them in Lua using this package. The file sizes vary (from 10 GB to 100 GB or more) and opening them in Python is instantaneous. However, opening the same HDF5 files in Lua with this package takes a very long time when I call hdf5.open(), before I even read any data. Sometimes it takes 1 minute or more to open a single file. I can open the same file in Python in less than half a second. I wonder if anyone has had this issue before?