Very slow reading of mhydro file #749
Comments
Initial profiling has revealed that ~100% of the time is spent in mikeio/mikeio/pfs/_pfsdocument.py, lines 361 to 364 (commit 299e509).
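For anyone who wants to reproduce the profiling, here is a minimal sketch using the standard-library cProfile. The comment above does not say which profiler was used, so this is only one possible way to do it; `profile_call` is a hypothetical helper name, not part of mikeio.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is the text of the `top`
    entries sorted by cumulative time. Hypothetical helper for
    illustration only.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

With mikeio installed, something like `_, report = profile_call(mikeio.read_pfs, "river_name.mhydro"); print(report)` should list the functions where the time accumulates.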
@jsmariegaard the name of the private method There are some really long lines in the example file. One of them looks like this:
with 25799 commas inside it. I don't know if these are the lines that take time, but at least they contain many commas.
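To check which lines carry that many commas, a small sketch like the following could help. It reads the file as raw bytes so that no assumption about the file encoding is needed; `comma_heavy_lines` is a hypothetical helper name, not part of mikeio.

```python
def comma_heavy_lines(path, top=5):
    """Return up to `top` tuples (line_number, comma_count, length_in_bytes),
    ordered with the most comma-heavy lines first.

    Hypothetical diagnostic helper; reads bytes to avoid guessing the encoding.
    """
    rows = []
    with open(path, "rb") as f:
        for number, line in enumerate(f, start=1):
            rows.append((number, line.count(b","), len(line)))
    # Sort by comma count, largest first
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]
```

Calling `comma_heavy_lines("river_name.mhydro")` should show whether the long lines are the ones with tens of thousands of commas.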
I wonder if we could have special handling of a
To try this out I removed the 11 lines with MULTIPOLYGON:
grep -v MULTIPOLYGON file_name.mhydro > stripped.mhydro
Parsing the original pfs file took ~10 minutes; the stripped file with 8992 lines (11 lines shorter) took 0.3 seconds to read 🤯.
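For those without grep (e.g. on Windows), the same experiment can be sketched in plain Python. This mirrors `grep -v MULTIPOLYGON` by working on raw bytes; `strip_lines_containing` is a hypothetical helper name, and the stripped file is only meant for timing, since it no longer contains the geometry.

```python
def strip_lines_containing(src, dst, needle=b"MULTIPOLYGON"):
    """Copy src to dst, skipping every line that contains `needle`.

    Returns the number of lines removed. Hypothetical helper that
    mimics `grep -v`; works on bytes so the encoding does not matter.
    """
    removed = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            if needle in line:
                removed += 1  # skip this line
            else:
                fout.write(line)
    return removed
```

Wrapping `mikeio.read_pfs(...)` between two `time.perf_counter()` calls for both the original and the stripped file should then reproduce the comparison above.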
Describe the bug
I have three different mhydro files, of sizes 2-5 MB. When reading them with mikeio.read_pfs, one takes 2 minutes to read, one takes 10 minutes and the last takes 20 minutes, and I can't understand why it takes so long. Are there any catchment polygons or other data that are unusually complicated to parse? After the files are read into memory, all reading and changing of values goes quickly. Writing the file again is done in no time.
To Reproduce
Steps to reproduce the behavior:
Load the relevant file, sent by email to JEM and JAN:
mhr_file = mikeio.read_pfs("river_name.mhydro")
I don't think it matters if I use IPython, Jupyter Notebook or plain Python.
Expected behavior
The file would be loaded into memory in a few seconds.
System information: