[Request] can dwalk "stream" text output as it reads mfu file (to avoid high RAM usage)? #563
Comments
Good question. At the moment dwalk and the other tools can only read the entire file at once. We'd have to hack together a tool for the streaming bit. The code that reads the .mfu file (file format version 4) is here: mpifileutils/src/common/mfu_flist_io.c Line 844 in fac50e0
The loop that unpacks each encoded file entry is here: mpifileutils/src/common/mfu_flist_io.c Lines 1060 to 1065 in fac50e0
The biggest change is that we couldn't unpack the entries into an mfu_flist like this function does, since the flist structure expects to have the full list loaded in memory at once. However, one could look to modify the unpack function to just print the file name instead of inserting the element into the list. See mpifileutils/src/common/mfu_flist_io.c Line 332 in fac50e0
Most of the heavy lifting in parsing the data for each file is in mpifileutils/src/common/mfu_flist_io.c Line 275 in fac50e0
You could perhaps just cut-paste-edit
It would be cleaner still to avoid allocating and freeing the element each time.
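To illustrate the idea of printing each name as it is read rather than inserting it into a list, here is a minimal sketch. It does not use the real .mfu v4 layout or any mpifileutils function: the record layout (a fixed-size record beginning with a NUL-padded file name) and the function name `stream_names` are assumptions made up for this example. It also shows the point about reusing a single buffer instead of allocating and freeing an element per entry.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical layout, NOT the real .mfu v4 format: every record is
 * record_size bytes long and starts with a NUL-padded file name.
 * Returns the number of complete records printed, or -1 on allocation
 * failure. A trailing partial record is ignored. */
long stream_names(FILE* in, size_t record_size, FILE* out)
{
    assert(record_size > 0);

    /* One buffer, allocated once and reused for every record.
     * The extra byte guarantees the name is NUL-terminated. */
    char* buf = malloc(record_size + 1);
    if (buf == NULL) {
        return -1;
    }
    buf[record_size] = '\0';

    long count = 0;
    while (fread(buf, 1, record_size, in) == record_size) {
        /* Print the name right away instead of appending to a list,
         * so memory use stays constant regardless of file size. */
        fprintf(out, "%s\n", buf);
        count++;
    }

    free(buf);
    return count;
}
```

A caller would open the input with `fopen` and pass `stdout` as `out`. Memory use is bounded by one record, which is the property the streaming request is after.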
Actually, after reviewing the code, you may be able to read this file back on the same node using The current v4 of the mpifileutils/src/common/mfu_flist_io.c Lines 248 to 251 in fac50e0
That decision makes it easy to seek to a specific entry in the disk space ~ (numfiles * max(filename length)) When reading the file entries back from the mpifileutils/src/common/mfu_flist_io.c Lines 284 to 285 in fac50e0
The If you generated this list from a single node, you might be able to read it back on that single node. With the eventual v5
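The trade-off described above (easy seeking in exchange for disk space proportional to numfiles times the maximum filename length) can be made concrete with a small sketch. The helper names and the header-plus-equal-size-records layout are assumptions for illustration, not the actual v4 on-disk format. For scale, the file in this issue is about 1.4 TB for 502M items, which works out to roughly 2.8 KB per entry if one assumes decimal terabytes and ignores any header.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of entry `index`, assuming a hypothetical layout of
 * header_bytes followed by records that all have the same size.
 * Equal-size records are what make a direct seek possible. */
uint64_t record_offset(uint64_t header_bytes, uint64_t record_size,
                       uint64_t index)
{
    return header_bytes + index * record_size;
}

/* Average bytes per entry (integer division), useful for a rough
 * estimate of how much padding each record carries. */
uint64_t bytes_per_entry(uint64_t file_bytes, uint64_t num_entries)
{
    assert(num_entries > 0);
    return file_bytes / num_entries;
}
```

With equal-size records, entry `i` can be reached with a single seek to `record_offset(header, size, i)`. The cost is that every record is padded to the size of the largest one.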
If you find that you can't use dwalk, let me know, and I can hack up a branch with a tool to get you started.
I have a large 1.4TB .mfu file generated by dwalk for 502M items.
I want to generate an unsorted text output file from this mfu file.
Does dwalk read the entire mfu file into RAM before outputting the text file?
For sorted output, I could see reading it all into RAM. But for unsorted output, could dwalk “stream” the output as it reads the mfu input and thereby not use much RAM?
I’m asking because I have a service node that can generate the mfu file but doesn’t have enough RAM to generate (unsorted) text output from that same mfu file.
Thanks!