[Request] can dwalk "stream" text output as it reads an mfu file (to avoid high RAM usage)? #563

markmoe19 · 2023-10-16T19:06:54Z

I have a large 1.4TB .mfu file generated by dwalk for 502M items.
I want to generate an unsorted text output file from this mfu file.

Does dwalk read the entire mfu file into RAM before outputting the text file?
For sorted output, I could see reading it all into RAM. But for unsorted output, could dwalk “stream” the output as it reads the mfu input and thereby not use much RAM?

I’m asking because I have a service node that can generate the mfu file but doesn’t have enough RAM to generate (unsorted) text output from that same mfu file.

Thanks!

Mark

adammoody · 2023-10-16T19:37:31Z

Good question. At the moment dwalk and the other tools can only read the entire file at once. We'd have to hack together a tool for the streaming bit.

The code that reads the .mfu file (file format version 4) is here:

mpifileutils/src/common/mfu_flist_io.c

Line 844 in fac50e0

static void read_cache_v4(

The loop that unpacks an encoded file entry read from the .mfu file is here:

mpifileutils/src/common/mfu_flist_io.c

Lines 1060 to 1065 in fac50e0

    
    while (packcount < (uint64_t) read_count) {
        /* unpack item from buffer and advance pointer */
        list_insert_ptr(flist, ptr, 1, chars);
        ptr += elem_size;
        packcount++;
    }

The biggest change is that we couldn't unpack the entries into an mfu_flist like this function does, since the flist structure expects to have the full list loaded in memory at once. However, one could look to modify the unpack function to just print the file name instead of inserting the element into the list.

The list_insert_ptr() function unpacks each element and adds it to the list:

mpifileutils/src/common/mfu_flist_io.c

Line 332 in fac50e0

    
           static size_t list_insert_ptr(flist_t* flist, char* ptr, int detail, uint64_t chars)

Most of the heavy lifting in parsing the data for each file is in list_elem_unpack, which shows how the fields in the element are set:

mpifileutils/src/common/mfu_flist_io.c

Line 275 in fac50e0

    
           static size_t list_elem_unpack(const void* buf, int detail, uint64_t chars, elem_t* elem)

You could perhaps copy, paste, and edit the list_insert_ptr function to create a print_ptr version that allocates, unpacks, prints the file name, and frees the element, something like:

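static size_t print_ptr(char* ptr, int detail, uint64_t chars)
{
    /* allocate a temporary element to unpack into */
    elem_t* elem = (elem_t*) MFU_MALLOC(sizeof(elem_t));

    /* unpack one entry; the return value is the number of bytes consumed */
    size_t bytes = list_elem_unpack(ptr, detail, chars, elem);

    /* print the file name instead of inserting the element into a list */
    printf("%s\n", elem->file);

    /* free the name copied during unpack, then the element itself */
    mfu_free(&elem->file);
    mfu_free(&elem);

    return bytes;
}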

It would be cleaner still to avoid allocating and freeing the element each time.
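As a rough, standalone illustration of that streaming idea (not mpifileutils code), the sketch below reads fixed-size records in bounded batches and reuses a single name buffer instead of allocating per element. The record layout is a simplifying assumption: a NUL-padded name field of `chars` bytes followed by `extra` bytes of other data. The real v4 file has a header and more fields that this ignores, and `stream_names`, `extra`, and `batch` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stream fixed-size records from `in` and print each name to `out`.
 * ASSUMED layout (not the real v4 format): each record is a name field
 * of `chars` bytes, NUL-padded, followed by `extra` bytes of other data.
 * Memory use is bounded by `batch` records, independent of file size.
 * Returns the number of names printed, or -1 on allocation failure. */
static long stream_names(FILE* in, FILE* out, size_t chars, size_t extra, size_t batch)
{
    assert(chars > 0 && batch > 0);

    size_t elem_size = chars + extra;
    char* buf  = malloc(elem_size * batch); /* one batch of raw records */
    char* name = malloc(chars + 1);         /* reused for every record */
    if (buf == NULL || name == NULL) {
        free(buf);
        free(name);
        return -1;
    }

    long total = 0;
    size_t got;
    while ((got = fread(buf, elem_size, batch, in)) > 0) {
        const char* ptr = buf;
        for (size_t i = 0; i < got; i++) {
            /* copy the name field and force termination in case the
             * name fills the whole field */
            memcpy(name, ptr, chars);
            name[chars] = '\0';
            fprintf(out, "%s\n", name);

            ptr += elem_size;
            total++;
        }
    }

    free(buf);
    free(name);
    return total;
}
```

A real tool would first have to parse whatever header precedes the records to learn `chars` and the record size, and would unpack the remaining fields if more than the name is wanted.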

adammoody · 2023-10-17T19:09:16Z

Actually, after reviewing the code, you may be able to read this file back on the same node using dwalk.

The current v4 of the .mfu format stores file names as fixed length fields, where every file name is padded to the longest file name in the set.

mpifileutils/src/common/mfu_flist_io.c

Lines 248 to 251 in fac50e0

    
    /* copy in file name */
    char* file = elem->file;
    strncpy(ptr, file, chars);
    ptr += chars;

That decision makes it easy to seek to a specific entry in the .mfu file, but it also significantly inflates the .mfu file size if there are many files and one really long filename:

disk space ~ (numfiles * max(filename length))
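To put rough numbers on that estimate, here is a small arithmetic sketch. The helper names are hypothetical, and the inputs below are the rounded figures from this issue (1.4 TB taken as 1.4 * 10^12 bytes, 502M items), so the result is only approximate.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes spent on name fields when every name is padded to max_name_len. */
static uint64_t name_bytes_on_disk(uint64_t numfiles, uint64_t max_name_len)
{
    return numfiles * max_name_len;
}

/* Average bytes per entry, given a total file size and an item count. */
static uint64_t avg_record_bytes(uint64_t file_bytes, uint64_t num_items)
{
    assert(num_items > 0);
    return file_bytes / num_items;
}
```

With the figures above, `avg_record_bytes(1400000000000ULL, 502000000ULL)` gives 2788, so each entry occupies roughly 2.8 KB on disk. That average also covers the non-name fields of each record, so the padded name field itself is somewhat smaller than that.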

When reading the file entries back from the .mfu file, we copy each file name using strdup:

mpifileutils/src/common/mfu_flist_io.c

Lines 284 to 285 in fac50e0

    
    /* copy path */
    elem->file = MFU_STRDUP(file);

The strdup will drop all of that extra padding, so the same list when read back into memory will take less space than when stored on disk.
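The effect can be seen in isolation with plain C. This is only a sketch of the pad-on-write, trim-on-read behavior described above: `pack_name` and `unpack_name` are hypothetical helpers, and `unpack_name` uses malloc plus memcpy as a stand-in for MFU_STRDUP.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Write `name` into a fixed-width field of `chars` bytes.
 * strncpy fills the rest of the field with NUL bytes, which is the
 * padding that inflates the file on disk. */
static void pack_name(char* field, const char* name, size_t chars)
{
    assert(strlen(name) < chars); /* name plus terminator must fit */
    strncpy(field, name, chars);
}

/* Copy the name back out of the field. The copy stops at the first
 * NUL, so it occupies strlen(name) + 1 bytes rather than `chars`.
 * Returns NULL on allocation failure; the caller frees the result. */
static char* unpack_name(const char* field)
{
    size_t len = strlen(field) + 1;
    char* copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, field, len);
    }
    return copy;
}
```

For example, packing "/tmp/a" into a 16-byte field stores 16 bytes, while the copy made on read takes only 7 bytes (6 characters plus the terminator).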

If you generated this list from a single node, you might be able to read it back on that single node.

With the eventual v5 .mfu file format, whenever that comes, I hope we can store file names using variable length structures to avoid this problem. That will likely require the addition of an index to support efficient seeks.

adammoody · 2023-10-17T19:34:03Z

If you find that you can't use dwalk, let me know, and I can hack up a branch with a tool to get you started.
