Memory backed files copy issue #1915

Open
eduardo-camargo opened this issue Jul 20, 2022 · 11 comments
Labels: Component - C Library, Confirmed, Priority - 1. High 🔼, Type - Bug / Bugfix

Comments

@eduardo-camargo

I am having trouble loading a file image into a memory-backed file. I’d like to create a memory-backed file, copy some data into it, extract its file image, and then set that image on a second memory-backed file (this last step seems to be the issue). The only way I have found to make this work is to relax the lower version bound to the earliest (H5F_LIBVER_EARLIEST).

Here is the code that reproduces the issue. No buffer is copied over the network; everything runs on a single machine (Red Hat Enterprise Linux Server 7.9). The file used to replicate the issue (file_image_core_test.h5) can be found in the test folder of the HDF5-1.12.1 repo. I would like to stress that the code below works as is with HDF5-1.8.17 but breaks on the second H5Fopen call with HDF5-1.12.1. To make the code work again I need to replace the first H5F_LIBVER_LATEST with H5F_LIBVER_EARLIEST, i.e. H5Pset_libver_bounds(..., H5F_LIBVER_EARLIEST, H5F_LIBVER_LATEST), which introduces a huge slowdown. The exact same issue happens with HDF5-1.12.2.

This bug report originated from a discussion on the HDF Forum: https://forum.hdfgroup.org/t/memory-backed-files/9878
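
Since the only change that makes the second H5Fopen succeed is the lower version bound, it may help to check which superblock version each extracted image carries. The sketch below is a hypothetical helper (image_superblock_version is not part of the reproducer or of the HDF5 API). It relies on the HDF5 file format placing an 8-byte signature at the start of the file, followed by a one-byte superblock version, and it assumes the image has no user block, so the signature sits at offset 0.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical diagnostic helper, not part of the original reproducer.
// Returns the superblock version stored in an HDF5 file image, or -1 if the
// buffer does not start with the HDF5 signature.
// Assumption: no user block, so the signature is expected at offset 0.
int image_superblock_version(const unsigned char *image, std::size_t size)
{
    // HDF5 format signature: \211 H D F \r \n \032 \n
    static const unsigned char signature[8] = {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};

    if (image == nullptr || size < sizeof(signature) + 1)
        return -1;
    if (std::memcmp(image, signature, sizeof(signature)) != 0)
        return -1;

    // The byte right after the signature holds the superblock version.
    return static_cast<int>(image[sizeof(signature)]);
}
```

Calling it on file_image_memory and file_image_disk after extraction would show whether the two images were written with different superblock versions. The reproducer itself follows.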

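#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>
#include <utility>

#include "hdf5.h"
#include "H5LTpublic.h"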

#define USE_HDF5_1_12_1 // keep defined for HDF5-1.12.1; comment out for HDF5-1.8.17


herr_t copy_group( hid_t groupId, const char* name, H5L_info_t const* info, void* operatorData )
{
    if (!operatorData) {
        return -1;
    }
    hid_t targetFileId = *((hid_t*)operatorData);
    if (targetFileId <= 0) {
        return -1;
    }

    H5O_info_t objInfo;
    #ifdef  USE_HDF5_1_12_1
        H5Oget_info(groupId, &objInfo, H5O_INFO_ALL);
    #else
        H5Oget_info(groupId, &objInfo);
    #endif
    if (objInfo.type == H5O_TYPE_GROUP) {
        return H5Ocopy(groupId,name,targetFileId,name,H5P_DEFAULT,H5P_DEFAULT);
    }
    return 0; // do nothing if not a group
}



int main()
{
    std::string                 fName = "file_image_core_test.h5";
    size_t                      memory_increments = 10485760; //10 MB
    std::pair<hid_t,hid_t>      fapl_id_disk (-1,-1);
    std::pair<hid_t,hid_t>      fapl_id_memory1 (-1,-1);
    std::pair<hid_t,hid_t>      fapl_id_memory2 (-1,-1);
    unsigned char *             file_image_memory;
    size_t                      file_image_memory_size;
    unsigned char *             file_image_disk;
    size_t                      file_image_disk_size;
    herr_t                      ret;


    // Extracts the file image of an open file into a malloc'ed buffer owned by the caller.
    auto convert_h5_array = [&](hid_t file_id, unsigned char **buffer, size_t &buffer_size)
    {
        *buffer = nullptr;
        buffer_size = 0;

        if(file_id < 0)
            return herr_t(-1);

        H5Fflush(file_id, H5F_SCOPE_GLOBAL);

        // first call with a NULL buffer returns the required size
        ssize_t sbyteCount = H5Fget_file_image(file_id, NULL, 0);
        if(sbyteCount < 0)
            return herr_t(-1);

        size_t size = (size_t)sbyteCount;
        unsigned char *file_image = (unsigned char *)malloc(size);
        if(!file_image)
            return herr_t(-1);

        // second call fills the buffer; keep the result in ssize_t to avoid truncation
        ssize_t bytesRead = H5Fget_file_image(file_id, file_image, size);
        if(bytesRead < 0 || (size_t)bytesRead != size){
            free(file_image);
            return herr_t(-1);
        }

        // hand the buffer to the caller instead of copying it (the extra copy leaked file_image)
        *buffer = file_image;
        buffer_size = size;
        return herr_t(0);
    };


    auto close_h5 =[&](std::pair<hid_t,hid_t> &file_apl_id)
    {
        herr_t err = H5Pclose(file_apl_id.second);
        assert(err >= 0);
        file_apl_id.second = -1;

        err = H5Fclose(file_apl_id.first);
        assert(err >= 0);
        file_apl_id.first = -1;
    };


    // open a good hdf5 file from the disk
    fapl_id_disk.second = H5Pcreate(H5P_FILE_ACCESS);
    assert(fapl_id_disk.second >= 0);

    ret = H5Pset_fclose_degree(fapl_id_disk.second, H5F_CLOSE_STRONG); //needed for close file and all open objects
    assert(ret >= 0);

    fapl_id_disk.first = H5Fopen(fName.c_str(), H5F_ACC_RDONLY, fapl_id_disk.second);
    assert(fapl_id_disk.first >= 0);





    // create the first memory-backed file
    fapl_id_memory1.second = H5Pcreate(H5P_FILE_ACCESS);
    assert(fapl_id_memory1.second >= 0);

    ret = H5Pset_fclose_degree(fapl_id_memory1.second, H5F_CLOSE_STRONG); //needed for close file and all open objects
    assert(ret >= 0);

    ret = H5Pset_libver_bounds(fapl_id_memory1.second, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    assert(ret >= 0);

    /* Set up the core VFD */
    ret = H5Pset_fapl_core(fapl_id_memory1.second, memory_increments, false);
    assert(ret >= 0);

    fapl_id_memory1.first = H5Fcreate("dne.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl_id_memory1.second);
    assert(fapl_id_memory1.first >= 0);






    // copy from disk file to memory file
    hid_t rootGroupId, memRootGroupId;
    rootGroupId = H5Gopen(fapl_id_disk.first,"/",H5P_DEFAULT);
    assert(rootGroupId > 0);
    memRootGroupId = H5Gopen(fapl_id_memory1.first,"/",H5P_DEFAULT);
    assert(memRootGroupId > 0);
    H5Literate(rootGroupId,H5_INDEX_NAME,H5_ITER_NATIVE,NULL,copy_group,(void*)&memRootGroupId);


    // ============= ASSUMPTION: both file images (disk and memory) should be identical
    // In practice the images are not identical regardless of HDF5 version (tested 1.8.17 and 1.12.1)
    // With HDF5 1.12.1 the second memory file (fapl_id_memory2) is created successfully ONLY from the disk file image
    convert_h5_array(fapl_id_memory1.first, &file_image_memory, file_image_memory_size);
    convert_h5_array(fapl_id_disk.first, &file_image_disk, file_image_disk_size);
    /* assert(file_image_disk_size == file_image_memory_size); */
    /* assert(0 == memcmp(file_image_disk, file_image_memory, file_image_disk_size)); */

    close_h5(fapl_id_disk);
    close_h5(fapl_id_memory1);



    /* Create the second memory backed file */
    fapl_id_memory2.second = H5Pcreate(H5P_FILE_ACCESS);
    assert(fapl_id_memory2.second >= 0);

    ret = H5Pset_fclose_degree(fapl_id_memory2.second, H5F_CLOSE_STRONG); //needed for close file and all open objects
    assert(ret >= 0);

    ret = H5Pset_libver_bounds(fapl_id_memory2.second, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    assert(ret >= 0);

    /* Set up the core VFD */
    ret = H5Pset_fapl_core(fapl_id_memory2.second, memory_increments, false);
    assert(ret >= 0);

    /* Set file image in plist */
    ret = H5Pset_file_image(fapl_id_memory2.second, file_image_memory, file_image_memory_size); // this doesn't work on HDF5-1.12.1 but works on 1.8.17, why??!!
    assert(ret >= 0);

    /* Test open with file image */
    fapl_id_memory2.first = H5Fopen("dne2.h5", H5F_ACC_RDONLY, fapl_id_memory2.second);
    assert(fapl_id_memory2.first >= 0);

    close_h5(fapl_id_memory2);


    /* Release resources */
    free(file_image_memory);
    free(file_image_disk);

    return 0;
}
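
The commented-out asserts in the reproducer compare the disk image and the memory image byte for byte. Since the images are reported as not identical, knowing where they first diverge may narrow things down. The following is a minimal sketch of such a check; first_mismatch is a hypothetical helper introduced here for illustration and is not part of the original code or of HDF5.

```cpp
#include <cstddef>

// Hypothetical diagnostic helper, not part of the original reproducer.
// Returns the offset of the first byte at which the two images differ.
// If one image is a prefix of the other, returns the shorter length.
// The images are identical exactly when size_a == size_b and the
// returned value equals size_a.
std::size_t first_mismatch(const unsigned char *a, std::size_t size_a,
                           const unsigned char *b, std::size_t size_b)
{
    const std::size_t common = size_a < size_b ? size_a : size_b;
    for (std::size_t i = 0; i < common; ++i) {
        if (a[i] != b[i])
            return i;
    }
    return common;
}
```

In the reproducer it could be called as first_mismatch(file_image_disk, file_image_disk_size, file_image_memory, file_image_memory_size) right after both images are extracted.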
@gheber
Member

gheber commented Jul 22, 2022

Reference: Forum thread memory backed files

@eduardo-camargo
Author

Hi @gheber. Hope everything is good on your side.
I was hoping to get some info on this issue. Do you have any timeline to fix this? I'm just trying to get an idea when this might be resolved so I can plan accordingly on my side.

@gheber gheber added the bug label Aug 25, 2022
@gheber
Member

gheber commented Aug 25, 2022

I'm sorry @eduardo-camargo, no timeline yet. (Our engineers were busy getting 1.13.2 out the door.) I'll keep you posted.

@gheber gheber self-assigned this Aug 25, 2022
@derobins derobins removed the bug label Mar 3, 2023
@eduardo-camargo
Author

@derobins and @gheber do you have any updates on this? Maybe the fix has been incorporated into one of the new versions? If so, could you point to a specific commit?

@gheber gheber added the Component - C Library, Type - Bug / Bugfix, and Confirmed labels Apr 26, 2023
@gheber gheber assigned derobins and unassigned gheber Apr 26, 2023
@eduardo-camargo
Author

Hello @gheber and @derobins. I saw there was some movement here and would like to ask if there is any fix/workaround for this issue. Thanks.

@derobins
Member

> Hello @gheber and @derobins. I saw there was some movement here and would like to ask if there is any fix/workaround for this issue. Thanks.

We're trying to work through our backlog and tagging / assigning issues is a part of that, but we haven't made any specific movement on this bug yet. Good to know you are still interested, though! I'll see what I can do to get this into the fall releases.

@eduardo-camargo
Author

Hi @derobins and @gheber, just wanted to follow up on this issue to see if there are any updates.

@derobins derobins added this to the 1.14.4 milestone Jan 19, 2024
@SoShiny

SoShiny commented Feb 6, 2024

Hello, we encountered issues using in-memory HDF5 as well, in the context of h5py, which might be caused by the same underlying issue. See the post about it on the HDF Group forum. Looking forward to trying it out in 1.14.4.

@derobins derobins added the Priority - 1. High 🔼 label Mar 15, 2024
@SoShiny

SoShiny commented Apr 16, 2024

With the first releases for 1.14.4 trickling in, what are the chances that this issue will be resolved in the near future?

@derobins derobins modified the milestones: 1.14.4, 1.14.5 Apr 18, 2024
@mpijensen

This issue is also holding us back from upgrading to 1.14 so I am glad to see that the priority is set to High. Hope the fix will make the next release (1.14.5).

@eduardo-camargo
Author

The same here. We've been trying to upgrade to 1.14 but this issue is holding us back.

@derobins derobins modified the milestones: 1.14.5, 2.0.0 Oct 15, 2024