h5read

Designed to be a common C utility library for reading Nexus HDF5 files from miniapp implementations. It is intended to remove differences in how image files are parsed between implementations.

There are both C and C++ API interfaces to the library.

Usage

Add this directory as a subdirectory in your CMakeLists.txt:

add_subdirectory(../h5read h5read)

And then add as a dependency to your built targets:

target_link_libraries(${target_name} PUBLIC h5read)

All manipulation and data access happen via an opaque h5read_handle object. The easiest way to create an h5read_handle is to use the provided argument parser:

h5read_handle *obj = h5read_parse_standard_args(argc, argv);

This parses the arguments passed to your program according to:

Usage: your_program [-h|--help] [-v] [FILE.nxs | --sample]


Options:
  FILE.nxs      Path to the Nexus file to parse
  -h, --help    Show this message
  -v            Verbose HDF5 message output
  --sample      Don't load a data file, instead use generated test data

Generated Sample Data

Currently, the sample data generated by passing --sample or calling h5read_generate_samples() are Eiger 2XE 16Mp data, with 1028x512px modules, 12x38px gaps and a total image size of 4362x4148px. A mask is present that masks off the module gaps but is otherwise empty.

The intention is to provide a baseline of simple, known images to do validation from.

Index  Description
0      Completely empty image. This means 16842752 valid, empty pixels.
1      I=1 for every unmasked pixel.
2      Single pixels of I=100, every 42 pixels in a grid, for 10296 total. Of these pixels, 9604 are not masked.
3      "Random" background of intensity between 0 and 3, and zero under the masks. This is not truly random.
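The geometry described above can be checked with a little arithmetic. The following is a minimal sketch, not part of the library: the helper names are hypothetical, the 4 x 8 module grid is an assumption inferred from the stated pixel totals, and it assumes that the module and gap sizes are given as fast x slow. Under those assumptions the extents work out to 4148 pixels (fast) and 4362 pixels (slow), and the number of pixels lying on a module matches the 16842752 valid pixels quoted for image 0.

```c
#include <assert.h>
#include <stddef.h>

/* Module and gap sizes as stated above, read as (fast x slow). */
enum {
    MODULE_FAST = 1028,
    MODULE_SLOW = 512,
    GAP_FAST = 12,
    GAP_SLOW = 38,
    /* Assumption: a 4 x 8 module grid, inferred from the pixel totals. */
    MODULES_FAST = 4,
    MODULES_SLOW = 8
};

/* Full extent along one axis: n modules separated by n - 1 gaps. */
size_t detector_extent(size_t n_modules, size_t module_px, size_t gap_px) {
    assert(n_modules > 0);
    return n_modules * module_px + (n_modules - 1) * gap_px;
}

/* Number of pixels that lie on a module rather than in a gap. */
size_t detector_valid_pixels(void) {
    return (size_t)MODULES_FAST * MODULES_SLOW * MODULE_FAST * MODULE_SLOW;
}
```

A validation test could compare these values against h5read_get_image_fast() and h5read_get_image_slow() on a handle created with --sample.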

Reference - C API

Handle Creation

h5read_handle *h5read_open(const char *master_filename)

Open a Nexus file, and return an opaque h5read_handle pointer. This must be released by calling h5read_free when it is no longer required. If the function cannot open a root nexus file, it will return NULL.

If the function can open the base file but encounters an error reading the child files or datasets (including unexpected data shapes), then it will print a message to stderr and exit(1). These error cases may be changed to a return of NULL in the future.

This function is somewhat limited in the Nexus files that it will accept: it will try to accept Eiger 2XE 4M and 16M data, but cannot currently accept detectors of other shapes.


h5read_handle *h5read_generate_samples();

Doesn't open a Nexus file, but instead returns an h5read_handle that accesses a set of generated sample data, as described in Generated Sample Data. This handle must also be released by calling h5read_free when it is no longer required.


h5read_handle *h5read_parse_standard_args(int argc, char **argv)

Parse an argc/argv pair of command-line arguments. This will accept a filename, or a request to use sample data with --sample. If there is an error reading the Nexus file, then this will call exit(1), so the handle returned from this function will always be valid.

If the environment variable H5READ_IMPLICIT_SAMPLE is set and you do not pass any arguments, then --sample will be assumed.


h5read_free(h5read_handle *)

Frees a previously constructed handle object. It is an error to release these resources without first releasing all image data - the image objects may hold references to data held in the master object.


Image Information

size_t h5read_get_number_of_images(h5read_handle *obj);

Get the number of images in a particular dataset


size_t h5read_get_image_slow(h5read_handle *obj);

Get the number of image pixels in the slow dimension


size_t h5read_get_image_fast(h5read_handle *obj);

Get the number of image pixels in the fast dimension


Image Data

Image Data is represented in the form of a struct:

typedef struct image_t {
    uint16_t *data;
    uint8_t *mask;
    size_t slow;
    size_t fast;
} image_t;

Here slow and fast are the image dimensions, in pixels, and data and mask are pointers to 2D arrays of image and mask data. For convenience, image_t_type is defined in h5read.h as an alias for the data type used for image data.
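As an illustration of how such a struct might be traversed, here is a minimal sketch. It redeclares image_t locally so that it compiles without h5read.h, and it rests on two assumptions that are not confirmed above: that the 2D arrays are stored row-major with the fast index varying quickest, and that a nonzero mask entry marks a valid pixel. The function names image_pixel and image_sum_unmasked are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the struct shown above, so the sketch compiles without h5read.h. */
typedef struct image_t {
    uint16_t *data;
    uint8_t *mask;
    size_t slow;
    size_t fast;
} image_t;

/* Assumption: row-major layout, with the fast index varying quickest. */
uint16_t image_pixel(const image_t *image, size_t y, size_t x) {
    assert(y < image->slow && x < image->fast);
    return image->data[y * image->fast + x];
}

/* Assumption: a nonzero mask entry marks a valid pixel. */
uint64_t image_sum_unmasked(const image_t *image) {
    const size_t count = image->slow * image->fast;
    uint64_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        if (image->mask[i]) {
            total += image->data[i];
        }
    }
    return total;
}
```

The 64-bit accumulator is deliberate: a 16-megapixel frame of 16-bit values can overflow a 32-bit sum.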

You can retrieve an image struct for a particular image with:

image_t *h5read_get_image(h5read_handle *obj, size_t frame_number);

If the library cannot read the image, it will print an error message and call exit(1).

When you are finished with the image, you can release it by calling:

void h5read_free_image(image_t *image);

The above h5read_get_image allocates a buffer for you. If you then need to copy the image data somewhere else, then this is inefficient. For this reason, there is an additional API method to get image data:

void h5read_get_image_into(h5read_handle *obj, size_t index, image_t_type *data);

Read an image from a dataset into a preallocated buffer. The caller is responsible for both allocating and releasing the image data buffer. This buffer must be large enough to hold slow*fast pixels of type image_t_type, otherwise memory beyond the buffer could be overwritten. To get the mask data, you can call:

uint8_t *h5read_get_mask(h5read_handle *obj);

Borrows a pointer to the internal (shared) mask data. This is a common mask defined at the file level and shared between all images. You must not release this memory, and must not use it after calling h5read_free on the h5read_handle object.
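A caller-side buffer for h5read_get_image_into could be sized along the following lines. This is a sketch only: image_pixel_count and alloc_image_buffer are hypothetical helpers, and the typedef assumes that image_t_type is uint16_t, matching the data member of image_t. In real code the typedef would come from h5read.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: image_t_type is uint16_t, matching image_t.data above. */
typedef uint16_t image_t_type;

/* Number of pixels in one frame; asserts that slow * fast fits in size_t. */
size_t image_pixel_count(size_t slow, size_t fast) {
    assert(fast == 0 || slow <= SIZE_MAX / fast);
    return slow * fast;
}

/* Allocate a zeroed buffer for one frame. Returns NULL for an empty shape
   or if allocation fails. The caller releases it with free(). */
image_t_type *alloc_image_buffer(size_t slow, size_t fast) {
    const size_t count = image_pixel_count(slow, fast);
    if (count == 0) {
        return NULL;
    }
    return calloc(count, sizeof(image_t_type));
}

/* Typical use with the functions documented above (not compiled here):
 *
 *   image_t_type *buf = alloc_image_buffer(h5read_get_image_slow(obj),
 *                                          h5read_get_image_fast(obj));
 *   if (buf != NULL) {
 *       h5read_get_image_into(obj, 0, buf);
 *       ...
 *       free(buf);
 *   }
 */
```

Reusing one such buffer across frames avoids the per-frame allocation that h5read_get_image performs.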

Image Modules Data

For convenience, you can also access image data in the form of single modules. Modules are represented by the image_modules_t struct:

typedef struct image_modules_t {
    uint16_t *data;  ///< Module image data; 3D array of [module][slow][fast]
    uint8_t *mask;   ///< Image mask, in the same shape as the module data
    size_t modules;  ///< Total number of modules
    size_t slow;     ///< Number of pixels in slow direction per module
    size_t fast;     ///< Number of pixels in fast direction per module
} image_modules_t;

This can be retrieved with:

image_modules_t *h5read_get_image_modules(h5read_handle *obj, size_t frame_number);

Like h5read_get_image, this function will call exit(1) with an error message if it fails to load the image data.

This image_modules_t object should be released after usage by calling:

void h5read_free_image_modules(image_modules_t *modules);
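The [module][slow][fast] layout noted in the struct comments implies a simple flat-offset calculation. The sketch below redeclares image_modules_t locally so that it compiles without h5read.h, and assumes the 3D array is contiguous in C order. The helper names module_offset and module_sum are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the struct shown above, so the sketch compiles without h5read.h. */
typedef struct image_modules_t {
    uint16_t *data;
    uint8_t *mask;
    size_t modules;
    size_t slow;
    size_t fast;
} image_modules_t;

/* Flat offset of pixel (y, x) in module m, assuming a contiguous
   [module][slow][fast] array in C order. */
size_t module_offset(const image_modules_t *im, size_t m, size_t y, size_t x) {
    assert(m < im->modules && y < im->slow && x < im->fast);
    return (m * im->slow + y) * im->fast + x;
}

/* Sum every pixel of one module, ignoring the mask. */
uint64_t module_sum(const image_modules_t *im, size_t m) {
    const size_t start = module_offset(im, m, 0, 0);
    const size_t count = im->slow * im->fast;
    uint64_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += im->data[start + i];
    }
    return total;
}
```

The same offset applies to the mask member, since it is documented as having the same shape as the module data.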

Reference - C++ API

Alongside the C API, there is also a C++ API, available via #include "h5read.h". This mostly takes the same form, but takes care of memory management for you.

This API makes use of the C++20 std::span type. If you are compiling against an earlier standard, or an implementation without the <span> header, then a backport implementation, tcb::span, is used instead. When you include the h5read.h header, the macro SPAN is bound to either std::span or tcb::span, whichever is available. (This is controlled by the USE_SPAN_BACKPORT compiler definition, which is set automatically when using the CMake submodule.)

Creating Reader Objects

Instead of creating a handle pointer, you create an H5Read object. This class has three constructor forms:

H5Read()

Constructs the reader with sample data, as via h5read_generate_samples.

H5Read(const std::string &filename)

Constructs a reader from a physical Nexus file. In any case where h5read_open would fail by returning a null pointer, this constructor instead throws a std::runtime_error.

H5Read(int argc, char **argv);

Construct a reader by interpreting command-line arguments, the same as h5read_parse_standard_args.

Once you have an H5Read object, you can retrieve information via:

span<uint8_t> get_mask();             // Get the central mask data
size_t get_number_of_images();        // Get the number of frames in the reader
size_t get_image_slow();              // Get the number of pixels in the slow dimension
size_t get_image_fast();              // Get the number of pixels in the fast dimension
std::array<size_t, 2> image_shape();  // Get the image shape, in (slow, fast)

Image Data

To access an image, you can use:

Image H5Read::get_image(size_t index)

This will return an Image object. Much like the image_t struct, this contains members pointing to the various data:

struct Image {
    const span<image_t_type> data;  // Pointer to image data
    const span<uint8_t> mask;       // Pointer to mask data
    const size_t slow;              // Number of y (slow) pixels
    const size_t fast;              // Number of x (fast) pixels
};

Image Modules Data

To access an image in the form of separate modules, you can use:

ImageModules H5Read::get_image_modules(size_t index)

This returns an object with image data in the form of a modules array:

struct ImageModules {
    const span<image_t_type> data;
    const span<uint8_t> mask;

    const size_t n_modules;  // Number of modules
    const size_t slow;       // Height of a module, in pixels
    const size_t fast;       // Width of a module, in pixels

    const span<span<image_t_type>> modules;
    const span<span<uint8_t>> masks;
};

.data and .mask are the same as in image_modules_t: each spans the entire array of data for all modules.

In addition, for convenience, there are the .modules and .masks lookup arrays. These are arrays that point to each module separately, so to e.g. sum up all the pixels (without checking the mask) in the module at index 4 (the fifth module, since indices start at zero):

size_t sum = 0;
// modules is an ImageModules object returned by get_image_modules
for (auto i : modules.modules[4]) {
    sum += i;
}

Unlike with the C API, it is safe to keep these objects around longer than the main H5Read object.