A common C utility library for reading Nexus HDF5 files from miniapp implementations. It is intended to remove differences in how image files are parsed between implementations.
The library provides both C and C++ APIs.
Add this directory as a subdirectory in your CMakeLists.txt:
add_subdirectory(../h5read h5read)
Then add it as a dependency of your build targets:
target_link_libraries(${target_name} PUBLIC h5read)
All manipulation and data access happens via an opaque h5read_handle object.
The easiest way to create an h5read_handle is to use the provided argument parser:
h5read_handle *obj = h5read_parse_standard_args(argc, argv);
This parses the command-line arguments of your program as:
Usage: your_program [-h|--help] [-v] [FILE.nxs | --sample]
Options:
FILE.nxs Path to the Nexus file to parse
-h, --help Show this message
-v Verbose HDF5 message output
--sample Don't load a data file, instead use generated test data
Currently, the sample data generated by passing --sample or calling h5read_generate_samples() are Eiger 2XE 16Mp data, with 1028x512px modules, 12x38px gaps and a total image size of 4148x4362px (all given as fast x slow). There is a mask present which masks off the module gaps but is otherwise empty.
The intention is to provide a baseline of simple, known images to validate against.
Index | Description |
---|---|
0 | Completely empty image. This means 16842752 valid, empty pixels. |
1 | I=1 for every unmasked pixel |
2 | Single pixels of I=100, every 42 pixels in a grid, for 10296 total. Of these pixels, 9604 are not masked. |
3 | "Random" background between 0 and 3 intensity, and zero under the masks. This is not truly random. |
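
The figures quoted for the sample data can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not part of the library. It assumes a 4 x 8 grid of modules (fast x slow), which the text does not state, and it assumes the grid of image 2 starts at pixel 0. The helper names `axis_extent`, `valid_pixel_count` and `grid_points` are hypothetical.

```c
#include <stddef.h>

/* Sample-data geometry as quoted in the text. */
enum {
    MODULE_FAST = 1028, /* module width in pixels */
    MODULE_SLOW = 512,  /* module height in pixels */
    GAP_FAST = 12,      /* gap between modules along the fast axis */
    GAP_SLOW = 38,      /* gap between modules along the slow axis */
    /* ASSUMPTION: a 4 x 8 module grid; this is not stated in the text. */
    MODULES_FAST = 4,
    MODULES_SLOW = 8
};

/* Full image extent along one axis: n modules with n-1 gaps between them. */
static size_t axis_extent(size_t n_modules, size_t module_px, size_t gap_px) {
    return n_modules * module_px + (n_modules - 1) * gap_px;
}

/* Number of pixels that lie on a module, i.e. not in a gap. */
static size_t valid_pixel_count(void) {
    return (size_t)MODULES_FAST * MODULES_SLOW * MODULE_FAST * MODULE_SLOW;
}

/* Number of grid points along one axis when every `step`-th pixel is lit,
   ASSUMING the grid starts at pixel 0. */
static size_t grid_points(size_t extent, size_t step) {
    return (extent + step - 1) / step;
}
```

Under these assumptions the extents come out as 4148 pixels along the fast axis and 4362 along the slow axis, the valid pixel count matches the 16842752 quoted for image 0, and the grid of image 2 has 99 x 104 = 10296 points.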
h5read_handle *h5read_open(const char *master_filename)
Open a Nexus file, and return an opaque h5read_handle pointer. This must be released by calling h5read_free when it is no longer required. If the function cannot open a root Nexus file, it will return NULL.
If the function can open the base file but encounters an error reading the child files or datasets (including unexpected data shapes), then it will print a message to stderr and exit(1). These error cases may be changed to a return of NULL in the future.
This function is somewhat limited in the Nexus files that it will accept: it will try to accept Eiger 2XE 4M and 16M data, but cannot currently accept detectors of other shapes.
h5read_handle *h5read_generate_samples();
Doesn't open a Nexus file, but instead returns an h5read_handle that accesses a set of generated sample data, as described in Generated Sample Data. This also needs to be released by calling h5read_free when it is no longer required.
h5read_handle *h5read_parse_standard_args(int argc, char **argv)
Parse an argc/argv pair of command-line arguments. This will accept a filename, or a request to use sample data with --sample. If there is an error reading the Nexus file, then this will call exit(1), so the handle returned from this function will always be valid.
If the environment variable H5READ_IMPLICIT_SAMPLE is set and you do not pass any arguments, --sample will be assumed.
h5read_free(h5read_handle *)
Frees a previously constructed handle object. It is an error to release these resources without first releasing all image data, because the image objects may hold references to data held in the master object.
size_t h5read_get_number_of_images(h5read_handle *obj);
Get the number of images in a particular dataset
size_t h5read_get_image_slow(h5read_handle *obj);
Get the number of image pixels in the slow dimension
size_t h5read_get_image_fast(h5read_handle *obj);
Get the number of image pixels in the fast dimension
Image data is represented in the form of a struct:
typedef struct image_t {
uint16_t *data;
uint8_t *mask;
size_t slow;
size_t fast;
} image_t;
Where slow and fast are the image dimensions, in pixels, and data and mask are pointers to 2D arrays of image data. For convenience, image_t_type is defined in h5read.h as an alias for the data type used for image data.
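
As an illustration of how such a struct might be consumed, the sketch below sums the intensity of every unmasked pixel. It is not part of the library. It assumes a row-major layout with the fast index contiguous, and that a non-zero mask entry marks a valid pixel; neither is confirmed by the text above. The struct is copied locally so the snippet compiles on its own, and `sum_unmasked` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of the struct shown above, so the sketch compiles on its own.
   In real code, include h5read.h instead of redefining it. */
typedef struct image_t {
    uint16_t *data;
    uint8_t *mask;
    size_t slow;
    size_t fast;
} image_t;

/* Sum the intensity of every pixel whose mask entry is non-zero.
   ASSUMPTIONS: row-major layout (fast index contiguous), and
   "non-zero mask means valid pixel". */
static uint64_t sum_unmasked(const image_t *image) {
    uint64_t total = 0;
    for (size_t s = 0; s < image->slow; ++s) {
        for (size_t f = 0; f < image->fast; ++f) {
            size_t i = s * image->fast + f; /* flat index of pixel (s, f) */
            if (image->mask[i] != 0) {
                total += image->data[i];
            }
        }
    }
    return total;
}
```

If the mask convention turns out to be the opposite (zero meaning valid), only the comparison inside the loop needs to change.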
You can retrieve an image struct for a particular image with:
image_t *h5read_get_image(h5read_handle *obj, size_t frame_number);
If the library cannot read the image, it will print an error message and call exit(1).
When you are finished with the image, you can release it by calling:
void h5read_free_image(image_t *image);
The above h5read_get_image allocates a buffer for you. If you then need to copy the image data somewhere else, then this is inefficient. For this reason, there is an additional API method to get image data:
void h5read_get_image_into(h5read_handle *obj, size_t index, image_t_type *data);
Read an image from a dataset into a preallocated buffer. The caller is responsible for both allocating and releasing the image data buffer. This buffer must be large enough to hold slow*fast pixels, or else memory outside the buffer could be overwritten.
To get the mask data, you can call:
uint8_t *h5read_get_mask(h5read_handle *obj);
Borrows a pointer to the internal (shared) mask data. This is a common mask defined at the file level and shared between all images. You must not release this memory, and must not use it after calling h5read_free on the h5read handle object.
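
Returning to h5read_get_image_into: since the caller owns the buffer, it helps to size it defensively. The sketch below is one possible way to allocate a buffer of slow*fast pixels with an overflow check. It assumes that image_t_type is uint16_t, matching the image_t struct, and `alloc_image_buffer` is a hypothetical helper name, not something the library provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* ASSUMPTION: image_t_type is uint16_t, matching the image_t struct.
   In real code, take the typedef from h5read.h instead. */
typedef uint16_t image_t_type;

/* Allocate a zeroed buffer big enough for one slow x fast image.
   Returns NULL if either dimension is zero, if slow * fast would
   overflow size_t, or if the allocation fails.
   The caller owns the result and releases it with free(). */
static image_t_type *alloc_image_buffer(size_t slow, size_t fast) {
    if (slow == 0 || fast == 0) {
        return NULL;
    }
    if (slow > SIZE_MAX / fast) {
        return NULL; /* slow * fast would overflow size_t */
    }
    /* calloc itself guards the count * element-size multiplication. */
    return calloc(slow * fast, sizeof(image_t_type));
}
```

With a real handle, the dimensions would come from h5read_get_image_slow and h5read_get_image_fast, and the returned buffer would be passed as the data argument of h5read_get_image_into.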
For convenience, you can also access image data in the form of single modules.
Modules are represented by the image_modules_t struct:
typedef struct image_modules_t {
uint16_t *data; ///< Module image data; 3D array of [module][slow][fast]
uint8_t *mask; ///< Image mask, in the same shape as the module data
size_t modules; ///< Total number of modules
size_t slow; ///< Number of pixels in slow direction per module
size_t fast; ///< Number of pixels in fast direction per module
} image_modules_t;
This can be retrieved with:
image_modules_t *h5read_get_image_modules(h5read_handle *obj, size_t frame_number);
Like h5read_get_image, this function will call exit(1) with an error message if it fails to load the image data.
This image_modules_t object should be released after usage by calling:
void h5read_free_image_modules(image_modules_t *modules);
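
To make the [module][slow][fast] layout concrete, the sketch below shows how a single pixel could be addressed in the flat data array. It is illustrative only: the struct is copied locally so the snippet compiles on its own, the array is assumed to be contiguous with no padding between modules, and `module_data` and `module_pixel` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of the struct shown above, so the sketch compiles on its own.
   In real code, include h5read.h instead of redefining it. */
typedef struct image_modules_t {
    uint16_t *data;
    uint8_t *mask;
    size_t modules;
    size_t slow;
    size_t fast;
} image_modules_t;

/* Pointer to the first pixel of module m.
   ASSUMPTION: modules are stored back to back with no padding. */
static const uint16_t *module_data(const image_modules_t *im, size_t m) {
    return im->data + m * im->slow * im->fast;
}

/* Value of pixel (s, f) within module m; no bounds checking is done. */
static uint16_t module_pixel(const image_modules_t *im, size_t m, size_t s,
                             size_t f) {
    return module_data(im, m)[s * im->fast + f];
}
```

The same offsets apply to the mask array, since it has the same shape as the module data.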
Alongside the C API, there is also a C++ API in h5read.h. This mostly takes the same form, but takes care of memory management for you.
This API makes use of the C++20 std::span object. If you are compiling with an earlier standard or an implementation without the <span> header, then a backport implementation, tcb::span, is used. If you include the h5read.h header, then the macro SPAN is bound to either std::span or tcb::span, whichever is available. (This is controlled by the USE_SPAN_BACKPORT compiler definition, which is set automatically when using the CMake build.)
Instead of creating a handle pointer, you create an H5Read object. This class has three constructor forms:
H5Read()
Constructs the reader with sample data, as via h5read_generate_samples.
H5Read(const std::string &filename)
Constructs a reader from a physical Nexus file. In any case where h5read_open would fail by returning a null pointer, this throws a std::runtime_error.
H5Read(int argc, char **argv);
Construct a reader by interpreting command-line arguments, the same as h5read_parse_standard_args.
Once you have an H5Read object, you can retrieve information via:
span<uint8_t> get_mask(); // Get the central mask data
size_t get_number_of_images(); // Get the number of frames in the reader
size_t get_image_slow(); // Get the number of pixels in the slow dimension
size_t get_image_fast(); // Get the number of pixels in the fast dimension
std::array<size_t, 2> image_shape(); // Get the image shape, in (slow, fast)
To access an image, you can use:
Image H5Read::get_image(size_t index)
This will return an Image object. Much like the image_t struct, this contains members pointing to the various data:
struct Image {
const span<image_t_type> data; // View of the image data
const span<uint8_t> mask; // View of the mask data
const size_t slow; // Number of y (slow) pixels
const size_t fast; // Number of x (fast) pixels
};
To access an image in the form of separate modules, you can use:
ImageModules H5Read::get_image_modules(size_t index)
This returns an object with image data in the form of a modules array:
struct ImageModules {
const span<image_t_type> data;
const span<uint8_t> mask;
const size_t n_modules; // Number of modules
const size_t slow; // Height of a module, in pixels
const size_t fast; // Width of a module, in pixels
const span<span<image_t_type>> modules;
const span<span<uint8_t>> masks;
};
.data and .mask are the same as in image_modules_t: a view of the entire array of data for all modules.
In addition, for convenience, there are the .modules and .masks lookup arrays. These are arrays that point to each module separately, so to e.g. sum up all the pixels (without checking the mask) in the module at index 4 (the fifth module):
size_t sum = 0;
for (auto i : modules.modules[4]) {
sum += i;
}
Unlike the C API, it is safe to keep these objects around longer than the main H5Read object.