See https://github.com/Apollo3zehn/HDF5.NET/issues/9 for not yet implemented features.
A pure C# library without native dependencies that makes reading of HDF5 files (groups, datasets, attributes, links, ...) very easy.
The minimum supported target framework is .NET Standard 2.0 which includes
- .NET Framework 4.6.1+
- .NET Core (all versions)
- .NET 5+
This library runs on all platforms (ARM, x86, x64) and operating systems (Linux, Windows, MacOS, Raspbian, etc) that are supported by the .NET ecosystem without special configuration.
The implemention follows the HDF5 File Format Specification.
// open HDF5 file, the returned H5File instance represents the root group ('/')
using var root = H5File.OpenRead(filePath);
// get nested group
var group = root.Group("/my/nested/group");
// get dataset in group
var dataset = group.Dataset("myDataset");
// alternatively, use the full path
var dataset = group.Dataset("/my/nested/group/myDataset");
// get commited data type in group
var commitedDatatype = group.CommitedDatatype("myCommitedDatatype");
When you do not know what kind of link to expect at a given path, use the following code:
// get H5Object (base class of all HDF5 object types)
var myH5Object = group.Get("/path/to/unknown/object");
With an external link pointing to a relative file path it might be necessary to provide a file prefix (see also this overview).
You can either set an environment variable:
Environment.SetEnvironmentVariable("HDF5_EXT_PREFIX", "/my/prefix/path");
Or you can pass the prefix as an overload parameter:
var linkAccess = new H5LinkAccess()
{
ExternalLinkPrefix = prefix
}
var dataset = group.Dataset(path, linkAccess);
Iterate through all links in a group:
foreach (var link in group.Children)
{
var message = link switch
{
H5Group group => $"I am a group and my name is '{group.Name}'.",
H5Dataset dataset => $"I am a dataset, call me '{dataset.Name}'.",
H5CommitedDatatype datatype => $"I am the data type '{datatype.Name}'.",
H5UnresolvedLink lostLink => $"I cannot find my link target =( shame on '{lostLink.Name}'."
_ => throw new Exception("Unknown link type");
}
Console.WriteLine(message)
}
An H5UnresolvedLink
becomes part of the Children
collection when a symbolic link is dangling, i.e. the link target does not exist or cannot be accessed.
// get attribute of group
var attribute = group.Attribute("myAttributeOnAGroup");
// get attribute of dataset
var attribute = dataset.Attribute("myAttributeOnADataset");
The following code samples work for datasets as well as attributes.
// class: fixed-point
var data = dataset.Read<int>();
// class: floating-point
var data = dataset.Read<double>();
// class: string
var data = dataset.ReadString();
// class: bitfield
[Flags]
enum SystemStatus : ushort /* make sure the enum in HDF file is based on the same type */
{
MainValve_Open = 0x0001
AuxValve_1_Open = 0x0002
AuxValve_2_Open = 0x0004
MainEngine_Ready = 0x0008
FallbackEngine_Ready = 0x0010
// ...
}
var data = dataset.Read<SystemStatus>();
var readyToLaunch = data[0].HasFlag(SystemStatus.MainValve_Open | SystemStatus.MainEngine_Ready);
// class: opaque
var data = dataset.Read<byte>();
var data = dataset.Read<MyOpaqueStruct>();
// class: compound
/* option 1 (faster) */
var data = dataset.Read<MyNonNullableStruct>();
/* option 2 (slower, for more info see the link below after this code block) */
var data = dataset.ReadCompound<MyNullableStruct>();
// class: reference
var data = dataset.Read<H5ObjectReference>();
var firstRef = data.First();
/* NOTE: Dereferencing would be quite fast if the object's name
* was known. Instead, the library searches recursively for the
* object. Do not dereference using a parent (group) that contains
* any circular soft links. Hard links are no problem.
*/
/* option 1 (faster) */
var firstObject = directParent.Get(firstRef);
/* option 1 (slower, use if you don't know the objects parent) */
var firstObject = root.Get(firstRef);
// class: enumerated
enum MyEnum : short /* make sure the enum in HDF file is based on the same type */
{
MyValue1 = 1,
MyValue2 = 2,
// ...
}
var data = dataset.Read<MyEnum>();
// class: variable length
var data = dataset.ReadString();
// class: array
var data = dataset
.Read<int>()
/* dataset dims = int[2, 3] */
/* array dims = int[4, 5] */
.ToArray4D(2, 3, 4, 5);
// class: time
// -> not supported (reason: the HDF5 C lib itself does not fully support H5T_TIME)
For more information on compound data, see section Reading compound data.
Partial I/O is one of the strengths of HDF5 and is applicable to all dataset types (contiguous, compact and chunked). With HDF5.NET, the full dataset can be read with a simple call to dataset.Read()
. However, if you want to read only parts of the dataset, hyperslab selections are your friend. The following code shows how to work with these selections using a three-dimensional dataset (source) and a two-dimensional memory buffer (target):
var dataset = root.Dataset("myDataset");
var memoryDims = new ulong[] { 75, 25 };
var datasetSelection = new HyperslabSelection(
rank: 3,
starts: new ulong[] { 2, 2, 0 },
strides: new ulong[] { 5, 8, 2 },
counts: new ulong[] { 5, 3, 2 },
blocks: new ulong[] { 3, 5, 2 }
);
var memorySelection = new HyperslabSelection(
rank: 2,
starts: new ulong[] { 2, 1 },
strides: new ulong[] { 35, 17 },
counts: new ulong[] { 2, 1 },
blocks: new ulong[] { 30, 15 }
);
var result = dataset
.Read<int>(
fileSelection: datasetSelection,
memorySelection: memorySelection,
memoryDims: memoryDims
)
.ToArray2D(75, 25);
All shown parameters are optional. For example, when the fileSelection
parameter is unspecified, the whole dataset will be read. Note that the number of data points in the file selection must always match that of the memory selection.
Additionally, there is an overload method that allows you to provide your own buffer.
- Shuffle (hardware accelerated1, SSE2/AVX2)
- Fletcher32
- Deflate (zlib)
1 NET Standard 2.1 and above
Before you can use external filters, you need to register them using H5Filter.Register(...)
. This method accepts a filter identifier, a filter name and the actual filter function.
This function could look like the following and should be adapted to your specific filter library:
public static Memory<byte> FilterFunc(
H5FilterFlags flags,
uint[] parameters,
Memory<byte> buffer)
{
// Decompressing
if (flags.HasFlag(H5FilterFlags.Decompress))
{
// pseudo code
byte[] decompressedData = MyFilter.Decompress(parameters, buffer.Span);
return decompressedData;
}
// Compressing
else
{
throw new Exception("Writing data chunks is not yet supported by HDF5.NET.");
}
}
- c-blosc2 (using Blosc2.PInvoke)
- bzip2 (using SharpZipLib)
(1) Install the P/Invoke package:
dotnet package add Blosc2.PInvoke
(2) Add the Blosc filter registration helper function to your code.
(3) Register Blosc:
H5Filter.Register(
identifier: (H5FilterID)32001,
name: "blosc2",
filterFunc: BloscHelper.FilterFunc);
(1) Install the SharpZipLib package:
dotnet package add SharpZipLib
(2) Add the BZip2 filter registration helper function and the MemorySpanStream implementation to your code.
(3) Register BZip2:
H5Filter.Register(
identifier: (H5FilterID)307,
name: "bzip2",
filterFunc: BZip2Helper.FilterFunc);
Sometimes you want to read the data as multidimensional arrays. In that case use one of the byte[]
overloads like ToArray3D
(there are overloads up to 6D). Here is an example:
var data3D = dataset
.Read<int>()
.ToArray3D(new long[] { -1, 7, 2 });
The methods accepts a long[]
with the new array dimensions. This feature works similar to Matlab's reshape function. A slightly adapted citation explains the behavior:
When you use
-1
to automatically calculate a dimension size, the dimensions that you do explicitly specify must divide evenly into the number of elements in the input array.
Structs without any nullable fields (i.e. no strings and other reference types) can be read like any other dataset using a high performance copy operation:
[StructLayout(LayoutKind.Explicit, Size = 5)]
internal struct SimpleStruct
{
[FieldOffset(0)]
public byte ByteValue;
[FieldOffset(1)]
public ushort UShortValue;
[FieldOffset(3)]
public TestEnum EnumValue;
}
var compoundData = dataset.Read<SimpleStruct>();
Just make sure the field offset attributes matches the field offsets defined in the HDF5 file when the dataset was created.
This method does not require that the structs field names match since they are simply mapped by their offset.
If you have a struct with string fields, you need to use the slower ReadCompound
method:
internal struct NullableStruct
{
public float FloatValue;
public string StringValue1;
public string StringValue2;
public byte ByteValue;
public short ShortValue;
}
var compoundData = dataset.ReadCompound<NullableStruct>();
var compoundData = attribute.ReadCompound<NullableStruct>();
This method requires no special attributes but it is mandatory that the field names match exactly to those in the HDF5 file. If you would like to use custom field names, consider the following solution:
// Apply the H5NameAttribute to the field with custom name.
internal struct NullableStructWithCustomFieldName
{
[H5Name("FloatValue")]
public float FloatValueWithCustomName;
// ... more fields
}
// Create a name translator.
Func<FieldInfo, string> converter = fieldInfo =>
{
var attribute = fieldInfo.GetCustomAttribute<H5NameAttribute>(true);
return attribute is not null ? attribute.Name : fieldInfo.Name;
};
// Use that name translator.
var compoundData = dataset.ReadCompound<NullableStructWithCustomFieldName>(converter);