
Allowing for subclassing of hdf5::node::Dataset and hdf5::node::Group

Jonas Nilsson edited this page Oct 12, 2017 · 1 revision

The h5cpp library implements a system that can be used to write converters for most data types that might be of relevance. However, supporting new data types does not cover all the ways in which a user might want to extend the functionality of the h5cpp library. Some examples of such additional functionality follow:

  • Modifying the behaviour/functionality of calls to dataset/group member functions.
  • Adding new functionality (member functions) to datasets/groups.
  • Mocking of datasets/groups for use in unit testing.

Datasets and groups are currently created via the create_group() and create_dataset() member functions of the hdf5::node::Group class. This method of object instantiation makes the library easy to use. However, because instantiation is only possible through these member functions, extending the functionality of the library is harder. This text shows a few different ways of extending the functionality of the h5cpp library and explains why I think allowing Dataset and Group instances to be created with constructors of those classes is the best way of providing this feature. The examples presented here use hdf5::node::Dataset, but the arguments hold for hdf5::node::Group as well.

Alternatives for extending the functionality of the h5cpp library

The following code is a baseline for writing some data to an HDF5 file.

std::vector<int> someData = getSomeData();
Dataset cDataSet = someGroup.create_dataset("some_name", ...);
cDataSet.write(someData);

Say that we want to perform an extra action whenever we write data to an HDF5 file. First we show how this can be done if Dataset instances can be created using a constructor. This is followed by alternatives based on the current factory-function approach.

Constructor available

Assuming that it is easy to define your own Dataset subclass, the extra action could be implemented in a member function that hides Dataset::write() (since write() is a member template, it cannot be virtual, so it is hidden rather than overridden). The baseline listing would then read as follows:

std::vector<int> someData = getSomeData();
MyDataset cDataSet = MyDataset(someGroup, "some_name", ...);
cDataSet.write(someData);

An example implementation of the MyDataset class is:

class MyDataset : public Dataset {
public:
  MyDataset(const Group &parent, const std::string &name, ...) :
      Dataset(parent, name, ...) {}
  template<typename T>
  void write(const T &data) const {
    Dataset::write(data);
    //Do extra stuff
    //Do extra stuff
  }
};

This solution makes it easy to extend the functionality of your application and keeps those changes self-contained, i.e., code that uses the MyDataset class requires little or no rewriting.
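To make the pattern concrete, here is a self-contained sketch that compiles without HDF5. Group and Dataset are hypothetical stand-ins, not the h5cpp classes: Dataset is given a public constructor taking the parent group (the change proposed here), and its write() only counts the values it receives. The helpers values_written() and extra_actions() are likewise invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for hdf5::node::Group and hdf5::node::Dataset.
class Group {};

class Dataset {
public:
  // Public constructor taking the parent group: the change proposed on this page.
  Dataset(const Group & /*parent*/, std::string name) : name_(std::move(name)) {}

  const std::string &name() const { return name_; }

  // Stand-in for writing to the file: only counts the values written.
  template <typename T>
  void write(const T &data) const { values_written_ += data.size(); }

  std::size_t values_written() const { return values_written_; }

private:
  std::string name_;
  mutable std::size_t values_written_ = 0;  // mutable because write() is const
};

class MyDataset : public Dataset {
public:
  MyDataset(const Group &parent, std::string name)
      : Dataset(parent, std::move(name)) {}

  // Hides Dataset::write(); it is not a virtual override.
  template <typename T>
  void write(const T &data) const {
    Dataset::write(data);
    ++extra_actions_;  // stands in for "Do extra stuff"
  }

  int extra_actions() const { return extra_actions_; }

private:
  mutable int extra_actions_ = 0;
};

// Usage, mirroring the listing above.
void example() {
  Group someGroup;
  std::vector<int> someData{1, 2, 3};
  MyDataset cDataSet = MyDataset(someGroup, "some_name");
  cDataSet.write(someData);
  assert(cDataSet.values_written() == 3);
  assert(cDataSet.extra_actions() == 1);
}
```

One consequence of hiding rather than overriding: the extra action only runs when the object is used through its static type MyDataset. A call made through a Dataset reference goes straight to Dataset::write().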

Constructor not available, alternative 1

The simplest solution is to add the relevant extra actions after every call to Dataset::write().

std::vector<int> someData = getSomeData();
Dataset cDataSet = someGroup.create_dataset("some_name", ...);
cDataSet.write(someData);
//Do extra stuff
//Do extra stuff

This method is error prone, since it is easy to forget the extra actions at one of the call sites. To make it less error prone, the extra functionality could be put in a free function:

template<typename T>
void write(const Dataset &dataset, const T &data) {
  dataset.write(data);
  //Do extra stuff
  //Do extra stuff
}

However, this method is not self-contained, and it requires the programmer to keep track of two different ways of writing to a dataset: the free function write(dataset, data) and the member function dataset.write(data).
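A compilable sketch of this alternative shows where the error-proneness comes from: both spellings compile, and only one of them performs the extra action. The Dataset class below is a hypothetical stand-in whose write() only counts values, and the global extra_actions counter is an illustrative assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for hdf5::node::Dataset; only write() is modelled.
class Dataset {
public:
  template <typename T>
  void write(const T &data) const { values_written_ += data.size(); }
  std::size_t values_written() const { return values_written_; }

private:
  mutable std::size_t values_written_ = 0;
};

int extra_actions = 0;  // illustrative record of the "extra stuff"

// Free function that adds the extra action around Dataset::write().
template <typename T>
void write(const Dataset &dataset, const T &data) {
  dataset.write(data);
  ++extra_actions;
}

void example() {
  Dataset cDataSet;
  std::vector<int> someData{1, 2, 3};
  write(cDataSet, someData);  // extra action performed
  cDataSet.write(someData);   // also compiles, but skips the extra action
  assert(cDataSet.values_written() == 6);
  assert(extra_actions == 1);
}
```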

Constructor not available, alternative 2

It is possible to inherit from Dataset without having access to the relevant constructor. This alternative transforms the baseline into the following code:

std::vector<int> someData = getSomeData();
MyDataset cDataSet = MyDataset(someGroup.create_dataset("some_name", ...));
cDataSet.write(someData);

With this method, however, the programmer is forced to re-implement (forward) every member function of Dataset that he or she wants to use.

class MyDataset : public Dataset {
public:
  explicit MyDataset(const Dataset &dataset) : Dataset(), m_dataset(dataset) {}

  template<typename T>
  void write(const T &data) {
    m_dataset.write(data);
    //Do extra stuff
    //Do extra stuff
  }

  template<typename T>
  void read(T &data) {
    m_dataset.read(data);
  }

  void extent(const Dimensions &dims) {
    m_dataset.extent(dims);
  }

  //etc.

private:
  Dataset m_dataset;  // the dataset that every call is forwarded to
};

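Although the code in this example is relatively self-contained, it does not follow the standard usage patterns set by the library. Moreover, any Dataset member function that is not forwarded operates on the default-constructed base object instead of on m_dataset, and the same happens when a MyDataset is passed to code that takes a Dataset reference.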
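The forwarding pitfall can be shown with a small compilable model. As an assumption for illustration, the hypothetical Dataset below has a public default constructor that yields an invalid object, while a valid one can only be obtained from Group::create_dataset(). The member is_valid() stands in for any member function the wrapper forgets to forward.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for hdf5::node::Dataset and hdf5::node::Group.
class Dataset {
public:
  Dataset() = default;  // public, but yields an invalid dataset
  bool is_valid() const { return valid_; }

  template <typename T>
  void write(const T &data) const {
    if (!valid_) throw std::runtime_error("write() on invalid dataset");
    values_written_ += data.size();
  }
  std::size_t values_written() const { return values_written_; }

private:
  friend class Group;  // only Group can create a valid dataset
  explicit Dataset(bool valid) : valid_(valid) {}
  bool valid_ = false;
  mutable std::size_t values_written_ = 0;
};

class Group {
public:
  Dataset create_dataset(const std::string & /*name*/) const { return Dataset(true); }
};

class MyDataset : public Dataset {
public:
  explicit MyDataset(Dataset dataset) : Dataset(), m_dataset(std::move(dataset)) {}

  template <typename T>
  void write(const T &data) const {
    m_dataset.write(data);
    ++extra_actions_;  // stands in for "Do extra stuff"
  }
  // Forwarded member function.
  std::size_t values_written() const { return m_dataset.values_written(); }
  int extra_actions() const { return extra_actions_; }
  // is_valid() is deliberately NOT forwarded.

private:
  Dataset m_dataset;
  mutable int extra_actions_ = 0;
};

void example() {
  Group someGroup;
  std::vector<int> someData{1, 2, 3};
  MyDataset cDataSet = MyDataset(someGroup.create_dataset("some_name"));
  cDataSet.write(someData);
  assert(cDataSet.values_written() == 3);
  assert(cDataSet.extra_actions() == 1);
  assert(!cDataSet.is_valid());  // answered by the default-constructed base
}
```

Passing cDataSet to a function that takes a const Dataset & and calls write() would, in this model, throw, because that function sees the invalid base object rather than m_dataset.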

Constructor not available, alternative 3

Yet another way of providing the extra functionality is to subclass Group in order to create a MyDataset instance using a factory member function. The baseline code can in that case be re-written as follows.

std::vector<int> someData = getSomeData();
MyDataset cDataSet = someGroup.create_mydataset("some_name", ...);
cDataSet.write(someData);

It should be noted that hdf5::node::Group does not have a constructor that can be used to instantiate it. Instead, hdf5::file::File would have to be subclassed so that it can create instances of MyGroup. Furthermore, hdf5::file::File objects are instantiated using a create() function, which would have to be replaced with a myCreate() function. The amount of code required for this method to work is not insignificant, and the number of classes that require subclassing ensures that this code is not self-contained.
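The chain of subclasses this requires can be sketched with hypothetical stand-ins for File, Group and Dataset whose constructors are not available to user code, so each object can only be obtained from a factory one level up. This is a model under those assumptions, not the h5cpp implementation; the path bookkeeping, MyFile, MyGroup and do_extra_stuff() are illustrative names.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for hdf5::node::Dataset, hdf5::node::Group and
// hdf5::file::File. Constructors are protected, so user code can only
// obtain each object from a factory one level up.
class Group;
class File;

class Dataset {
public:
  const std::string &path() const { return path_; }
protected:
  explicit Dataset(std::string path) : path_(std::move(path)) {}
private:
  friend class Group;
  std::string path_;
};

class Group {
public:
  Dataset create_dataset(const std::string &name) const { return Dataset(path_ + "/" + name); }
protected:
  explicit Group(std::string path) : path_(std::move(path)) {}
  std::string path_;
private:
  friend class File;
};

class File {
public:
  Group root() const { return Group(""); }
protected:
  File() = default;
  friend File create(const std::string &filename);
};

File create(const std::string & /*filename*/) { return File(); }

// ---- User code: three subclasses plus a replacement for create() ----
class MyDataset : public Dataset {
public:
  void do_extra_stuff() const { ++extra_actions_; }
  int extra_actions() const { return extra_actions_; }
private:
  friend class MyGroup;
  explicit MyDataset(std::string path) : Dataset(std::move(path)) {}
  mutable int extra_actions_ = 0;
};

class MyGroup : public Group {
public:
  MyDataset create_mydataset(const std::string &name) const { return MyDataset(path_ + "/" + name); }
private:
  friend class MyFile;
  explicit MyGroup(std::string path) : Group(std::move(path)) {}
};

class MyFile : public File {
public:
  MyGroup root() const { return MyGroup(""); }  // hides File::root()
private:
  MyFile() = default;
  friend MyFile myCreate(const std::string &filename);
};

MyFile myCreate(const std::string & /*filename*/) { return MyFile(); }

void example() {
  MyFile file = myCreate("data.h5");
  MyGroup someGroup = file.root();
  MyDataset cDataSet = someGroup.create_mydataset("some_name");
  cDataSet.do_extra_stuff();
  assert(cDataSet.path() == "/some_name");
  assert(cDataSet.extra_actions() == 1);
}
```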

Conclusion

The current method of instantiating objects using factory member functions simplifies usage of the library for novices. Subclassing makes it easy to extend the functionality of a library. However, the current approach of only using factory member functions for creating groups and datasets does not allow for easy subclassing. By modifying the library to allow the instantiation of datasets and groups using constructors, subclassing would be made easy.