Following is the Google Summer of Code 2021 Final Report on the project "Root Storage of Deep Learning Models in TMVA", conducted under CERN-HSF.
| Student's Name | Sanjiban Sengupta |
|---|---|
| Mentors | Lorenzo Moneta, Sitong An, Anirudh Dagar |
| Organization | ROOT Project (CERN-HSF) |
| Organization Code Repository | https://github.com/root-project/root |
| Project Page | https://summerofcode.withgoogle.com/projects/#5424575602491392 |
| Code Implementations | https://github.com/root-project/root/pulls?q=author:sanjibansg |
| Documentation Blog | https://blog.sanjiban.ml/series/gsoc |
The Toolkit for Multivariate Data Analysis (TMVA) is a sub-module of ROOT which provides a machine learning environment for the training, testing, and evaluation of various multivariate methods, especially those used in High-Energy Physics. Recently, the TMVA team introduced SOFIE (System for Fast Inference code Emit), which provides its own intermediate representation of deep learning models following the ONNX standard. To facilitate the usage, storage, and exchange of these models, this project aimed at developing storage functionality for deep learning models in the `.root` format, popular in the High-Energy Physics community.
- Functionality for serialization of RModel, for storing a trained deep learning model in the `.root` format.
- Functionality for parsing a Keras `.h5` file into an RModel object for generation of inference code.
- Functionality for parsing a PyTorch `.pt` file into an RModel object for generation of inference code.
- Tests and tutorials for the various parsers of TMVA SOFIE's RModel object.
1. Serialization of RModel PR#8666
- Link to blog article: https://blog.sanjiban.ml/root-project-introducing-sofie
- Description: RModel is the primary class defined in SOFIE for storing the configuration and weights of a trained deep learning model, and ROperator is the abstract base class from which the various operators are derived. Following the ONNX standard, each ROperator is responsible for generating the specific inference code that operates on its input tensors and produces the outputs according to the attributes provided. The RModel class had to be made serializable so that it can be saved in the `.root` format.
- Progress
  - Modifying the Data Structures
    - Modifying struct InitializedTensor
    - Modifying class RModel & ROperator
    - Modifying the LinkDef file
  - Adding the Custom Streamer to RModel (see the sketch after this list)
  - Tests
    - Emit files for generating header files
    - Tests for the parser
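Because RModel holds raw weight buffers that ROOT's automatic I/O cannot persist directly, the custom Streamer hooks into reading and writing. The following is a minimal sketch of the general ROOT custom-Streamer pattern together with the LinkDef convention that accompanies it; the member handling in the comments is illustrative, not the actual RModel implementation.

```cpp
// Sketch of the custom-Streamer pattern (illustrative, not the actual RModel
// code). In the LinkDef file, a trailing '-' on the class pragma, e.g.
//    #pragma link C++ class TMVA::Experimental::SOFIE::RModel-;
// tells rootcling not to generate a streamer, and the implementation file
// then provides a custom one:
void RModel::Streamer(TBuffer &R__b)
{
   if (R__b.IsReading()) {
      R__b.ReadClassBuffer(RModel::Class(), this);
      // e.g. rebuild the transient weight buffers of each InitializedTensor
      // from the persisted std::vector members here
   } else {
      // e.g. mirror the raw tensor data into persistable members here
      R__b.WriteClassBuffer(RModel::Class(), this);
   }
}
```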
- Interface

```cpp
// Writing the ROOT file
{
   TFile file("model.root", "CREATE");
   using namespace TMVA::Experimental;
   SOFIE::RModel model = SOFIE::PyKeras::Parse("trained_model_dense.h5");
   model.Write("model");
   file.Close();
}

// Reading the ROOT file
{
   TFile file("model.root", "READ");
   using namespace TMVA::Experimental;
   SOFIE::RModel *model;
   file.GetObject("model", model);
   file.Close();
}
```
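Once read back, the model can emit its inference code. A brief sketch, assuming RModel's Generate() and OutputGenerated() code-emission methods and an illustrative output file name:

```cpp
// Sketch: emitting the inference header from a deserialized model.
// Generate()/OutputGenerated() are assumed here as the code-emission
// entry points; "model.hxx" is an illustrative output name.
TFile file("model.root", "READ");
TMVA::Experimental::SOFIE::RModel *model;
file.GetObject("model", model);
model->Generate();                   // build the inference code in memory
model->OutputGenerated("model.hxx"); // write the generated header to disk
file.Close();
```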
2. Keras Parser for RModel PR#8430
- Link to blog article: https://blog.sanjiban.ml/root-project-keras-parser-for-sofie
- Description: A converter for Keras `.h5` models was required for translating Keras Sequential API and Functional API models into an RModel object for the subsequent generation of inference code.
- Progress
  - Restructured SOFIE to avoid dependency conflicts between different Python libraries
  - Parser function for extracting the model information and weights and instantiating an RModel object
    - Support for Keras Sequential API models
    - Support for Keras Functional API models
    - Supports Dense (with ReLU activation), ReLU, and Permute layers
    - Header file for the function
    - Function implementation
  - Converter function writing the RModel containing the model information into a ROOT file
    - Header file for the function
    - Function implementation
  - Tests
    - Emit files for generating header files
    - Tests for the parser
  - Tutorials
- Interface

```cpp
// The parser returns an RModel object
using namespace TMVA::Experimental::SOFIE;
RModel model = PyKeras::Parse("trained_model_dense.h5");

// The converter writes a ROOT file directly
PyKeras::ConvertToRoot("trained_model_dense.h5");
```
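Putting the pieces together, here is a hedged end-to-end sketch from a Keras file to a compiled inference call; the generated namespace and infer() usage follow SOFIE's usual pattern but are assumptions here:

```cpp
// Sketch: parse a Keras model and emit its inference header.
using namespace TMVA::Experimental::SOFIE;
RModel model = PyKeras::Parse("trained_model_dense.h5");
model.Generate();
model.OutputGenerated("trained_model_dense.hxx");
// The generated header can then be included in client code and its inference
// function called on a raw input buffer; names below are illustrative:
//   #include "trained_model_dense.hxx"
//   std::vector<float> out = TMVA_SOFIE_trained_model_dense::infer(input);
```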
3. PyTorch Parser for RModel PR#8684
- Link to blog article: https://blog.sanjiban.ml/root-project-pytorch-parser-for-sofie
- Description: A converter was required for parsing PyTorch `.pt` models, saved using TorchScript, into an RModel object for the subsequent generation of inference code. The developed functionality requires the shapes of the input tensors and their data type. If not specified, the data type defaults to Float, but the shapes vector is a mandatory parameter.
- Progress
  - Parser function for extracting the model information and weights and instantiating an RModel object
    - Support for the PyTorch nn.Module, nn.Sequential, and nn.ModuleList containers
    - Supports Linear, ReLU, and Transpose layers/operations
    - Supports tensors with dynamic axes
    - Header file for the function
    - Function implementation
  - Converter function writing the RModel containing the model information into a ROOT file
    - Header file for the function
    - Function implementation
  - Tests
    - Emit files for generating header files
    - Tests for the parser
  - Tutorials
- Interface

```cpp
// The parser returns an RModel object
using namespace TMVA::Experimental::SOFIE;

// Build the vector of input shapes
std::vector<size_t> s1{120, 1};
std::vector<std::vector<size_t>> inputShape{s1};
RModel model = PyTorch::Parse("trained_model_dense.pt", inputShape);

// The converter writes a ROOT file directly
PyTorch::ConvertToRoot("trained_model_dense.pt", inputShape);
```
- Tests were built on Google's GTest framework. Python scripts, executed through the C-Python API, generate models and save them; these models are then parsed, and the correctness of the parsers is validated by comparing the outputs of the generated inference code against those of the saved models when called on the same input tensors (see the sketch after this list).
- Simple tutorials were built (PR#8874) showcasing use cases of the parsers, the generation of inference code, and the usage of functions defined in the RModel class.
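A hedged sketch of the parity check described above; the header name, generated namespace, inference signature, and reference values are all illustrative:

```cpp
// Compare the generated inference code against reference outputs
// precomputed by the original framework on the same input tensor.
#include <gtest/gtest.h>
#include <vector>
#include "trained_model_dense.hxx" // generated inference header (assumed name)

TEST(SOFIEParsers, KerasDenseParity)
{
   float input[] = {0.1f, 0.2f, 0.3f, 0.4f};
   // Reference outputs assumed to come from the saved Keras model itself.
   std::vector<float> expected = {0.25f, 0.75f};
   std::vector<float> output = TMVA_SOFIE_trained_model_dense::infer(input);

   ASSERT_EQ(output.size(), expected.size());
   for (size_t i = 0; i < output.size(); ++i)
      EXPECT_NEAR(output[i], expected[i], 1e-5);
}
```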
After implementing the expected deliverables, I started working on the development of the ROOT storage of BDTs. The implementation required developing a class to serve as the primary data structure for holding the model configuration and weights, serializable into the `.root` file; a Parse function for translating a BDT model trained in TMVA and saved in an `.xml` file; and, lastly, a mapping interface to TMVA Tree Inference for generating inference code. The class was initially implemented by Jonas Rembser (https://github.com/guitargeek/tmva-to-xgboost/), and I made further modifications to it.
- Interface

```cpp
// The parser loads the BDT model from the .xml file into a RootStorage::BDT object
TMVA::Experimental::RootStorage::BDT model;
bool usePurity = true;
model.Parse("TMVA_CNN_Classification_BDT.weights.xml", usePurity);
```
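Since the class is serializable, the parsed model can then be written to a `.root` file in the usual way; a short sketch (the file and key names are assumptions):

```cpp
// Sketch: persisting the parsed BDT model (file/key names illustrative).
TFile file("bdt_model.root", "CREATE");
file.WriteObject(&model, "model"); // works for any class with a ROOT dictionary
file.Close();
```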
| Pull Request | PR Number |
|---|---|
| Restructured SOFIE | #8594 |
| Serialisation of RModel | #8666 |
| Modifying AddOutputTensorNameList() | #8640 |
| PyKeras Converter TMVA | #8430 |
| PyTorch Converter TMVA | #8684 |
| Tutorials for RModel Parsers | #8874 |
| Root Storage of BDT | #8873 |
- Documenting the data structures and functions in SOFIE and the parsers using Doxygen.
- Contributing to ROOT & TMVA by implementing, improving, and debugging code.
- Development of the ROOT storage of BDTs
  - Developing the mapping interface for inference code generation from the class RootStorage::BDT
  - Researching the conversion of scikit-learn-based BDT models to the class RootStorage::BDT for subsequent inference
  - Adding tests & tutorials
- Development of ROperators
  - Implementing classes for various ROperators for ONNX & ONNX-ML (a standalone sketch of the idea follows this list)
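For orientation, each operator class is responsible for emitting the C++ code of its own computation. The sketch below shows the general shape such a class takes; it is a standalone illustration of the idea, not the actual ROperator interface:

```cpp
#include <string>
#include <utility>

// Standalone illustration of the operator-emits-its-own-code idea (not the
// actual SOFIE ROperator API): the operator stores its tensor names and
// returns the C++ snippet implementing its computation.
class IdentityOpSketch {
public:
   IdentityOpSketch(std::string input, std::string output)
      : fInput(std::move(input)), fOutput(std::move(output)) {}

   // Emit inference code that copies the input tensor to the output tensor.
   std::string Generate() const
   {
      return "   tensor_" + fOutput + " = tensor_" + fInput + ";\n";
   }

private:
   std::string fInput;  // name of the input tensor
   std::string fOutput; // name of the output tensor
};
```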
The planned goals of the project were successfully implemented. Currently in the experimental stage, SOFIE requires continuous development and holds effective applications in the inference of deep learning models. I wish to keep contributing to the project in the future by implementing functionalities, improving features, and debugging issues. I gained an in-depth understanding of the ROOT project and its applications in High-Energy Physics. While working on the project, I faced numerous challenges but learned how to tackle them. Along the way, I learned about many tools, methods, and concepts for developing robust applications. It was a dream to work with people from the largest particle physics facility in the world; I am blessed to have received the opportunity and guidance, and I sincerely hope to get the chance to work with them again.
First of all, I convey my thanks to Google for organizing this event of massive learning, networking, and open-source software development. I am highly grateful to my mentors Lorenzo Moneta, Sitong An, and Anirudh Dagar, and to CERN-HSF, for providing me the opportunity to work on the project and for all the guidance and help they have given. I am also thankful to TMVA team member Omar Andres Zapata Mesa for his help and support in implementing and debugging the functionalities. Lastly, I thank all the student developers for making this program successful, my friends and seniors for their continuous help and support, and my parents for their belief, guidance, and support in all my endeavors.