The objective of this project is to use neural networks and symbolic regression to find a closed-form expression of a classifier that separates signal and noise on particle physics problems. We call this closed-form expression an optimal observable since it allows us to separate signal and noise optimally using quantities we can observe experimentally.
There are two ways of designing an optimal observable that separates signal and noise. In the first, a subject matter expert chooses a few relevant quantities that can be observed experimentally and applies transformations to them until signal and noise are well separated. This expert-based method is limited to a few dimensions because it is difficult for humans to visualize the effects of feature transformations in higher dimensions.
The second method uses machine learning techniques. Machine learning models, and neural networks in particular, can process high-dimensional data to achieve a good separation between two classes. However, neural networks lack the closed-form, interpretable expressions that result from observables designed by subject matter experts.
Symbolic regression can bridge the gap between closed-form expressions and neural networks: we let a network learn observables optimized in high-dimensional spaces and then approximate them with an analytical expression. Bridging this gap allows us to learn optimal observables through the power of machine learning while still being able to extract physical meaning from the function learned by the model.
This is a work in progress by Carlos Miguel Patiño and Michael Fenton.
- JAX
- PyTorch Lightning
- PySR
- Awkward Array
- uproot
- numpy
- matplotlib
The setup we have for modeling is a neural network with inputs
The optimal observables we want to learn have a multiplicative form
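As a rough illustration of what a multiplicative form can look like, the sketch below builds an observable as a product of per-input factors, each produced by its own tiny network. The factorization into one factor per input, the names `init_factor`, `factor`, and `observable`, and the layer sizes are all assumptions made for this example, not the project's actual architecture. It uses plain NumPy and only shows the forward pass.

```python
import numpy as np

def init_factor(rng, hidden=8):
    """Random weights for a tiny 1 -> hidden -> 1 network (hypothetical sizes)."""
    return {
        "w1": rng.normal(scale=0.5, size=(1, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.5, size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def factor(params, x):
    """Evaluate one per-input factor f_i(x_i) for a batch of scalars x."""
    h = np.tanh(x[:, None] @ params["w1"] + params["b1"])  # (batch, hidden)
    return (h @ params["w2"] + params["b2"])[:, 0]          # (batch,)

def observable(all_params, X):
    """Assumed multiplicative form: O(x) = prod_i f_i(x_i)."""
    factors = [factor(p, X[:, i]) for i, p in enumerate(all_params)]
    return np.prod(np.stack(factors, axis=1), axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                       # 5 events, 3 input features
params = [init_factor(rng) for _ in range(X.shape[1])]
O = observable(params, X)                         # one observable value per event
```

Under this assumed structure each factor depends on a single input, so each one can be plotted or approximated on its own.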
We use symbolic regression to generate a closed-form expression of the representations
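To make the idea concrete, here is a toy stand-in for symbolic regression. It scores a small hand-written library of candidate expressions against sampled values of a function and keeps the one with the lowest mean squared error. This brute-force search is not how PySR works (PySR runs an evolutionary search over expression trees). The candidate library, the `best_expression` helper, and the synthetic target are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical library of one-parameter candidate forms y = a * g(x).
CANDIDATES = {
    "a * x": lambda x: x,
    "a * x**2": lambda x: x**2,
    "a * exp(x)": np.exp,
    "a * sin(x)": np.sin,
}

def best_expression(x, y):
    """Return (name, a, mse) of the candidate that best fits y in least squares."""
    best = None
    for name, g in CANDIDATES.items():
        basis = g(x)
        a = basis @ y / (basis @ basis)           # closed-form least-squares scale
        mse = np.mean((a * basis - y) ** 2)
        if best is None or mse < best[2]:
            best = (name, a, mse)
    return best

# Stand-in for a learned representation: samples of 2.5 * x**2 plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 200)
y = 2.5 * x**2 + rng.normal(scale=0.01, size=x.shape)

name, a, mse = best_expression(x, y)
print(name, a, mse)
```

A dedicated tool such as PySR, listed among the dependencies above, searches a far larger space of expressions and reports candidates at several levels of complexity, so accuracy can be weighed against interpretability.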
By having an analytic approximation, we have the best of both worlds: we learn an optimal observable through optimization on a high-dimensional space while being able to extract physical knowledge from a closed-form expression.