Merge pull request #164 from chhoumann/kb-254-ngboost
[KB-254] Describe NGBoost
Ivikhostrup authored May 28, 2024
2 parents ba494c3 + 4e269d2 commit 532af5a
Showing 2 changed files with 53 additions and 1 deletion.
19 changes: 18 additions & 1 deletion report_thesis/src/references.bib
@@ -563,4 +563,21 @@ @book{learningwithkernels
isbn = {9780262256933},
doi = {10.7551/mitpress/4175.001.0001},
url = {https://doi.org/10.7551/mitpress/4175.001.0001},
}
}


@misc{duan_ngboost_2020,
title = {{NGBoost}: {Natural} {Gradient} {Boosting} for {Probabilistic} {Prediction}},
shorttitle = {{NGBoost}},
url = {http://arxiv.org/abs/1910.03225},
abstract = {We present Natural Gradient Boosting (NGBoost), an algorithm for generic probabilistic prediction via gradient boosting. Typical regression models return a point estimate, conditional on covariates, but probabilistic regression models output a full probability distribution over the outcome space, conditional on the covariates. This allows for predictive uncertainty estimation — crucial in applications like healthcare and weather forecasting. NGBoost generalizes gradient boosting to probabilistic regression by treating the parameters of the conditional distribution as targets for a multiparameter boosting algorithm. Furthermore, we show how the Natural Gradient is required to correct the training dynamics of our multiparameter boosting approach. NGBoost can be used with any base learner, any family of distributions with continuous parameters, and any scoring rule. NGBoost matches or exceeds the performance of existing methods for probabilistic prediction while offering additional benefits in flexibility, scalability, and usability. An open-source implementation is available at github.com/stanfordmlgroup/ngboost.},
language = {en},
urldate = {2024-05-28},
publisher = {arXiv},
author = {Duan, Tony and Avati, Anand and Ding, Daisy Yi and Thai, Khanh K. and Basu, Sanjay and Ng, Andrew Y. and Schuler, Alejandro},
month = jun,
year = {2020},
note = {arXiv:1910.03225 [cs, stat]},
keywords = {Computer Science - Machine Learning, Statistics - Machine Learning},
annote = {Comment: Accepted for ICML 2020},
}
35 changes: 35 additions & 0 deletions report_thesis/src/sections/background.tex
@@ -357,6 +357,41 @@ \subsubsection{Gradient Boosting Regression (GBR)}\label{sec:gradientboost}
In the context of regression, gradient boosting aims to minimize the difference between the predicted values and the actual target values by fitting successive trees to the residuals.
To minimize errors, gradient descent is used to iteratively update model parameters in the direction of the negative gradient of the loss function, thereby following the path of steepest descent~\cite{gradientLossFunction}.
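To make this concrete, the stage-wise update can be written (in one standard formulation; the symbols below are introduced here purely for illustration) as
\[
    F_m(x) = F_{m-1}(x) + \nu\, h_m(x),
\]
where $F_{m-1}$ is the current ensemble, $h_m$ is the regression tree fitted at stage $m$ to the negative gradients $r_{i,m} = -\left[\partial L\big(y_i, F(x_i)\big) / \partial F(x_i)\right]_{F = F_{m-1}}$ (the residuals in the case of squared-error loss), and $\nu$ is the learning rate.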

\subsubsection{Natural Gradient Boosting (NGBoost)}
Having introduced \gls{gbr}, we now give an overview of \gls{ngboost} based on \citet{duan_ngboost_2020}.

\gls{ngboost} is a variant of the gradient boosting algorithm that leverages natural gradients with the goal of improving convergence speed and model performance.
In more complex models, such as those that predict the parameters of a probability distribution, the parameter space can be curved and thus non-Euclidean, which makes standard gradient descent less effective: convergence can be slow and the resulting solution suboptimal.
In such scenarios, natural gradients become particularly advantageous.

Natural gradients account for the underlying geometry of the parameter space by using information about its curvature, which allows the optimization to navigate the parameter space more efficiently and thereby converge faster to better solutions.
In addition, \gls{ngboost} provides its predictions in the form of probability distributions, allowing it to estimate the uncertainty associated with each prediction.

The algorithm starts by choosing a parametric distribution for the target variable, typically something simple such as a Gaussian, and initializing its parameters with an initial guess.
This initial model prediction represents a probability distribution over the target variable, conditional on the given features.
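For example, with a Gaussian predictive distribution the model outputs, for every input $x$, the parameters of
\[
    y \mid x \sim \mathcal{N}\left(\mu(x), \sigma(x)^2\right),
\]
so the boosted quantities are $\theta(x) = \left(\mu(x), \log \sigma(x)\right)$; parameterizing the scale on the log scale is a common choice that keeps $\sigma$ positive.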

Then, the algorithm enters an iterative process to refine its predictions.
At the start of each iteration, the model computes its current predictions, that is, the current distribution parameters for each training example.
The algorithm then calculates the gradient of the loss function, here the negative log-likelihood, with respect to these parameters.
The negative log-likelihood quantifies how well the model's predicted probability distribution matches the observed data, with lower values indicating better alignment between predictions and observations.
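Writing $\mathcal{S}(\theta, y)$ for the scoring rule, as in \citet{duan_ngboost_2020}, the negative log-likelihood score for a single observation $y$ is $\mathcal{S}(\theta, y) = -\log P_\theta(y)$, which for the Gaussian example above becomes
\[
    \mathcal{S}(\theta, y) = \frac{(y - \mu)^2}{2\sigma^2} + \log \sigma + \frac{1}{2}\log(2\pi),
\]
and its gradient $\nabla_\theta \mathcal{S}(\theta, y)$ is taken with respect to the distribution parameters $\theta$.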

Next, the \textit{Fisher information matrix} is computed.
This matrix encodes the curvature of the parameter space at the current parameter values, reflecting how sensitive the likelihood function is to changes in these parameters.
For example, if the likelihood is highly sensitive to changes in a particular parameter, the corresponding entry of the Fisher information matrix will be large.
Using this information, the model can scale its parameter updates appropriately, taking smaller steps in directions where the predicted distribution changes rapidly and larger steps in directions where it changes slowly.
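Formally, for a predictive distribution $P_\theta$ the Fisher information matrix is
\[
    \mathcal{I}(\theta) = \mathbb{E}_{y \sim P_\theta}\left[ \nabla_\theta \log P_\theta(y)\, \nabla_\theta \log P_\theta(y)^{\top} \right],
\]
the expected outer product of the score function, so large entries correspond to parameters, or combinations of parameters, to which the predicted distribution is highly sensitive.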

The standard gradient of the negative log-likelihood, which plays the role that the residuals play in ordinary gradient boosting, is then transformed using the inverse of the Fisher information matrix to obtain what is known as the natural gradient.
Next, a weak learner, typically a decision tree, is fitted to these natural gradients.
This step is similar to traditional gradient boosting, where a tree is fitted to the residuals, but in \gls{ngboost}, the tree is fitted to the natural gradients instead.
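In symbols, the natural gradient of the score for an observation $y$ is
\[
    \tilde{\nabla}_\theta\, \mathcal{S}(\theta, y) = \mathcal{I}(\theta)^{-1}\, \nabla_\theta \mathcal{S}(\theta, y),
\]
and it is these per-example natural gradients, rather than plain residuals, that the weak learner at each stage is trained to predict.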

The parameters of the model are then updated using the output from the weak learner.
This update process incorporates a learning rate to control the step size, ensuring that the model makes gradual improvements rather than drastic changes.
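Following the formulation in \citet{duan_ngboost_2020}, after $M$ boosting stages the predicted parameters for an input $x$ take the additive form
\[
    \theta(x) = \theta^{(0)} - \eta \sum_{m=1}^{M} \rho^{(m)} f^{(m)}(x),
\]
where $f^{(m)}$ denotes the weak learner fitted to the natural gradients at stage $m$, $\rho^{(m)}$ is a stage-specific scaling factor, and $\eta$ is the learning rate.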

Using the newly updated parameters, the model recalculates its predictions, refining the probability distribution of the target variable.
This iterative process of computing predictions, calculating gradients, fitting weak learners, and updating parameters continues for a predetermined number of iterations or until the model's performance converges.
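To illustrate how this procedure is used in practice, the listing below sketches a minimal example with the open-source \texttt{ngboost} package accompanying \citet{duan_ngboost_2020}; the exact class and argument names reflect one version of the library and may differ in others, and the dataset is a stand-in chosen only for illustration.

\begin{verbatim}
from ngboost import NGBRegressor
from ngboost.distns import Normal
from ngboost.scores import LogScore
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each boosting stage fits the base learner to the natural gradient of the
# negative log-likelihood of a Normal predictive distribution.
ngb = NGBRegressor(Dist=Normal, Score=LogScore,
                   n_estimators=500, learning_rate=0.01)
ngb.fit(X_train, y_train)

point_predictions = ngb.predict(X_test)   # means of the predictive distributions
predictive_dists = ngb.pred_dist(X_test)  # full distributions (location and scale)
\end{verbatim}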

\subsubsection{XGBoost}
