DOC L1-regularization parameter tutorial #264

Merged
doc/tutorials/fermat_rule_reg.rst
.. _fermat_rule_reg:

======================================================
Mathematics Behind L1 Regularization and Fermat's Rule
======================================================

This tutorial presents the mathematics behind solving the optimization problem
:math:`\min f(x) + \lambda \|x\|_1` and shows why the solution is zero whenever
:math:`\lambda` is greater than the infinity norm of the gradient of :math:`f` at zero, thereby justifying skglm's choice of

.. code-block:: python

    alpha_max = np.max(np.abs(gradient0))
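
For concreteness, here is a minimal NumPy sketch of this computation, assuming the plain least-squares datafit :math:`f(w) = \frac{1}{2}\|Xw - y\|_2^2` used later in this tutorial (the exact scaling of the datafit in skglm may differ); ``X`` and ``y`` are placeholder data:

.. code-block:: python

    import numpy as np

    # Hypothetical data: X is the design matrix, y the target vector.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 10))
    y = rng.standard_normal(50)

    # Gradient at w = 0 of the least-squares datafit 0.5 * ||X w - y||^2.
    w0 = np.zeros(X.shape[1])
    gradient0 = X.T @ (X @ w0 - y)   # equals -X.T @ y

    # Smallest regularization strength for which the solution is exactly zero.
    alpha_max = np.max(np.abs(gradient0))
    print(alpha_max)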

Problem setup
=============

Consider the optimization problem:

.. math::

    \min_x f(x) + \lambda \|x\|_1

where:

- :math:`f: \mathbb{R}^d \to \mathbb{R}` is a differentiable convex function,
- :math:`\|x\|_1` is the L1 norm of :math:`x`,
- :math:`\lambda \geq 0` is a regularization parameter.

We aim to determine the conditions under which the solution to this problem is :math:`x = 0`.
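
As an illustration, this objective can be written down directly in NumPy for the least-squares datafit used in the example below (``A``, ``b`` and ``lam`` are placeholder names):

.. code-block:: python

    import numpy as np

    def objective(x, A, b, lam):
        """Composite objective f(x) + lam * ||x||_1 with a least-squares datafit."""
        datafit = 0.5 * np.linalg.norm(A @ x - b) ** 2
        penalty = lam * np.linalg.norm(x, ord=1)
        return datafit + penalty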

Theoretical Background
======================

According to Fermat's rule, a point minimizes a convex function if and only if zero belongs to the subdifferential of the function at that point. For our problem, the objective function is:

.. math::

    g(x) = f(x) + \lambda \|x\|_1

The subdifferential of :math:`\|x\|_1` at :math:`x = 0` is the unit L-infinity ball:

.. math::

    \partial \|x\|_1 |_{x=0} = \{ u \in \mathbb{R}^d : \|u\|_{\infty} \leq 1 \}
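
This characterization can be checked numerically: a vector :math:`u` belongs to the subdifferential at zero exactly when :math:`\|x\|_1 \geq \langle u, x \rangle` holds for every :math:`x`, which fails as soon as :math:`\|u\|_{\infty} > 1`. A small illustrative sketch (the function name is ours):

.. code-block:: python

    import numpy as np

    def is_subgradient_at_zero(u, n_trials=10_000):
        """Check the subgradient inequality ||x||_1 >= <u, x> on random directions x."""
        rng = np.random.default_rng(0)
        for _ in range(n_trials):
            x = rng.standard_normal(u.shape[0])
            if np.linalg.norm(x, 1) < u @ x:
                return False
        return True

    print(is_subgradient_at_zero(np.array([0.5, -0.9])))  # True: ||u||_inf <= 1
    print(is_subgradient_at_zero(np.array([1.5, 0.0])))   # False: ||u||_inf > 1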

Thus, :math:`x = 0` is a solution if and only if :math:`0 \in \partial g(0)`, that is:

.. math::

    0 \in \nabla f(0) + \lambda \partial \|x\|_1 |_{x=0}

Equivalently, :math:`-\nabla f(0)` must lie in :math:`\lambda` times the unit ball of the L-infinity norm (the dual of the L1 norm), which means:

.. math::

    \|\nabla f(0)\|_{\infty} \leq \lambda

In particular, if :math:`\lambda > \|\nabla f(0)\|_{\infty}`, then :math:`x = 0` is the unique solution.
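
This condition translates directly into a one-line numerical test (a sketch with hypothetical names; ``grad_f0`` stands for :math:`\nabla f(0)`):

.. code-block:: python

    import numpy as np

    def zero_is_optimal(grad_f0, lam):
        """Return True if x = 0 satisfies Fermat's rule for f(x) + lam * ||x||_1."""
        return np.max(np.abs(grad_f0)) <= lam

    # Example: with grad_f0 = (-3, 1), x = 0 is optimal only when lam >= 3.
    grad_f0 = np.array([-3.0, 1.0])
    print(zero_is_optimal(grad_f0, lam=2.0))  # False
    print(zero_is_optimal(grad_f0, lam=3.5))  # True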

Example
=======

Consider the Ordinary Least Squares loss :math:`f(x) = \frac{1}{2} \|Ax - b\|_2^2`. Its gradient is:

.. math::

    \nabla f(x) = A^T (Ax - b)

At :math:`x=0`:

.. math::

    \nabla f(0) = -A^T b

The infinity norm of the gradient at 0 is:

.. math::

    \|\nabla f(0)\|_{\infty} = \|A^T b\|_{\infty}

For :math:`\lambda > \|A^T b\|_{\infty}`, the solution to :math:`\min_x \frac{1}{2} \|Ax - b\|_2^2 + \lambda \|x\|_1` is :math:`x=0`.
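
This can be verified numerically with a basic proximal gradient (ISTA) loop, written here purely for illustration (it is not skglm's solver): with :math:`\lambda` slightly above :math:`\|A^T b\|_{\infty}` the iterates stay at zero, while just below it they do not.

.. code-block:: python

    import numpy as np

    def soft_threshold(x, tau):
        """Proximal operator of tau * ||.||_1."""
        return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

    def ista(A, b, lam, n_iter=2000):
        """Minimize 0.5 * ||A x - b||_2^2 + lam * ||x||_1 by proximal gradient descent."""
        L = np.linalg.norm(A, ord=2) ** 2   # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            grad = A.T @ (A @ x - b)
            x = soft_threshold(x - grad / L, lam / L)
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 8))
    b = rng.standard_normal(30)
    lam_max = np.max(np.abs(A.T @ b))

    print(np.allclose(ista(A, b, 1.01 * lam_max), 0.0))  # True: solution is exactly zero
    print(np.allclose(ista(A, b, 0.5 * lam_max), 0.0))   # False: some coefficients are nonzero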



References
==========

The first 5 pages of the following article provide sufficient context for the problem at hand.

.. _1:
[1] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. 2017. Gap safe screening rules for sparsity enforcing penalties. J. Mach. Learn. Res. 18, 1 (January 2017), 4671–4703.

doc/tutorials/tutorials.rst

Mathematical details about the group Lasso, in particular with nonnegativity constraints.

:ref:`Mathematics Behind L1 Regularization and Fermat's Rule <fermat_rule_reg>`
--------------------------------------------------------------------------------

Mathematical context about the choice of the regularization parameter in L1-regularization.