From 95fd720e1ed20d4267931169a1f019346e6d9f57 Mon Sep 17 00:00:00 2001
From: wassimmazouz
Date: Tue, 2 Jul 2024 10:40:21 +0200
Subject: [PATCH 1/7] init .rst file for L1-reg tutorial

---
 doc/tutorials/fermat_rule_reg.rst | 50 +++++++++++++++++++++++++++++++
 doc/tutorials/tutorials.rst       |  5 ++++
 2 files changed, 55 insertions(+)
 create mode 100644 doc/tutorials/fermat_rule_reg.rst

diff --git a/doc/tutorials/fermat_rule_reg.rst b/doc/tutorials/fermat_rule_reg.rst
new file mode 100644
index 000000000..4697f7ee5
--- /dev/null
+++ b/doc/tutorials/fermat_rule_reg.rst
@@ -0,0 +1,50 @@
+.. _fermat_rule_reg:
+
+======================================================
+Mathematics Behind L1 Regularization and Fermat's Rule
+======================================================
+
+This tutorial presents the mathematics behind solving the optimization problem
+:math:`\min f(x) + \lambda \|x\|_1` and demonstrates why the solution is zero when
+:math:`\lambda` is greater than the infinity norm of the gradient of :math:`f` at zero, therefore justifying the choice in skglm of
+
+.. code-block::
+alpha_max = (popu_X.T @ (1 - popu_Y) / len(popu_Y)).max()
+
+Problem setup
+=============
+
+Consider the optimization problem:
+
+.. math::
+    \min_x f(x) + \lambda \|x\|_1
+
+where:
+
+- :math:`f: \mathbb{R}^d \to \mathbb{R}` is a differentiable function,
+- :math:`\|x\|_1` is the L1 norm of :math:`x`,
+- :math:`\lambda \in \mathbb{R}` is a regularization parameter.
+
+We aim to determine the conditions under which the solution to this problem is :math:`x = 0`.
+
+Theoretical Background
+======================
+
+According to Fermat's rule, the minimum of the function occurs where the subdifferential of the objective function includes zero. For our problem, the objective function is:
+
+.. math::
+    g(x) = f(x) + \lambda \|x\|_1
+
+The subdifferential of :math:`\|x\|_1` at 0 is the L-infinity ball:
+
+.. math::
+    \partial \|x\|_1 |_{x=0} = \{ u \in \mathbb{R}^n : \|u\|_{\infty} \leq 1 \}
+
+
+
+References
+==========
+
+.. _1:
+[1] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. 2017. Gap safe screening rules for sparsity enforcing penalties. J. Mach. Learn. Res. 18, 1 (January 2017), 4671–4703.
+
diff --git a/doc/tutorials/tutorials.rst b/doc/tutorials/tutorials.rst
index 24652a871..fdc442ff4 100644
--- a/doc/tutorials/tutorials.rst
+++ b/doc/tutorials/tutorials.rst
@@ -33,3 +33,8 @@ Get details about Cox datafit equations.
 -----------------------------------------------------------------
 
 Mathematical details about the group Lasso, in particular with nonnegativity constraints.
+
+:ref:`Mathematics Behind L1 Regularization and Fermat's Rule <fermat_rule_reg>`
+-----------------------------------------------------------------
+
+Mathematical context about the choice of the regularization parameter in L1-regularization.

From 9d080af8d1c7bcc8292ee5b4d61cc819684cb9f1 Mon Sep 17 00:00:00 2001
From: wassimmazouz
Date: Tue, 2 Jul 2024 11:45:42 +0200
Subject: [PATCH 2/7] Theoretical background and example

---
 doc/tutorials/fermat_rule_reg.rst | 36 ++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/doc/tutorials/fermat_rule_reg.rst b/doc/tutorials/fermat_rule_reg.rst
index 4697f7ee5..685916154 100644
--- a/doc/tutorials/fermat_rule_reg.rst
+++ b/doc/tutorials/fermat_rule_reg.rst
@@ -9,7 +9,7 @@ This tutorial presents the mathematics behind solving the optimization problem
 :math:`\lambda` is greater than the infinity norm of the gradient of :math:`f` at zero, therefore justifying the choice in skglm of
 
 .. code-block::
-alpha_max = (popu_X.T @ (1 - popu_Y) / len(popu_Y)).max()
+alpha_max = np.max(np.abs(gradient0))
 
 Problem setup
 =============
@@ -40,11 +40,45 @@ The subdifferential of :math:`\|x\|_1` at 0 is the L-infinity ball:
 .. math::
     \partial \|x\|_1 |_{x=0} = \{ u \in \mathbb{R}^n : \|u\|_{\infty} \leq 1 \}
 
+Thus, for :math:`0 \in \partial g(x)` at :math:`x=0`:
+
+.. math::
+    0 \in \nabla f(0) + \lambda \partial \|x\|_1 |_{x=0}
+
+which implies, given that the dual of L1-norm is L-infinity:
+
+.. math::
+    \|\nabla f(0)\|_{\infty} \leq \lambda
+
+If :math:`\lambda > \|\nabla f(0)\|_{\infty}`, then the only solution is :math:`x=0`.
+
+Example
+=======
+
+Consider the loss function for Ordinary Least Squares :math:`f(x) = \frac{1}{2} \|Ax - b\|_2^2`. We have:
+
+.. math::
+    \nabla f(x) = A^T (Ax - b)
+
+At :math:`x=0`:
+
+.. math::
+    \nabla f(0) = -A^T b
+
+The infinity norm of the gradient at 0 is:
+
+.. math::
+    \|\nabla f(0)\|_{\infty} = \|A^T b\|_{\infty}
+
+For :math:`\lambda > \|A^T b\|_{\infty}`, the solution to :math:`\min_x \frac{1}{2} \|Ax - b\|_2^2 + \lambda \|x\|_1` is :math:`x=0`.
+
 
 
 References
 ==========
 
+The first 5 pages of the following article provide sufficient context for the problem at hand.
+
 .. _1:
 [1] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. 2017. Gap safe screening rules for sparsity enforcing penalties. J. Mach. Learn. Res. 18, 1 (January 2017), 4671–4703.
 

From 70b99ec66ea639d60db5d3c36d3f35cf6989e076 Mon Sep 17 00:00:00 2001
From: wassimmazouz
Date: Tue, 2 Jul 2024 15:03:07 +0200
Subject: [PATCH 3/7] requested changes

---
 doc/tutorials/fermat_rule_reg.rst | 30 ++++++++++++++++--------------
 doc/tutorials/tutorials.rst       |  2 +-
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/doc/tutorials/fermat_rule_reg.rst b/doc/tutorials/fermat_rule_reg.rst
index 685916154..d68180f9e 100644
--- a/doc/tutorials/fermat_rule_reg.rst
+++ b/doc/tutorials/fermat_rule_reg.rst
@@ -1,8 +1,8 @@
-.. _fermat_rule_reg:
+.. _reg_sol_zero:
 
-======================================================
-Mathematics Behind L1 Regularization and Fermat's Rule
-======================================================
+==========================================================
+Critical regularization strength above which solution is 0
+==========================================================
 
 This tutorial presents the mathematics behind solving the optimization problem
 :math:`\min f(x) + \lambda \|x\|_1` and demonstrates why the solution is zero when
 :math:`\lambda` is greater than the infinity norm of the gradient of :math:`f` at zero, therefore justifying the choice in skglm of
 
 .. code-block::
 alpha_max = np.max(np.abs(gradient0))
 
+However, the regularization parameter used at the end should preferably be a fraction of this (e.g. `alpha = 0.01 * alpha_max`).
+
 Problem setup
 =============
@@ -21,9 +23,9 @@ Consider the optimization problem:
 
 where:
 
-- :math:`f: \mathbb{R}^d \to \mathbb{R}` is a differentiable function,
+- :math:`f: \mathbb{R}^d \to \mathbb{R}` is a convex differentiable function,
 - :math:`\|x\|_1` is the L1 norm of :math:`x`,
-- :math:`\lambda \in \mathbb{R}` is a regularization parameter.
+- :math:`\lambda > 0` is a regularization parameter.
 
 We aim to determine the conditions under which the solution to this problem is :math:`x = 0`.
@@ -38,14 +40,14 @@ According to Fermat's rule, the minimum of the function occurs where the subdiff
 The subdifferential of :math:`\|x\|_1` at 0 is the L-infinity ball:
 
 .. math::
-    \partial \|x\|_1 |_{x=0} = \{ u \in \mathbb{R}^n : \|u\|_{\infty} \leq 1 \}
+    \partial \|x\|_1 |_{x=0} = \{ u \in \mathbb{R}^d : \|u\|_{\infty} \leq 1 \}
 
 Thus, for :math:`0 \in \partial g(x)` at :math:`x=0`:
 
 .. math::
     0 \in \nabla f(0) + \lambda \partial \|x\|_1 |_{x=0}
 
-which implies, given that the dual of L1-norm is L-infinity:
+which implies, given that the dual norm of L1-norm is L-infinity:
 
 .. math::
     \|\nabla f(0)\|_{\infty} \leq \lambda
@@ -55,29 +57,29 @@ If :math:`\lambda > \|\nabla f(0)\|_{\infty}`, then the only solution is :math:`
 Example
 =======
 
-Consider the loss function for Ordinary Least Squares :math:`f(x) = \frac{1}{2} \|Ax - b\|_2^2`. We have:
+Consider the loss function for Ordinary Least Squares :math:`f(x) = \frac{1}{2n} \|Ax - b\|_2^2`, where :math:`n` is the number of samples. We have:
 
 .. math::
-    \nabla f(x) = A^T (Ax - b)
+    \nabla f(x) = \frac{1}{n}A^T (Ax - b)
 
 At :math:`x=0`:
 
 .. math::
-    \nabla f(0) = -A^T b
+    \nabla f(0) = -\frac{1}{n}A^T b
 
 The infinity norm of the gradient at 0 is:
 
 .. math::
-    \|\nabla f(0)\|_{\infty} = \|A^T b\|_{\infty}
+    \|\nabla f(0)\|_{\infty} = \frac{1}{n}\|A^T b\|_{\infty}
 
-For :math:`\lambda > \|A^T b\|_{\infty}`, the solution to :math:`\min_x \frac{1}{2} \|Ax - b\|_2^2 + \lambda \|x\|_1` is :math:`x=0`.
+For :math:`\lambda \geq \frac{1}{n}\|A^T b\|_{\infty}`, the solution to :math:`\min_x \frac{1}{2n} \|Ax - b\|_2^2 + \lambda \|x\|_1` is :math:`x=0`.
 
 
 
 References
 ==========
 
-The first 5 pages of the following article provide sufficient context for the problem at hand.
+Refer to the section 3.1 and proposition 4 in particular of the following article for more details.
 
 .. _1:
 [1] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. 2017. Gap safe screening rules for sparsity enforcing penalties. J. Mach. Learn. Res. 18, 1 (January 2017), 4671–4703.
diff --git a/doc/tutorials/tutorials.rst b/doc/tutorials/tutorials.rst
index fdc442ff4..00685c31a 100644
--- a/doc/tutorials/tutorials.rst
+++ b/doc/tutorials/tutorials.rst
@@ -34,7 +34,7 @@ Get details about Cox datafit equations.
 
 Mathematical details about the group Lasso, in particular with nonnegativity constraints.
 
-:ref:`Mathematics Behind L1 Regularization and Fermat's Rule <fermat_rule_reg>`
+:ref:`Critical regularization strength above which solution is 0 <reg_sol_zero>`
 -----------------------------------------------------------------
 
 Mathematical context about the choice of the regularization parameter in L1-regularization.

From b79162f3c2afae5b412fecdf221157136d1f5776 Mon Sep 17 00:00:00 2001
From: mathurinm
Date: Tue, 2 Jul 2024 15:51:07 +0200
Subject: [PATCH 4/7] changes

---
 doc/tutorials/alpha_max.rst       | 93 +++++++++++++++++++++++++++++++
 doc/tutorials/fermat_rule_reg.rst | 86 ----------------------------
 doc/tutorials/tutorials.rst       | 10 ++--
 3 files changed, 98 insertions(+), 91 deletions(-)
 create mode 100644 doc/tutorials/alpha_max.rst
 delete mode 100644 doc/tutorials/fermat_rule_reg.rst

diff --git a/doc/tutorials/alpha_max.rst b/doc/tutorials/alpha_max.rst
new file mode 100644
index 000000000..e1e23a256
--- /dev/null
+++ b/doc/tutorials/alpha_max.rst
@@ -0,0 +1,93 @@
+.. _alpha_max:
+
+==========================================================
+Critical regularization strength above which solution is 0
+==========================================================
+
+This tutorial shows that for :math:`\lambda \geq \lambda_{\text{max}} = || \nabla f(0) ||_{\infty}`, the solution to
+:math:`\min f(x) + \lambda || x ||_1` is 0.
+
+In skglm, we thus frequently use
+
+.. code-block::
+
+    alpha_max = np.max(np.abs(gradient0))
+
+and choose for the regularization strength :math:`\alpha` a fraction of this critical value, e.g. ``alpha = 0.01 * alpha_max``.
+
+Problem setup
+=============
+
+Consider the optimization problem:
+
+.. math::
+    \min_x f(x) + \lambda || x||_1
+
+where:
+
+- :math:`f: \mathbb{R}^d \to \mathbb{R}` is a convex differentiable function,
+- :math:`|| x ||_1` is the L1 norm of :math:`x`,
+- :math:`\lambda > 0` is the regularization parameter.
+
+We aim to determine the conditions under which the solution to this problem is :math:`x = 0`.
+
+Theoretical background
+======================
+
+
+Let
+
+.. math::
+
+    g(x) = f(x) + \lambda || x||_1
+
+According to Fermat's rule, 0 is the minimizer of :math:`g` if and only if 0 is in the subdifferential of :math:`g` at 0.
+The subdifferential of :math:`|| x ||_1` at 0 is the L-infinity unit ball:
+
+.. math::
+    \partial || \cdot ||_1 (0) = \{ u \in \mathbb{R}^d : ||u||_{\infty} \leq 1 \}
+
+Thus,
+
+.. math::
+
+    0 \in \text{argmin} ~ g(x)
+    &\Leftrightarrow 0 \in \partial g(0) \\
+    &\Leftrightarrow
+    0 \in \nabla f(0) + \lambda \partial || \cdot ||_1 (0) \\
+    &\Leftrightarrow - \nabla f(0) \in \lambda \{ u \in \mathbb{R}^d : ||u||_{\infty} \leq 1 \} \\
+    &\Leftrightarrow || \nabla f(0) ||_\infty \leq \lambda
+
+
+We have just shown that the minimizer of :math:`g = f + \lambda || \cdot ||_1` is 0 if and only if :math:`\lambda \geq ||\nabla f(0)||_{\infty}`.
+
+Example
+=======
+
+Consider the loss function for Ordinary Least Squares :math:`f(x) = \frac{1}{2n} ||Ax - b||_2^2`, where :math:`n` is the number of samples. We have:
+
+.. math::
+    \nabla f(x) = \frac{1}{n}A^T (Ax - b)
+
+At :math:`x=0`:
+
+.. math::
+    \nabla f(0) = -\frac{1}{n}A^T b
+
+The infinity norm of the gradient at 0 is:
+
+.. math::
+    ||\nabla f(0)||_{\infty} = \frac{1}{n}||A^T b||_{\infty}
+
+For :math:`\lambda \geq \frac{1}{n}||A^T b||_{\infty}`, the solution to :math:`\min_x \frac{1}{2n} ||Ax - b||_2^2 + \lambda || x||_1` is :math:`x=0`.
+
+
+
+References
+==========
+
+Refer to Section 3.1 and Proposition 4 in particular of [1] for more details.
+
+.. _1:
+[1] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. 2017. Gap safe screening rules for sparsity enforcing penalties. J. Mach. Learn. Res. 18, 1 (January 2017), 4671–4703.
+
diff --git a/doc/tutorials/fermat_rule_reg.rst b/doc/tutorials/fermat_rule_reg.rst
deleted file mode 100644
index d68180f9e..000000000
--- a/doc/tutorials/fermat_rule_reg.rst
+++ /dev/null
@@ -1,86 +0,0 @@
-.. _reg_sol_zero:
-
-==========================================================
-Critical regularization strength above which solution is 0
-==========================================================
-
-This tutorial presents the mathematics behind solving the optimization problem
-:math:`\min f(x) + \lambda \|x\|_1` and demonstrates why the solution is zero when
-:math:`\lambda` is greater than the infinity norm of the gradient of :math:`f` at zero, therefore justifying the choice in skglm of
-
-.. code-block::
-alpha_max = np.max(np.abs(gradient0))
-
-However, the regularization parameter used at the end should preferably be a fraction of this (e.g. `alpha = 0.01 * alpha_max`).
-
-Problem setup
-=============
-
-Consider the optimization problem:
-
-.. math::
-    \min_x f(x) + \lambda \|x\|_1
-
-where:
-
-- :math:`f: \mathbb{R}^d \to \mathbb{R}` is a convex differentiable function,
-- :math:`\|x\|_1` is the L1 norm of :math:`x`,
-- :math:`\lambda > 0` is a regularization parameter.
-
-We aim to determine the conditions under which the solution to this problem is :math:`x = 0`.
-
-Theoretical Background
-======================
-
-According to Fermat's rule, the minimum of the function occurs where the subdifferential of the objective function includes zero. For our problem, the objective function is:
-
-.. math::
-    g(x) = f(x) + \lambda \|x\|_1
-
-The subdifferential of :math:`\|x\|_1` at 0 is the L-infinity ball:
-
-.. math::
-    \partial \|x\|_1 |_{x=0} = \{ u \in \mathbb{R}^d : \|u\|_{\infty} \leq 1 \}
-
-Thus, for :math:`0 \in \partial g(x)` at :math:`x=0`:
-
-.. math::
-    0 \in \nabla f(0) + \lambda \partial \|x\|_1 |_{x=0}
-
-which implies, given that the dual norm of L1-norm is L-infinity:
-
-.. math::
-    \|\nabla f(0)\|_{\infty} \leq \lambda
-
-If :math:`\lambda > \|\nabla f(0)\|_{\infty}`, then the only solution is :math:`x=0`.
-
-Example
-=======
-
-Consider the loss function for Ordinary Least Squares :math:`f(x) = \frac{1}{2n} \|Ax - b\|_2^2`, where :math:`n` is the number of samples. We have:
-
-.. math::
-    \nabla f(x) = \frac{1}{n}A^T (Ax - b)
-
-At :math:`x=0`:
-
-.. math::
-    \nabla f(0) = -\frac{1}{n}A^T b
-
-The infinity norm of the gradient at 0 is:
-
-.. math::
-    \|\nabla f(0)\|_{\infty} = \frac{1}{n}\|A^T b\|_{\infty}
-
-For :math:`\lambda \geq \frac{1}{n}\|A^T b\|_{\infty}`, the solution to :math:`\min_x \frac{1}{2n} \|Ax - b\|_2^2 + \lambda \|x\|_1` is :math:`x=0`.
-
-
-
-References
-==========
-
-Refer to the section 3.1 and proposition 4 in particular of the following article for more details.
-
-.. _1:
-[1] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. 2017. Gap safe screening rules for sparsity enforcing penalties. J. Mach. Learn. Res. 18, 1 (January 2017), 4671–4703.
-
diff --git a/doc/tutorials/tutorials.rst b/doc/tutorials/tutorials.rst
index 00685c31a..d42d8eeaa 100644
--- a/doc/tutorials/tutorials.rst
+++ b/doc/tutorials/tutorials.rst
@@ -25,16 +25,16 @@ Explore how ``skglm`` fits an unpenalized intercept.
 
 
 :ref:`Mathematics behind Cox datafit `
------------------------------------------------------------------
+---------------------------------------------------------
 
 Get details about Cox datafit equations.
 
 :ref:`Details on the group Lasso `
------------------------------------------------------------------
+-------------------------------------------------------
 
 Mathematical details about the group Lasso, in particular with nonnegativity constraints.
 
-:ref:`Critical regularization strength above which solution is 0 <reg_sol_zero>`
------------------------------------------------------------------
+:ref:`Critical regularization strength above which solution is 0 <alpha_max>`
+-----------------------------------------------------------------------------
 
-Mathematical context about the choice of the regularization parameter in L1-regularization.
+How to chose the regularization strength in L1-regularization?

From c68294e1824b3603ae6f3701d77cb21cb42c8ab2 Mon Sep 17 00:00:00 2001
From: wassimmazouz
Date: Tue, 2 Jul 2024 16:09:29 +0200
Subject: [PATCH 5/7] FIX unindent error message

---
 doc/tutorials/alpha_max.rst | 1 +
 doc/tutorials/tutorials.rst | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/tutorials/alpha_max.rst b/doc/tutorials/alpha_max.rst
index e1e23a256..f229d41a3 100644
--- a/doc/tutorials/alpha_max.rst
+++ b/doc/tutorials/alpha_max.rst
@@ -89,5 +89,6 @@ References
 Refer to Section 3.1 and Proposition 4 in particular of [1] for more details.
 
 .. _1:
+
 [1] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. 2017. Gap safe screening rules for sparsity enforcing penalties. J. Mach. Learn. Res. 18, 1 (January 2017), 4671–4703.
 
diff --git a/doc/tutorials/tutorials.rst b/doc/tutorials/tutorials.rst
index d42d8eeaa..d86b58840 100644
--- a/doc/tutorials/tutorials.rst
+++ b/doc/tutorials/tutorials.rst
@@ -37,4 +37,4 @@ Mathematical details about the group Lasso, in particular with nonnegativity con
 :ref:`Critical regularization strength above which solution is 0 <alpha_max>`
 -----------------------------------------------------------------------------
 
-How to chose the regularization strength in L1-regularization?
+How to choose the regularization strength in L1-regularization?

From ec636d4ac9067e18c77fbe33be9b1d5b314e6310 Mon Sep 17 00:00:00 2001
From: wassimmazouz
Date: Tue, 2 Jul 2024 16:59:03 +0200
Subject: [PATCH 6/7] try with eqnarray

---
 doc/tutorials/alpha_max.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/tutorials/alpha_max.rst b/doc/tutorials/alpha_max.rst
index f229d41a3..886215cf9 100644
--- a/doc/tutorials/alpha_max.rst
+++ b/doc/tutorials/alpha_max.rst
@@ -51,12 +51,14 @@ Thus,
 
 .. math::
 
+    \begin{eqnarray}
     0 \in \text{argmin} ~ g(x)
     &\Leftrightarrow 0 \in \partial g(0) \\
     &\Leftrightarrow
     0 \in \nabla f(0) + \lambda \partial || \cdot ||_1 (0) \\
     &\Leftrightarrow - \nabla f(0) \in \lambda \{ u \in \mathbb{R}^d : ||u||_{\infty} \leq 1 \} \\
     &\Leftrightarrow || \nabla f(0) ||_\infty \leq \lambda
+    \end{eqnarray}
 
 

From 181da37b5defe1660028b527125e4a028a72d26e Mon Sep 17 00:00:00 2001
From: Badr-MOUFAD
Date: Tue, 2 Jul 2024 17:06:41 +0200
Subject: [PATCH 7/7] fix math render

---
 doc/tutorials/alpha_max.rst | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/doc/tutorials/alpha_max.rst b/doc/tutorials/alpha_max.rst
index 886215cf9..8c105f87d 100644
--- a/doc/tutorials/alpha_max.rst
+++ b/doc/tutorials/alpha_max.rst
@@ -50,15 +50,18 @@ The subdifferential of :math:`|| x ||_1` at 0 is the L-infinity unit ball:
 Thus,
 
 .. math::
+    :nowrap:
 
-    \begin{eqnarray}
-    0 \in \text{argmin} ~ g(x)
-    &\Leftrightarrow 0 \in \partial g(0) \\
-    &\Leftrightarrow
-    0 \in \nabla f(0) + \lambda \partial || \cdot ||_1 (0) \\
-    &\Leftrightarrow - \nabla f(0) \in \lambda \{ u \in \mathbb{R}^d : ||u||_{\infty} \leq 1 \} \\
-    &\Leftrightarrow || \nabla f(0) ||_\infty \leq \lambda
-    \end{eqnarray}
+    \begin{equation}
+    \begin{aligned}
+    0 \in \text{argmin} ~ g(x)
+    &\Leftrightarrow 0 \in \partial g(0) \\
+    &\Leftrightarrow
+    0 \in \nabla f(0) + \lambda \partial || \cdot ||_1 (0) \\
+    &\Leftrightarrow - \nabla f(0) \in \lambda \{ u \in \mathbb{R}^d : ||u||_{\infty} \leq 1 \} \\
+    &\Leftrightarrow || \nabla f(0) ||_\infty \leq \lambda
+    \end{aligned}
+    \end{equation}
 
 
 We have just shown that the minimizer of :math:`g = f + \lambda || \cdot ||_1` is 0 if and only if :math:`\lambda \geq ||\nabla f(0)||_{\infty}`.
@@ -93,4 +96,3 @@ Refer to Section 3.1 and Proposition 4 in particular of [1] for more details.
 .. _1:
 
 [1] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. 2017. Gap safe screening rules for sparsity enforcing penalties. J. Mach. Learn. Res. 18, 1 (January 2017), 4671–4703.
-
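
The :math:`\lambda_{\text{max}}` criterion derived in the tutorial above is easy to check numerically. The sketch below is illustrative and not part of the patch series: the data, variable names, and the use of scikit-learn's ``Lasso`` (which uses the same ``1/(2n)`` datafit scaling as the tutorial's example) are assumptions of this sketch, not something the patches prescribe. It computes ``alpha_max`` from the gradient of the quadratic datafit at zero, then fits at and below the critical value.

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import Lasso  # objective: ||y - Xw||^2 / (2n) + alpha * ||w||_1

    rng = np.random.default_rng(0)
    n, d = 50, 30
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    # For f(w) = ||Xw - y||^2 / (2n), the gradient at w = 0 is -X.T @ y / n,
    # so the critical value is alpha_max = ||X.T @ y||_inf / n.
    alpha_max = np.max(np.abs(X.T @ y)) / n

    for frac in (1.0, 0.5):
        w = Lasso(alpha=frac * alpha_max, fit_intercept=False).fit(X, y).coef_
        print(f"alpha = {frac} * alpha_max -> solution is zero: {np.allclose(w, 0)}")

At ``alpha = 1.0 * alpha_max`` every fitted coefficient is zero, while at ``alpha = 0.5 * alpha_max`` at least one coefficient is nonzero, matching the equivalence :math:`0 \in \text{argmin} ~ g \Leftrightarrow \lambda \geq || \nabla f(0) ||_{\infty}` shown in the tutorial.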