From f87d686fe1e53f4793ed6fdfeaac3cf03af39c2e Mon Sep 17 00:00:00 2001
From: "Adam J. Stewart"
Date: Thu, 19 Dec 2024 19:58:01 +0100
Subject: [PATCH] Docs: remove mathcal (#2467)

---
 docs/tutorials/geospatial.ipynb |  2 +-
 docs/tutorials/pytorch.ipynb    | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/tutorials/geospatial.ipynb b/docs/tutorials/geospatial.ipynb
index 844ce5b8747..880ad1ab998 100644
--- a/docs/tutorials/geospatial.ipynb
+++ b/docs/tutorials/geospatial.ipynb
@@ -126,7 +126,7 @@
     "\n",
     "Similar to radar, lidar is another active remote sensing method that replaces microwave pulses with lasers. By measuring the time it takes light to reflect off of an object and return to the sensor, we can generate a 3D point cloud mapping object structures. Mathematically, our dataset would then become:\n",
     "\n",
-    "$$\\mathcal{D} = \\left\\{\\left(x^{(i)}, y^{(i)}, z^{(i)}\\right)\\right\\}_{i=1}^N$$\n",
+    "$$D = \\left\\{\\left(x^{(i)}, y^{(i)}, z^{(i)}\\right)\\right\\}_{i=1}^N$$\n",
     "\n",
     "This technology is frequently used in several different application domains:\n",
     "\n",
diff --git a/docs/tutorials/pytorch.ipynb b/docs/tutorials/pytorch.ipynb
index db7bcdb4c88..b66d3cc95b8 100644
--- a/docs/tutorials/pytorch.ipynb
+++ b/docs/tutorials/pytorch.ipynb
@@ -115,9 +115,9 @@
     "\n",
     "In order to learn by example, we first need examples. In machine learning, we construct datasets of the form:\n",
     "\n",
-    "$$\\mathcal{D} = \\left\\{\\left(x^{(i)}, y^{(i)}\\right)\\right\\}_{i=1}^N$$\n",
+    "$$D = \\left\\{\\left(x^{(i)}, y^{(i)}\\right)\\right\\}_{i=1}^N$$\n",
     "\n",
-    "Written in English, dataset $\\mathcal{D}$ is composed of $N$ pairs of inputs $x$ and expected outputs $y$. $x$ and $y$ can be tabular data, images, text, or any other object that can be represented mathematically.\n",
+    "Written in English, dataset $D$ is composed of $N$ pairs of inputs $x$ and expected outputs $y$. $x$ and $y$ can be tabular data, images, text, or any other object that can be represented mathematically.\n",
     "\n",
     "![EuroSAT](https://github.com/phelber/EuroSAT/blob/master/eurosat-overview.png?raw=true)\n",
     "\n",
@@ -261,11 +261,11 @@
     "\n",
     "If $y$ is our expected output (also called \"ground truth\") and $\\hat{y}$ is our predicted output, our goal is to minimize the difference between $y$ and $\\hat{y}$. This difference is referred to as *error* or *loss*, and the loss function tells us how big of a mistake we made. For regression tasks, a simple mean squared error is sufficient:\n",
     "\n",
-    "$$\\mathcal{L}(y, \\hat{y}) = \\left(y - \\hat{y}\\right)^2$$\n",
+    "$$L(y, \\hat{y}) = \\left(y - \\hat{y}\\right)^2$$\n",
     "\n",
     "For classification tasks, such as EuroSAT, we instead use a negative log-likelihood:\n",
     "\n",
-    "$$\\mathcal{L}_c(y, \\hat{y}) = - \\sum_{c=1}^C \\mathbb{1}_{y=\\hat{y}}\\log{p_c}$$\n",
+    "$$L_c(y, \\hat{y}) = - \\sum_{c=1}^C \\mathbb{1}_{y=\\hat{y}}\\log{p_c}$$\n",
     "\n",
     "where $\\mathbb{1}$ is the indicator function and $p_c$ is the probability with which the model predicts class $c$. By normalizing this over the log probability of all classes, we get the cross-entropy loss."
    ]
@@ -289,7 +289,7 @@
     "\n",
     "In order to minimize our loss, we compute the gradient of the loss function with respect to model parameters $\\theta$. We then take a small step $\\alpha$ (also called the *learning rate*) in the direction of the negative gradient to update our model parameters in a process called *backpropagation*:\n",
     "\n",
-    "$$\\theta \\leftarrow \\theta - \\alpha \\nabla_\\theta \\mathcal{L}(y, \\hat{y})$$\n",
+    "$$\\theta \\leftarrow \\theta - \\alpha \\nabla_\\theta L(y, \\hat{y})$$\n",
     "\n",
     "When done one image or one mini-batch at a time, this is known as *stochastic gradient descent* (SGD)."
    ]
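A note for readers of the tutorial prose touched by this patch: below is a minimal, hypothetical PyTorch sketch (not part of the patch or of TorchGeo) of the two formulas it renames, the cross-entropy loss $L_c$ and the SGD update $\theta \leftarrow \theta - \alpha \nabla_\theta L(y, \hat{y})$. The toy tensors x and y, the parameter matrix theta, and the learning rate alpha are illustrative assumptions, not code from the notebooks.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)

    # Hypothetical toy data: 8 samples, 4 features, C = 3 classes.
    x = torch.randn(8, 4)
    y = torch.randint(0, 3, (8,))

    theta = torch.zeros(4, 3, requires_grad=True)  # model parameters
    alpha = 0.1                                    # learning rate

    for step in range(100):
        logits = x @ theta
        # Cross-entropy: negative log-likelihood of the true class
        # under the softmax probabilities p_c, as in L_c above.
        loss = F.cross_entropy(logits, y)
        loss.backward()  # compute the gradient of L w.r.t. theta
        with torch.no_grad():
            theta -= alpha * theta.grad  # theta <- theta - alpha * grad_theta L
        theta.grad.zero_()

In practice, an optimizer such as torch.optim.SGD encapsulates exactly this parameter update, so the manual no_grad step is shown here only to mirror the equation in the text.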