Merge pull request #178 from chhoumann/background-fix-scalers
Fix notation, formulas and cites in scalars in Background section
chhoumann authored Jun 6, 2024
2 parents a7af7a4 + e4c4bae commit c6e2767
Showing 6 changed files with 28 additions and 20 deletions.
@@ -1,7 +1,8 @@
 \subsection{Preprocessing}
 In this subsection, we discuss the preprocessing methods used in our machine learning pipeline.
-We cover various normalization techniques such as Z-score normalization, max absolute scaling, min-max normalization, robust scaling, Norm 3, power transformation, and quantile transformation.
+We cover the following normalization techniques: Z-score normalization, max absolute scaling, min-max normalization, robust scaling, Norm 3, power transformation, and quantile transformation.
 These techniques are essential for standardizing data, handling different scales, and improving the performance of machine learning models.
+For the purposes of this discussion, let $\mathbf{x}$ be a feature vector with values $x_1, x_2, \ldots, x_n$.
 
 \input{sections/background/preprocessing/z-score.tex}
 \input{sections/background/preprocessing/max_abs.tex}
10 changes: 6 additions & 4 deletions report_thesis/src/sections/background/preprocessing/max_abs.tex
@@ -2,9 +2,11 @@ \subsubsection{Max Absolute Scaler}
 Max absolute scaling is a normalization technique that scales each feature individually so that the maximum absolute value of each feature is 1.
 This results in the data being normalized to a range between -1 and 1.
 The formula for max absolute scaling is given by:
+
 $$
-X_{\text{scaled}} = \frac{x}{\max(|x|)},
+x'_i = \frac{x_i}{\max(|\mathbf{x}|)},
 $$
-where $x$ is the original feature value and $X_{\text{scaled}}$ is the normalized feature value.
-This scaling method is useful for data that has been centered at zero or data that is sparse, as max absolute scaling does not center the data.
-This maintains the sparsity of the data by not introducing non-zero values in the zero entries of the data~\cite{Vasques2024}.
+
+where $x_i$ is the original feature value, $\max(|\mathbf{x}|)$ is the maximum absolute value of the feature vector $\mathbf{x}$, and $x'_i$ is the normalized feature value.
+This scaling method is particularly useful for data that has been centered at zero or is sparse, as max absolute scaling does not alter the mean of the data.
+Additionally, it preserves the sparsity of the data by ensuring that zero entries remain zero, thereby not introducing any non-zero values~\cite{Vasques2024}.
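As a quick numerical check on the revised formula, here is a minimal NumPy sketch of max absolute scaling; the function name and example vector are illustrative assumptions, not part of the commit:

import numpy as np

def max_abs_scale(x: np.ndarray) -> np.ndarray:
    # Divide every value by the maximum absolute value, so results lie in [-1, 1].
    return x / np.max(np.abs(x))

x = np.array([0.0, -4.0, 2.0, 0.0, 8.0])
print(max_abs_scale(x))  # [ 0.   -0.5   0.25  0.    1.  ] -- zeros stay zero, preserving sparsity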
12 changes: 7 additions & 5 deletions report_thesis/src/sections/background/preprocessing/min-max.tex
@@ -1,10 +1,12 @@
 \subsubsection{Min-Max Normalization}\label{subsec:min-max}
-Min-max normalization rescales the range of features to $[0, 1]$ or $[a, b]$, where $a$ and $b$ represent the new minimum and maximum values, respectively.
+Min-max normalization rescales the range of features to a specific range $[a, b]$, where $a$ and $b$ represent the new minimum and maximum values, respectively.
 The goal is to normalize the range of the data to a specific scale, typically 0 to 1.
-Mathematically, min-max normalization is defined as:
+The min-max normalization of a feature vector $\mathbf{x}$ is given by:
+
 $$
-v' = \frac{v - \min(F)}{\max(F) - \min(F)} \times (b - a) + a,
+x'_i = \frac{x_i - \min(\mathbf{x})}{\max(\mathbf{x}) - \min(\mathbf{x})}(b - a) + a,
 $$
-where $v$ is the original value, $\min(F)$ and $\max(F)$ are the minimum and maximum values of the feature $F$, respectively.
+
+where $x_i$ is the original value, $\min(\mathbf{x})$ and $\max(\mathbf{x})$ are the minimum and maximum values of the feature vector $\mathbf{x}$, respectively, and $x'_i$ is the normalized feature value.
 
-This type of normalization is beneficial because it ensures that each feature contributes equally to the analysis, regardless of its original scale.
+This type of normalization is beneficial because it ensures that each feature contributes equally to the analysis, regardless of its original scale~\cite{dataminingConcepts}.
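A corresponding sketch of min-max normalization to an arbitrary target range [a, b], again with assumed names and example data:

import numpy as np

def min_max_scale(x: np.ndarray, a: float = 0.0, b: float = 1.0) -> np.ndarray:
    # Map min(x) to a and max(x) to b, linearly in between.
    return (x - x.min()) / (x.max() - x.min()) * (b - a) + a

x = np.array([10.0, 20.0, 30.0])
print(min_max_scale(x))         # [0.  0.5 1. ]
print(min_max_scale(x, -1, 1))  # [-1.  0.  1.]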
5 changes: 3 additions & 2 deletions report_thesis/src/sections/background/preprocessing/norm3.tex
@@ -13,7 +13,8 @@ \subsubsection{Norm 3}
 \label{fig:spectral_plot}
 \end{figure}
 
-Formally, Norm 3 is defined as
+Let $\gamma$ represent the spectrometer index, where $\gamma \in \{1, 2, 3\}$, corresponding to the \gls{uv}, \gls{vio}, and \gls{vnir} spectrometers, respectively.
+Then, Norm 3 is formally defined as:
 
 \begin{equation}
 \tilde{X}_{i,j}^{(\gamma)} = \frac{X_{i,j}^{(\gamma)}}{\sum_{j=1}^{N} X_{i,j}^{(\gamma)}},
@@ -22,7 +23,7 @@ \subsubsection{Norm 3}
 where
 
 \begin{itemize}
-    \item $\tilde{X}_{i,j}^{(\gamma)}$ is the normalized wavelength intensity for the $i$-th sample in the $j$-th channel on the $\gamma$-th spectrometer, with $\gamma \in \{1, 2, 3\}$ representing the \gls{uv}, \gls{vio}, and \gls{vnir} spectrometers, respectively,
+    \item $\tilde{X}_{i,j}^{(\gamma)}$ is the normalized wavelength intensity for the $i$-th sample in the $j$-th channel on the $\gamma$-th spectrometer,
     \item $X_{i,j}^{(\gamma)}$ is the original wavelength intensity for the $i$-th sample in the $j$-th channel on the $\gamma$-th spectrometer, and
     \item $N = 2048$ is the number of channels in each spectrometer.
 \end{itemize}
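To make the double indexing concrete, here is a sketch of Norm 3 assuming each sample's spectrum is stored as a (3, 2048) array with one row per spectrometer; the storage layout and names are assumptions for illustration:

import numpy as np

def norm3(X: np.ndarray) -> np.ndarray:
    # X has shape (3, 2048): rows are the UV, VIO, and VNIR spectrometers,
    # columns are the N = 2048 channels. Divide each row by its total intensity.
    return X / X.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.random((3, 2048))    # one sample's raw intensities
print(norm3(X).sum(axis=1))  # each spectrometer's channels now sum to 1.0

Normalizing each spectrometer separately, rather than by the grand total, keeps differences in overall signal level between the three spectrometers from dominating the normalized spectrum.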
@@ -1,8 +1,10 @@
 \subsubsection{Robust Scaler}
 The robust scaler is a normalization technique that removes the median and scales the data according to the quantile range.
-The formula for the robust scaler is given by:
+The robust scaler of a feature vector $\mathbf{x}$ is given by:
+
 $$
-X_{\text{scaled}} = \frac{X - \text{Q1}(X)}{\text{Q3}(X) - \text{Q1}(X)} \: ,
+x'_i = \frac{x_i - \text{Q1}(\mathbf{x})}{\text{Q3}(\mathbf{x}) - \text{Q1}(\mathbf{x})} \: ,
 $$
-where $X$ is the original data, $\text{Q1}(X)$ is the first quartile of $X$, and $\text{Q3}(X)$ is the third quartile of $X$.
+
+where $x_i$ is the original feature value, $\text{Q1}(\mathbf{x})$ is the first quartile of the feature vector $\mathbf{x}$, and $\text{Q3}(\mathbf{x})$ is the third quartile of the feature vector $\mathbf{x}$.
 This technique can be advantageous in cases where the data contains outliers, as it relies on the median and quantile range instead of the mean and variance, both of which are sensitive to outliers~\cite{Vasques2024}.
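Below is a sketch implementing the quartile-based formula exactly as written in this commit; note that scikit-learn's RobustScaler subtracts the median rather than Q1, so this follows the text's formula literally (names and data are assumed):

import numpy as np

def robust_scale(x: np.ndarray) -> np.ndarray:
    # Subtract the first quartile and divide by the interquartile range.
    q1, q3 = np.percentile(x, [25, 75])
    return (x - q1) / (q3 - q1)

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier
print(robust_scale(x))  # [-0.5  0.   0.5  1.  49. ] -- the inliers keep a sensible scale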
10 changes: 5 additions & 5 deletions report_thesis/src/sections/background/preprocessing/z-score.tex
@@ -1,12 +1,12 @@
 \subsubsection{Z-score Normalization}
-Z-score normalization, also standardization, transforms data to have a mean of zero and a standard deviation of one.
+Z-score normalization, also known as zero-mean normalization, transforms data to have a mean of zero and a standard deviation of one.
 This technique is useful when the actual minimum and maximum of a feature are unknown or when outliers may significantly skew the distribution.
-The formula for Z-score normalization is given by:
+The z-score normalization of a feature vector \(\mathbf{x}\) is given by:
 
 $$
-v' = \frac{v - \overline{F}}{\sigma_F},
+x'_i = \frac{x_i - \overline{\mathbf{x}}}{\sigma_\mathbf{x}},
 $$
 
-where $v$ is the original value, $\overline{F}$ is the mean of the feature $F$, and $\sigma_F$ is the standard deviation of $F$.
+where \(x_i\) is the original value, \(\overline{\mathbf{x}}\) is the mean of the feature vector \(\mathbf{x}\), \(\sigma_\mathbf{x}\) is the standard deviation of the feature vector \(\mathbf{x}\), and \(x'_i\) is the normalized feature value.
 By transforming the data using the Z-score, each value reflects its distance from the mean in terms of standard deviations.
-Z-score normalization is particularly advantageous in scenarios where data features have different units or scales, or when preparing data for algorithms that assume normally distributed inputs~\cite{dataminingConcepts}.
\ No newline at end of file
+Z-score normalization is particularly advantageous in scenarios where data features have different units or scales, or when preparing data for algorithms that assume normally distributed inputs~\cite{dataminingConcepts}.
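Finally, a sketch of z-score normalization with an assumed example vector:

import numpy as np

def z_score(x: np.ndarray) -> np.ndarray:
    # Center on the mean, then divide by the standard deviation.
    return (x - x.mean()) / x.std()

x = np.array([2.0, 4.0, 6.0, 8.0])
z = z_score(x)
print(z)                  # each value is its distance from the mean in standard deviations
print(z.mean(), z.std())  # ~0.0 and 1.0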
