From 6e5a4975fcf2a1f936c13e408ecfaf96f46f5608 Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 08:15:53 +0100
Subject: [PATCH 01/16] remove LIBS Setup section - won't use

---
 report_thesis/src/index.tex | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/report_thesis/src/index.tex b/report_thesis/src/index.tex
index d992c9fe..f291ebf8 100644
--- a/report_thesis/src/index.tex
+++ b/report_thesis/src/index.tex
@@ -24,10 +24,6 @@ \subsection{Related Work}
 Related Work (What others have done and why our method is different / novel)
 
-\subsection{LIBS Setup}
-Detailed explanation of the LIBS setup, including equipment, configurations, and settings.
-Explain any variables, controls, and calibrations involved in the setup.
-
 \subsection{Data Analysis}
 Description of the samples used and their relevance.
 Explain how and why these samples were chosen.

From 9003b8495db3fa9351d2cdeaf72c55b69d4319dd Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 08:16:04 +0100
Subject: [PATCH 02/16] add key terms to glossary

---
 report_thesis/src/glossary.tex | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/report_thesis/src/glossary.tex b/report_thesis/src/glossary.tex
index 94f8df23..8787f2e6 100644
--- a/report_thesis/src/glossary.tex
+++ b/report_thesis/src/glossary.tex
@@ -8,4 +8,6 @@
 \newacronym{ann}{ANN}{Artificial Neural Network}
 \newacronym{gbr}{GBR}{Gradient Boosting Regression}
 \newacronym{rf}{RF}{Random Forest}
-\newacronym{lasso}{LASSO}{Least Absolute Selection and Shrinkage Operator}
\ No newline at end of file
+\newacronym{lasso}{LASSO}{Least Absolute Shrinkage and Selection Operator}
+\newacronym{pca}{PCA}{Principal Component Analysis}
+\newacronym{rmse}{RMSE}{Root Mean Squared Error}
\ No newline at end of file

From 91493667dc93b3d20f46f5608dab48843c13e408 Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 08:16:28 +0100
Subject: [PATCH 03/16] create and add empty problem definition section

---
 report_thesis/src/index.tex                       | 1 +
 report_thesis/src/sections/problem_definition.tex | 3 +++
 2 files changed, 4 insertions(+)
 create mode 100644 report_thesis/src/sections/problem_definition.tex

diff --git a/report_thesis/src/index.tex b/report_thesis/src/index.tex
index f291ebf8..dab48843 100644
--- a/report_thesis/src/index.tex
+++ b/report_thesis/src/index.tex
@@ -11,6 +11,7 @@ \subsubsection*{Acknowledgements:}
 \input{sections/introduction.tex}
+\input{sections/problem_definition.tex}
 
 \section{Background}
 Background / Preliminaries (what you need to know in order to understand the story)

diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex
new file mode 100644
index 00000000..de52fa14
--- /dev/null
+++ b/report_thesis/src/sections/problem_definition.tex
@@ -0,0 +1,3 @@
+\section{Problem Definition}
+\label{sec:problem_definition}
+

From 481b451ff1fff40bfc9fd5bdfd9b1a8211b558b5 Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 10:24:48 +0100
Subject: [PATCH 04/16] write problem definition draft

---
 .../src/sections/problem_definition.tex | 51 +++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex
index de52fa14..62500929 100644
--- a/report_thesis/src/sections/problem_definition.tex
+++ b/report_thesis/src/sections/problem_definition.tex
@@ -1,3 +1,54 @@
 \section{Problem Definition}
 \label{sec:problem_definition}
 
+Our research aims to improve the accuracy and robustness of major oxide predictions derived from \gls{libs} data, building upon the baseline established in \cite{p9_paper}.
+There are many challenges in predicting major oxides from \gls{libs} data, including the high dimensionality and non-linearity of the data, as well as the presence of multicollinearity.
+Some of these are caused by \textit{matrix effects}\cite{andersonImprovedAccuracyQuantitative2017}, which is a catch-all term for any effect that can cause the intensity of emission lines from an element to vary, independent of that element's concentration. In other words, these are unknown variables that affect the results.
+Furthermore, due to the high cost of data collection, datasets are often small, which further complicates the task of building accurate and robust models.
+
+Based on the limitations of the current MOC pipeline, as reported in \cite{p9_paper}, we identified three key areas for further investigation: dimensionality reduction, model selection, and outlier removal.
+
+In this work, we focus on dimensionality reduction and model selection over outlier removal.
+This is justified by the low incidence of outliers in the ChemCam \gls{libs} calibration dataset, as reported in \cite{p9_paper}.
+Dimensionality reduction is crucial for managing the high-dimensional nature of \gls{libs} data. % TODO: There are lots of related works which explore DR in LIBS data. We can back this up with citations.
+Furthermore, model selection shows promise in addressing the limitations of the current MOC pipeline, as it allows for the exploration of a wider range of algorithms, potentially leading to improved performance.
+We showed that advanced ensemble methods, such as \gls{gbr}, and deep learning models, such as \gls{ann}, have the potential to outperform the current MOC pipeline.
+Methods are selected based on their promise in handling high-dimensional, non-linear data. Ideally, the selected methods should also be feasible for small datasets, a common scenario in \gls{libs} analyses.
+In order to address the aforementioned challenges, we propose to explore advanced ensemble methods and deep learning models, which have shown promise in handling high-dimensional, non-linear data.
+
+It is necessary to establish metrics to evaluate the performance of the models.
+In \cite{p9_paper}, we proposed to use the \gls{rmse} as a proxy for accuracy.
+
+\gls{rmse} is given by:
+
+\begin{equation}
+    RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
+\end{equation}
+
+where $y_i$ represents the actual values, $\hat{y}_i$ the predicted values, and $n$ the number of observations.
+
+To address robustness, we propose considering the standard deviation of prediction errors across each oxide and test instance, defined as:
+
+\begin{equation}
+    \sigma_{error} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (e_i - \bar{e})^2}
+\end{equation}
+
+where $e_i = y_i - \hat{y}_i$ and $\bar{e}$ is the mean error.
+
+In order to narrow down the scope of our research, we set the following constraints:
+\begin{itemize}
+    \item Prioritize normalization across individual spectrometers' wavelength ranges over full-spectrum normalization.
+    \item Focus on techniques proven effective for non-linear, high-dimensional data, even outside the \gls{libs} context.
+    \item Ensure methods are feasible for small datasets.
+\end{itemize}
+
+In \cite{p9_paper}, we used both full-spectrum normalization (Norm 1) and normalization across individual spectrometers' wavelength ranges (Norm 3).
+However, in this work, we opt for normalizing across individual spectrometers' wavelength ranges.
+This decision is guided by the operational parameters of SuperCam\cite{andersonPostLandingMajorElements2022}, where they do not normalize across the entire spectrum, but rather across individual spectrometers' wavelength ranges.
+In order to ensure the future applicability of our methods, we follow the same normalization approach.
+
+Our methodologies are selected to ensure compatibility with small datasets, a common scenario in \gls{libs} analyses.
+This consideration informs our preference for cross-validation techniques, which offer robust performance assessments without necessitating large data volumes.
+Additionally, the computational complexity of models is deemed a secondary concern, given the scale of data typically involved in \gls{libs} studies, allowing for a broader exploration of sophisticated, potentially more computationally intensive algorithms.
+
+% TODO: Write tail when we have more structure in the report

From 69af1014fcf2a1f936c13e408ecfaf96f46f5608 Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 12:03:46 +0100
Subject: [PATCH 05/16] Update report_thesis/src/sections/problem_definition.tex

Co-authored-by: Pattrigue <57709490+Pattrigue@users.noreply.github.com>
---
 report_thesis/src/sections/problem_definition.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex
index 62500929..903ee3bc 100644
--- a/report_thesis/src/sections/problem_definition.tex
+++ b/report_thesis/src/sections/problem_definition.tex
@@ -1,7 +1,7 @@
 \section{Problem Definition}
 \label{sec:problem_definition}
 
-Our research aims to improve the accuracy and robustness of major oxide predictions derived from \gls{libs} data, building upon the baseline established in \cite{p9_paper}.
+Our work aims to improve the accuracy and robustness of major oxide predictions derived from \gls{libs} data, building upon the baseline established in \citet{p9_paper}.
 There are many challenges in predicting major oxides from \gls{libs} data, including the high dimensionality and non-linearity of the data, as well as the presence of multicollinearity.
 Some of these are caused by \textit{matrix effects}\cite{andersonImprovedAccuracyQuantitative2017}, which is a catch-all term for any effect that can cause the intensity of emission lines from an element to vary, independent of that element's concentration. In other words, these are unknown variables that affect the results.
 Furthermore, due to the high cost of data collection, datasets are often small, which further complicates the task of building accurate and robust models.
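
The Norm 1 / Norm 3 distinction drawn in the draft above can be made concrete with a short sketch. This is only an illustration: the wavelength boundaries below are placeholders rather than the actual ChemCam spectrometer ranges, and NumPy is assumed.

import numpy as np

# Placeholder wavelength ranges (nm) standing in for the instrument's three spectrometers.
SPECTROMETER_RANGES = [(240.0, 340.0), (380.0, 470.0), (470.0, 850.0)]

def norm1(intensities):
    # Norm 1: scale the whole spectrum so its total intensity sums to 1.
    return intensities / intensities.sum()

def norm3(wavelengths, intensities):
    # Norm 3: scale each spectrometer's wavelength range so it sums to 1 on its own.
    out = intensities.astype(float).copy()
    for low, high in SPECTROMETER_RANGES:
        mask = (wavelengths >= low) & (wavelengths < high)
        total = out[mask].sum()
        if total > 0:
            out[mask] /= total
    return out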

From cc053e067063292f6e9ac0ac90ae38cdc925dcf9 Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 12:04:38 +0100
Subject: [PATCH 06/16] Update report_thesis/src/sections/problem_definition.tex

Co-authored-by: Pattrigue <57709490+Pattrigue@users.noreply.github.com>
---
 report_thesis/src/sections/problem_definition.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex
index 903ee3bc..26ac13be 100644
--- a/report_thesis/src/sections/problem_definition.tex
+++ b/report_thesis/src/sections/problem_definition.tex
@@ -6,7 +6,7 @@ \section{Problem Definition}
 Some of these are caused by \textit{matrix effects}\cite{andersonImprovedAccuracyQuantitative2017}, which is a catch-all term for any effect that can cause the intensity of emission lines from an element to vary, independent of that element's concentration. In other words, these are unknown variables that affect the results.
 Furthermore, due to the high cost of data collection, datasets are often small, which further complicates the task of building accurate and robust models.
 
-Based on the limitations of the current MOC pipeline, as reported in \cite{p9_paper}, we identified three key areas for further investigation: dimensionality reduction, model selection, and outlier removal.
+Based on the limitations of the current \gls{moc} pipeline, as reported in \citet{p9_paper}, we identified three key areas for further investigation: dimensionality reduction, model selection, and outlier removal.
 
 In this work, we focus on dimensionality reduction and model selection over outlier removal.
 This is justified by the low incidence of outliers in the ChemCam \gls{libs} calibration dataset, as reported in \cite{p9_paper}.

From 99bc1f1864783e7dceedb8f8db26c4914e7c3e67 Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 12:04:45 +0100
Subject: [PATCH 07/16] Update report_thesis/src/sections/problem_definition.tex

Co-authored-by: Pattrigue <57709490+Pattrigue@users.noreply.github.com>
---
 report_thesis/src/sections/problem_definition.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex
index 26ac13be..e308a00c 100644
--- a/report_thesis/src/sections/problem_definition.tex
+++ b/report_thesis/src/sections/problem_definition.tex
@@ -9,7 +9,7 @@ \section{Problem Definition}
 Based on the limitations of the current \gls{moc} pipeline, as reported in \citet{p9_paper}, we identified three key areas for further investigation: dimensionality reduction, model selection, and outlier removal.
 
 In this work, we focus on dimensionality reduction and model selection over outlier removal.
-This is justified by the low incidence of outliers in the ChemCam \gls{libs} calibration dataset, as reported in \cite{p9_paper}.
+This is justified by the low incidence of outliers in the \gls{chemcam} \gls{libs} calibration dataset, as reported in \citet{p9_paper}.
 Dimensionality reduction is crucial for managing the high-dimensional nature of \gls{libs} data. % TODO: There are lots of related works which explore DR in LIBS data. We can back this up with citations.
 Furthermore, model selection shows promise in addressing the limitations of the current \gls{moc} pipeline, as it allows for the exploration of a wider range of algorithms, potentially leading to improved performance.
 We showed that advanced ensemble methods, such as \gls{gbr}, and deep learning models, such as \gls{ann}, have the potential to outperform the current MOC pipeline.

From d4f212be03001a3c8de336991f98e63f9258ac2e Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 12:04:52 +0100
Subject: [PATCH 08/16] Update report_thesis/src/sections/problem_definition.tex

Co-authored-by: Pattrigue <57709490+Pattrigue@users.noreply.github.com>
---
 report_thesis/src/sections/problem_definition.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex
index e308a00c..cd2a7a58 100644
--- a/report_thesis/src/sections/problem_definition.tex
+++ b/report_thesis/src/sections/problem_definition.tex
@@ -11,7 +11,7 @@ \section{Problem Definition}
 In this work, we focus on dimensionality reduction and model selection over outlier removal.
 This is justified by the low incidence of outliers in the \gls{chemcam} \gls{libs} calibration dataset, as reported in \citet{p9_paper}.
 Dimensionality reduction is crucial for managing the high-dimensional nature of \gls{libs} data. % TODO: There are lots of related works which explore DR in LIBS data. We can back this up with citations.
-Furthermore, model selection shows promise in addressing the limitations of the current MOC pipeline, as it allows for the exploration of a wider range of algorithms, potentially leading to improved performance.
+Furthermore, model selection shows promise in addressing the limitations of the current \gls{moc} pipeline, as it allows for the exploration of a wider range of algorithms, potentially leading to improved performance.
 We showed that advanced ensemble methods, such as \gls{gbr}, and deep learning models, such as \gls{ann}, have the potential to outperform the current MOC pipeline.
 Methods are selected based on their promise in handling high-dimensional, non-linear data. Ideally, the selected methods should also be feasible for small datasets, a common scenario in \gls{libs} analyses.
 In order to address the aforementioned challenges, we propose to explore advanced ensemble methods and deep learning models, which have shown promise in handling high-dimensional, non-linear data.

From f648d4c764f23a74ce052ae16d604a27a407fc67 Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 12:06:48 +0100
Subject: [PATCH 09/16] Update report_thesis/src/sections/problem_definition.tex

Co-authored-by: Pattrigue <57709490+Pattrigue@users.noreply.github.com>
---
 report_thesis/src/sections/problem_definition.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex
index cd2a7a58..6c8d1fdc 100644
--- a/report_thesis/src/sections/problem_definition.tex
+++ b/report_thesis/src/sections/problem_definition.tex
@@ -22,7 +22,7 @@ \section{Problem Definition}
 \gls{rmse} is given by:
 
 \begin{equation}
-    RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
+    RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2},
 \end{equation}
 
 where $y_i$ represents the actual values, $\hat{y}_i$ the predicted values, and $n$ the number of observations.
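
For reference, the RMSE expression adjusted in the patch above translates directly into code; a minimal sketch assuming NumPy, with made-up oxide values used purely for illustration:

import numpy as np

def rmse(y_true, y_pred):
    # Square root of the mean squared difference between actual and predicted values.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical SiO2 reference values vs. predictions (wt. %)
print(rmse([55.2, 48.7, 61.0, 43.5], [54.1, 50.0, 59.8, 45.2]))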

From 5cd2c115757e7faa8bceb8ab0d80fd2ed00f3d21 Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 12:06:55 +0100
Subject: [PATCH 10/16] Update report_thesis/src/sections/problem_definition.tex

Co-authored-by: Pattrigue <57709490+Pattrigue@users.noreply.github.com>
---
 report_thesis/src/sections/problem_definition.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex
index 6c8d1fdc..930faee0 100644
--- a/report_thesis/src/sections/problem_definition.tex
+++ b/report_thesis/src/sections/problem_definition.tex
@@ -30,7 +30,7 @@ \section{Problem Definition}
 To address robustness, we propose considering the standard deviation of prediction errors across each oxide and test instance, defined as:
 
 \begin{equation}
-    \sigma_{error} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (e_i - \bar{e})^2}
+    \sigma_{error} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (e_i - \bar{e})^2},
 \end{equation}
 
 where $e_i = y_i - \hat{y}_i$ and $\bar{e}$ is the mean error.

From e62b0ad47158a52663b3042afa145d6b24d724ea Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 12:09:37 +0100
Subject: [PATCH 11/16] Update report_thesis/src/sections/problem_definition.tex

---
 report_thesis/src/sections/problem_definition.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex
index 930faee0..e6d44deb 100644
--- a/report_thesis/src/sections/problem_definition.tex
+++ b/report_thesis/src/sections/problem_definition.tex
@@ -12,7 +12,7 @@ \section{Problem Definition}
 This is justified by the low incidence of outliers in the \gls{chemcam} \gls{libs} calibration dataset, as reported in \citet{p9_paper}.
 Dimensionality reduction is crucial for managing the high-dimensional nature of \gls{libs} data. % TODO: There are lots of related works which explore DR in LIBS data. We can back this up with citations.
 Furthermore, model selection shows promise in addressing the limitations of the current \gls{moc} pipeline, as it allows for the exploration of a wider range of algorithms, potentially leading to improved performance.
-We showed that advanced ensemble methods, such as \gls{gbr}, and deep learning models, such as \gls{ann}, have the potential to outperform the current MOC pipeline.
+We showed that advanced ensemble methods like \gls{gbr} and deep learning models like \gls{ann}s have the potential to outperform the current \gls{moc} pipeline.
 Methods are selected based on their promise in handling high-dimensional, non-linear data. Ideally, the selected methods should also be feasible for small datasets, a common scenario in \gls{libs} analyses.
 In order to address the aforementioned challenges, we propose to explore advanced ensemble methods and deep learning models, which have shown promise in handling high-dimensional, non-linear data.
 
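
Likewise, the robustness metric sigma_error adjusted in PATCH 10 is simply the sample standard deviation of the prediction errors; a sketch under the same assumptions (NumPy, made-up values), where ddof=1 gives the n - 1 denominator from the definition:

import numpy as np

def error_std(y_true, y_pred):
    # e_i = y_i - y_hat_i; ddof=1 yields the (n - 1) denominator.
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.std(errors, ddof=1)

print(error_std([55.2, 48.7, 61.0, 43.5], [54.1, 50.0, 59.8, 45.2]))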

From 5e8dbe220d9d0b4c2e46f240cd2ddb1c12a3144c Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 12:17:09 +0100
Subject: [PATCH 12/16] fix cite

---
 report_thesis/src/sections/problem_definition.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex
index 62500929..363be150 100644
--- a/report_thesis/src/sections/problem_definition.tex
+++ b/report_thesis/src/sections/problem_definition.tex
@@ -44,7 +44,7 @@ \section{Problem Definition}
 
 In \cite{p9_paper}, we used both full-spectrum normalization (Norm 1) and normalization across individual spectrometers' wavelength ranges (Norm 3).
 However, in this work, we opt for normalizing across individual spectrometers' wavelength ranges.
-This decision is guided by the operational parameters of SuperCam\cite{andersonPostLandingMajorElements2022}, where they do not normalize across the entire spectrum, but rather across individual spectrometers' wavelength ranges.
+This decision is guided by the operational parameters of SuperCam\cite{andersonPostlandingMajorElement2022}, where they do not normalize across the entire spectrum, but rather across individual spectrometers' wavelength ranges.
 In order to ensure the future applicability of our methods, we follow the same normalization approach.
 
 Our methodologies are selected to ensure compatibility with small datasets, a common scenario in \gls{libs} analyses.

From edf9a6afdc51f26b1df774462ca84ed2f24eb7b5 Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 12:17:32 +0100
Subject: [PATCH 13/16] Update report_thesis/src/sections/problem_definition.tex

Co-authored-by: Pattrigue <57709490+Pattrigue@users.noreply.github.com>
---
 report_thesis/src/sections/problem_definition.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex
index 66cfde5a..8097489f 100644
--- a/report_thesis/src/sections/problem_definition.tex
+++ b/report_thesis/src/sections/problem_definition.tex
@@ -43,7 +43,7 @@ \section{Problem Definition}
 \end{itemize}
 
 In \cite{p9_paper}, we used both full-spectrum normalization (Norm 1) and normalization across individual spectrometers' wavelength ranges (Norm 3).
-However, in this work, we opt for normalizing across individual spectrometers' wavelength ranges.
+However, in this work, we opt to always normalize across individual spectrometers' wavelength ranges (Norm 3).
 This decision is guided by the operational parameters of SuperCam\cite{andersonPostlandingMajorElement2022}, where they do not normalize across the entire spectrum, but rather across individual spectrometers' wavelength ranges.
From 791d5058d087f180bc49cc006c0d7ada39e72cd7 Mon Sep 17 00:00:00 2001 From: Christian Bager Bach Houmann Date: Thu, 22 Feb 2024 12:19:24 +0100 Subject: [PATCH 14/16] clarify norms in itemize --- report_thesis/src/sections/problem_definition.tex | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex index 66cfde5a..5f7e0457 100644 --- a/report_thesis/src/sections/problem_definition.tex +++ b/report_thesis/src/sections/problem_definition.tex @@ -37,7 +37,7 @@ \section{Problem Definition} In order to narrow down the scope of our research, we set the following constraints: \begin{itemize} - \item Prioritize normalization across individual spectrometers' wavelength ranges over full-spectrum normalization. + \item Prioritize normalization across individual spectrometers' wavelength ranges (Norm 3) over full-spectrum normalization (Norm 1). \item Focus on techniques proven effective for non-linear, high-dimensional data, even outside the \gls{libs} context. \item Ensure methods are feasible for small datasets. \end{itemize} From 4951a44b52292569bafad632039948b37585f249 Mon Sep 17 00:00:00 2001 From: Christian Bager Bach Houmann Date: Thu, 22 Feb 2024 13:08:13 +0100 Subject: [PATCH 15/16] remove 'operational parameters' --- report_thesis/src/sections/problem_definition.tex | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex index 4ec5a96d..f521288e 100644 --- a/report_thesis/src/sections/problem_definition.tex +++ b/report_thesis/src/sections/problem_definition.tex @@ -44,7 +44,7 @@ \section{Problem Definition} In \cite{p9_paper}, we used both full-spectrum normalization (Norm 1) and normalization across individual spectrometers' wavelength ranges (Norm 3). However, in this work, we opt to always normalize across individual spectrometers' wavelength ranges (Norm 3). -This decision is guided by the operational parameters of SuperCam\cite{andersonPostlandingMajorElement2022}, where they do not normalize across the entire spectrum, but rather across individual spectrometers' wavelength ranges. +This decision is guided by the approach taken by the SuperCam team, where they do not normalize across the entire spectrum, but rather across individual spectrometers' wavelength ranges\cite{andersonPostlandingMajorElement2022}. In order to ensure the future applicability of our methods, we follow the same normalization approach. Our methodologies are selected to ensure compatibility with small datasets, a common scenario in \gls{libs} analyses. From 43a916b1498a92408a556907a4038200280c64b6 Mon Sep 17 00:00:00 2001 From: Christian Bager Bach Houmann Date: Thu, 22 Feb 2024 13:08:22 +0100 Subject: [PATCH 16/16] remove last part --- report_thesis/src/sections/problem_definition.tex | 4 ---- 1 file changed, 4 deletions(-) diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex index f521288e..7313d8b0 100644 --- a/report_thesis/src/sections/problem_definition.tex +++ b/report_thesis/src/sections/problem_definition.tex @@ -47,8 +47,4 @@ \section{Problem Definition} This decision is guided by the approach taken by the SuperCam team, where they do not normalize across the entire spectrum, but rather across individual spectrometers' wavelength ranges\cite{andersonPostlandingMajorElement2022}. 
 In order to ensure the future applicability of our methods, we follow the same normalization approach.
 
 Our methodologies are selected to ensure compatibility with small datasets, a common scenario in \gls{libs} analyses.

From 43a916b1498a92408a556907a4038200280c64b6 Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 22 Feb 2024 13:08:22 +0100
Subject: [PATCH 16/16] remove last part

---
 report_thesis/src/sections/problem_definition.tex | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/report_thesis/src/sections/problem_definition.tex b/report_thesis/src/sections/problem_definition.tex
index f521288e..7313d8b0 100644
--- a/report_thesis/src/sections/problem_definition.tex
+++ b/report_thesis/src/sections/problem_definition.tex
@@ -47,8 +47,4 @@ \section{Problem Definition}
 This decision is guided by the approach taken by the SuperCam team, where they do not normalize across the entire spectrum, but rather across individual spectrometers' wavelength ranges\cite{andersonPostlandingMajorElement2022}.
 In order to ensure the future applicability of our methods, we follow the same normalization approach.
 
-Our methodologies are selected to ensure compatibility with small datasets, a common scenario in \gls{libs} analyses.
-This consideration informs our preference for cross-validation techniques, which offer robust performance assessments without necessitating large data volumes.
-Additionally, the computational complexity of models is deemed a secondary concern, given the scale of data typically involved in \gls{libs} studies, allowing for a broader exploration of sophisticated, potentially more computationally intensive algorithms.
-
 % TODO: Write tail when we have more structure in the report
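
Taken together, the pieces discussed across these patches — dimensionality reduction, an ensemble regressor, cross-validation on a small dataset, and the two metrics — could be wired up roughly as follows. This is a sketch only: scikit-learn, the synthetic data, the component count, and the fold count are illustrative assumptions, not the pipeline the report prescribes.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
X = rng.random((80, 6144))      # stand-in for 80 normalized LIBS spectra
y = rng.random(80) * 60.0       # stand-in for one oxide's concentration in wt. %

rmses, error_stds = [], []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = make_pipeline(PCA(n_components=20), GradientBoostingRegressor())
    model.fit(X[train_idx], y[train_idx])
    errors = y[test_idx] - model.predict(X[test_idx])
    rmses.append(np.sqrt(np.mean(errors ** 2)))    # accuracy: RMSE per fold
    error_stds.append(np.std(errors, ddof=1))      # robustness: sigma_error per fold

print(f"mean RMSE: {np.mean(rmses):.2f}, mean error std: {np.mean(error_stds):.2f}")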