Merge pull request #58 from chhoumann/experiments-summary

chhoumann · Jan 29, 2024 · 68f90b0
2 parents 1640cc6 + 5f93ed6
commit 68f90b0
Showing 5 changed files with 84 additions and 21 deletions.
diff --git a/report_pre_thesis/src/references.bib b/report_pre_thesis/src/references.bib
@@ -397,3 +397,14 @@ @article{andersonPostlandingMajorElement2022
   abstract = {The SuperCam instrument on the Perseverance Mars 2020 rover uses a pulsed 1064~nm laser to ablate targets at a distance and conduct laser induced breakdown spectroscopy (LIBS) by analyzing the light from the resulting plasma. SuperCam LIBS spectra are preprocessed to remove ambient light, noise, and the continuum signal present in LIBS observations. Prior to quantification, spectra are masked to remove noisier spectrometer regions and spectra are normalized to minimize signal fluctuations and effects of target distance. In some cases, the spectra are also standardized or binned prior to quantification. To determine quantitative elemental compositions of diverse geologic materials at Jezero crater, Mars, we use a suite of 1198 laboratory spectra of 334 well-characterized reference samples. The samples were selected to span a wide range of compositions and include typical silicate rocks, pure minerals (e.g., silicates, sulfates, carbonates, oxides), more unusual compositions (e.g., Mn ore and sodalite), and replicates of the sintered SuperCam calibration targets (SCCTs) onboard the rover. For each major element (SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O), the database was subdivided into five “folds” with similar distributions of the element of interest. One fold was held out as an independent test set, and the remaining four folds were used to optimize multivariate regression models relating the spectrum to the composition. We considered a variety of models, and selected several for further investigation for each element, based primarily on the root mean squared error of prediction (RMSEP) on the test set, when analyzed at 3~m. In cases with several models of comparable performance at 3~m, we incorporated the SCCT performance at different distances to choose the preferred model. Shortly after landing on Mars and collecting initial spectra of geologic targets, we selected one model per element. 
Subsequently, with additional data from geologic targets, some models were revised to ensure results that are more consistent with geochemical constraints. The calibration discussed here is a snapshot of an ongoing effort to deliver the most accurate chemical compositions with SuperCam LIBS.},
   langid = {english},
 }
+
+
+@online{mars_sample_return,
+	title = {Concepts for Mars Sample Return {\textbar} Missions},
+	url = {https://mars.nasa.gov/mars-exploration/missions/mars-sample-return},
+	abstract = {{NASA}'s real-time portal for Mars exploration, featuring the latest news, images, and discoveries from the Red Planet.},
+	titleaddon = {{NASA} Mars Exploration},
+	author = {mars.nasa.gov},
+	urldate = {2024-01-29},
+	langid = {english},
+}
diff --git a/report_pre_thesis/src/sections/experiments.tex b/report_pre_thesis/src/sections/experiments.tex
@@ -7,7 +7,7 @@ \section{Experiments}\label{sec:experiments}
 
 \begin{enumerate}
     \item Evaluating the necessity of automated outlier removal in the PLS1-SM component by comparing performance with and without this process.
-    \item Investigating the effect of fixed threshold values in the outlier removal process of PLS1-SM, maintaining these thresholds from the second iteration onwards.
+    \item Investigating the effect of maintaining the leverage and residuals in the outlier removal process of PLS1-SM from the second iteration onwards.
     \item Assessing the impact of the Median Absolute Deviation (MAD) method for outlier removal in the Independent Component Analysis (ICA) phase.
     \item Determining the effect on ICA performance when utilizing datasets from five locations compared to a single dataset.
     \item Comparing the performance of PLS1-SM and ICA models against alternative models, such as XGBoost and Artificial Neural Networks (ANN).
@@ -207,4 +207,26 @@ \subsection{Replication of the MOC Pipeline}\label{sec:replica_moc}
 \input{sections/experiments/pls_thresholds.tex}
 \input{sections/experiments/ica_outlier.tex}
 \input{sections/experiments/ica_aggregated.tex}
-\input{sections/experiments/other_models.tex}
+\input{sections/experiments/other_models.tex}
+
+\subsection{Summary}\label{sec:experiments_summary}
+The experiments conducted in this section provide an assessment of the MOC pipeline's components, offering insights into their individual and collective impacts on the system's overall performance.
+
+Our findings indicate that outlier removal does not have a major impact on either the PLS1-SM or the ICA models, which is expected given that we anticipate a low number of outliers in the dataset, as mentioned in Section~\ref{sec:ica_data_preprocessing}.
+Following this, we also found that maintaining the leverage and residuals from the second iteration of the outlier removal barely affects the PLS1-SM model's performance.
+
+We also found that the MAD method for outlier removal in the ICA phase improves the performance of the ICA models across all oxides except for \ce{CaO}.
+When using all five datasets for training the ICA models through aggregation, we found that the RMSEs vary across the oxides, with oxides that have a higher variance in their compositions having higher RMSEs and vice versa.
+
+Finally, we found that the XGBoost model performs exceptionally well, only being outperformed by the original MOC model on \ce{FeO_T}.
+The ANN model also shows promising results, often outperforming the original MOC model, and we expect that it could perform even better with a larger training dataset.
+
+We can draw several key conclusions from the experiments conducted in this section regarding the limitations of the MOC pipeline components:
+
+% TODO: Key conclusion here - what is the key limitations of the MOC pipeline components?
+% Note that PLS was more robust to outliers than ICA! The RMSEs for ICA were much higher than PLS when we did not do outlier removal.
+
+One aspect outside the scope of this study is the set of inherent limitations stemming from the calibration dataset itself.
+The fact that it was acquired on Earth rather than on Mars, and with a different LIBS instrument than the one used on Mars, causes a misalignment between the calibration dataset and the unseen data gathered by Mars rovers.
+In fact, NASA acknowledges this limitation and is currently working on a mission to bring samples from Mars back to Earth, where they can be analyzed in laboratories around the world \cite{mars_sample_return}.
+This would allow for the creation of a calibration dataset that is more representative of the unseen data, which in turn would enable more accurate predictions.
diff --git a/report_pre_thesis/src/sections/experiments/ica_outlier.tex b/report_pre_thesis/src/sections/experiments/ica_outlier.tex
@@ -1,6 +1,36 @@
 \subsection{Experiment: ICA MAD Outlier Removal}\label{sec:experiment_ica_mad_outlier_removal}
 In the ICA phase, the original authors employed the Median Absolute Deviation (MAD) for outlier removal, yet the detailed methodology of their approach was not fully delineated.
 Consequently, in our version of the pipeline, we chose to exclude the outlier removal step during the ICA phase to avoid introducing unsubstantiated assumptions, as described in Section~\ref{sec:ica_data_preprocessing}.
-This decision allowed us to evaluate the intrinsic effectiveness of the ICA phase without outlier removal and assesses the impact of introducing MAD (Median Absolute Deviation) for outlier removal in our pipeline replication. 
-By comparing results with and without MAD, we aim to quantitatively determine its utility in reducing noise and improving data quality. 
-This will also provide insights into the robustness of the ICA phase against outliers, offering a comprehensive understanding of the pipeline's capabilities and limitations.
+This decision allowed us to evaluate the intrinsic effectiveness of the ICA phase without outlier removal and to assess the impact of introducing MAD-based outlier removal in our pipeline replication.
+By comparing results with and without MAD, we aim to quantitatively determine its utility in reducing noise and improving data quality.
+This will also provide insights into the robustness of the ICA phase against outliers, offering a comprehensive understanding of the pipeline's capabilities and limitations.
+
+As mentioned in Section~\ref{sec:ica_data_preprocessing}, \citet{cleggRecalibrationMarsScience2017} did not specify the exact methodology of their outlier removal process.
+Therefore, we experimented with applying it at different stages of the ICA phase.
+The results presented in Table~\ref{tab:ica_mad_rmses} are the best results we obtained from these experiments, which were achieved by applying MAD before masking and normalization in the preprocessing phase.
+
+\begin{table}[h]
+\centering
+\begin{tabular}{lll}
+\hline
+Element    & ICA baseline   & ICA with MAD \\
+\hline
+\ce{SiO2}  & 10.68          & \textbf{8.64} \\
+\ce{TiO2}  & 0.63           & \textbf{0.53} \\
+\ce{Al2O3} & 5.55           & \textbf{3.69} \\
+\ce{FeO_T} & 8.30           & \textbf{7.07} \\
+\ce{MgO}   & 2.90           & \textbf{2.10} \\
+\ce{CaO}   & \textbf{3.52}  & 4.00 \\
+\ce{Na2O}  & 1.72           & \textbf{1.45} \\
+\ce{K2O}   & 1.37           & \textbf{1.15} \\
+\hline
+\end{tabular}
+\caption{RMSEs for the ICA phase's regression models with and without MAD-based outlier removal.}
+\label{tab:ica_mad_rmses}
+\end{table}
+
+As evident from Table~\ref{tab:ica_mad_rmses}, applying MAD improves the ICA phase's performance for every oxide except \ce{CaO}.
+We hypothesize that the \ce{CaO} data may differ in nature from that of the other oxides, such that the samples removed by the MAD-based approach carry critical information, resulting in a less accurate model.
+
+It is also notable that the ICA regression models show an overall significant improvement when outlier removal is applied, while the experiment presented in Section~\ref{sec:experiment_pls_automated_outlier_removal} shows that omitting outlier removal in the PLS1-SM phase does not have a significant impact on the models' performance.
+This indicates that PLS is more robust to outliers than ICA.
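
Note on the MAD step described in ica_outlier.tex: the diff does not show how the MAD filter is implemented, nor which quantity it is applied to. The following is only a minimal sketch of a common MAD-based filter, under stated assumptions: the function name `mad_outlier_mask`, the cutoff of 3.0, and the 1.4826 consistency factor are illustrative choices, not values taken from the report or from the original MOC pipeline.

```python
import numpy as np


def mad_outlier_mask(values, threshold=3.0):
    """Return a boolean mask that is True for values to keep.

    Hypothetical helper: the cutoff and the scaling factor are
    assumptions, not parameters documented in the report.
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    abs_dev = np.abs(values - median)
    mad = np.median(abs_dev)
    if mad == 0:
        # No spread around the median: nothing can be flagged reliably.
        return np.ones(values.shape, dtype=bool)
    # 1.4826 makes the MAD comparable to a standard deviation
    # for normally distributed data.
    robust_z = abs_dev / (1.4826 * mad)
    return robust_z <= threshold
```

For spectra, `values` could be any per-sample summary statistic; which statistic to use, and at which stage to filter, is exactly the unspecified detail the experiment explores.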
diff --git a/report_pre_thesis/src/sections/experiments/pls_outlier.tex b/report_pre_thesis/src/sections/experiments/pls_outlier.tex
@@ -14,14 +14,14 @@ \subsection{Experiment: PLS Automated Outlier Removal}\label{sec:experiment_pls_
 \hline
 Element    & Baseline & Without outlier removal \\
 \hline
-\ce{SiO2}  & 5.81     & 5.81                    \\
-\ce{TiO2}  & 0.47     & 0.47                    \\
-\ce{Al2O3} & 1.94     & 1.91                    \\
-\ce{FeO_T} & 4.35     & 4.35                    \\
-\ce{MgO}   & 1.17     & 1.17                    \\
-\ce{CaO}   & 1.43     & 1.44                    \\
-\ce{Na2O}  & 0.66     & 0.67                    \\
-\ce{K2O}   & 0.72     & 0.70                    \\
+\ce{SiO2}  & \textbf{5.81}     & \textbf{5.81}                    \\
+\ce{TiO2}  & \textbf{0.47}     & \textbf{0.47}                    \\
+\ce{Al2O3} & 1.94              & \textbf{1.91}                    \\
+\ce{FeO_T} & \textbf{4.35}     & \textbf{4.35}                    \\
+\ce{MgO}   & \textbf{1.17}     & \textbf{1.17}                    \\
+\ce{CaO}   & \textbf{1.43}     & 1.44                    \\
+\ce{Na2O}  & \textbf{0.66}     & 0.67                    \\
+\ce{K2O}   & 0.72              & \textbf{0.70}                    \\
 \hline
 \end{tabular}
 \caption{RMSEs for the PLS1-SM model without automated outlier removal.}

diff --git a/report_pre_thesis/src/sections/experiments/pls_thresholds.tex b/report_pre_thesis/src/sections/experiments/pls_thresholds.tex
@@ -12,14 +12,14 @@ \subsection{Experiment: PLS Fixed Thresholds}\label{sec:experiment_pls_fixed_thr
 \hline
 Element    & Baseline      & Fixed thresholds \\
 \hline
-\ce{SiO2}  & 5.81          & 5.81  \\
-\ce{TiO2}  & 0.47          & 0.47  \\
-\ce{Al2O3} & 1.94          & 1.94  \\
-\ce{FeO_T} & 4.35          & 4.35  \\
-\ce{MgO}   & 1.17          & 1.18  \\
-\ce{CaO}   & 1.43          & 1.44  \\
-\ce{Na2O}  & 0.66          & 0.67  \\
-\ce{K2O}   & 0.72          & 0.72  \\
+\ce{SiO2}  & \textbf{5.81}          & \textbf{5.81}  \\
+\ce{TiO2}  & \textbf{0.47}          & \textbf{0.47}  \\
+\ce{Al2O3} & \textbf{1.94}          & \textbf{1.94}  \\
+\ce{FeO_T} & \textbf{4.35}          & \textbf{4.35}  \\
+\ce{MgO}   & \textbf{1.17}          & 1.18           \\
+\ce{CaO}   & \textbf{1.43}          & 1.44           \\
+\ce{Na2O}  & \textbf{0.66}          & 0.67           \\
+\ce{K2O}   & \textbf{0.72}          & \textbf{0.72}  \\
 \hline
 \end{tabular}
 \caption{RMSEs for the PLS1-SM model with fixed outlier removal thresholds.}
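
Note on the metric used in the tables above: every table in this diff reports per-oxide RMSEs. The evaluation code is not part of the diff, so the following is only a sketch of the standard definition, assuming predictions and reference compositions are stored as arrays with one row per sample and one column per oxide. The function name `rmse_per_oxide` is hypothetical.

```python
import numpy as np


def rmse_per_oxide(y_true, y_pred):
    """Root mean squared error for each column (one column per oxide).

    Assumed layout: rows are samples, columns are oxides.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    squared_error = (y_true - y_pred) ** 2
    # Average over samples (axis 0), then take the square root.
    return np.sqrt(squared_error.mean(axis=0))
```

Comparing two pipeline variants, as the tables do, then amounts to calling this once per variant on the same test set and reading the results column by column.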