Merge pull request #27 from chhoumann/Report-corrections
Moved background section into introduction
Ivikhostrup authored Jan 11, 2024
2 parents 1e503c5 + 110731d commit d65cbcf
Showing 8 changed files with 108 additions and 32 deletions.
2 changes: 2 additions & 0 deletions report_pre_thesis/src/_preamble.tex
@@ -1,6 +1,8 @@
\documentclass[acmtog]{acmart}
\usepackage{natbib}
\usepackage{todonotes}
\usepackage{lineno}
\linenumbers

\title{Identifying Limitations in the ChemCam Multivariate Oxide Composition Model for Elemental Quantification in Martian Geological Samples}
\author{Christian Bager Bach Houmann}
2 changes: 1 addition & 1 deletion report_pre_thesis/src/index.tex
@@ -12,8 +12,8 @@

\input{sections/introduction}
\input{sections/background}
\input{sections/related_work}
\input{sections/definition}
\input{sections/related_work}
\input{sections/data_analysis}
\input{sections/methodology}
\input{sections/results}
31 changes: 31 additions & 0 deletions report_pre_thesis/src/references.bib
@@ -252,3 +252,34 @@ @article{hu_review_2022
date = {2022-07},
langid = {english}
}
@book{marini_chemometrics_2013,
title = {Chemometrics in Food Chemistry},
isbn = {978-0-444-59529-4},
pages = {154},
url = {https://books.google.dk/books?id=HWUxgQ56stMC},
series = {{ISSN}},
publisher = {Elsevier Science},
author = {Marini, F.},
date = {2013},
}
@article{brereton_chi_2015,
title = {The chi squared and multinormal distributions},
volume = {29},
issn = {0886-9383, 1099-128X},
url = {https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/10.1002/cem.2680},
doi = {10.1002/cem.2680},
pages = {9--12},
number = {1},
journaltitle = {Journal of Chemometrics},
shortjournal = {Journal of Chemometrics},
author = {Brereton, Richard G.},
urldate = {2023-12-18},
date = {2015-01},
langid = {english},
file = {Brereton - 2015 - The chi squared and multinormal distributions.pdf:C\:\\Users\\Ivik Hostrup\\Zotero\\storage\\NXNPY92I\\Brereton - 2015 - The chi squared and multinormal distributions.pdf:application/pdf},
}
18 changes: 10 additions & 8 deletions report_pre_thesis/src/sections/background.tex
@@ -1,12 +1,14 @@
\section{Background}\label{sec:background}
The use of LIBS technology in planetary exploration has proven to be effective in analyzing soil and rock samples \citep{knight2000}.
%The use of LIBS technology in planetary exploration has proven to be effective in analyzing soil and rock samples \citep{knight2000}.

%A laser pulses to ablate and remove any surface contaminants, such as dust and weathering layers, to expose the underlying material.
%The laser generates a plasma plume from the now-exposed sample material.
%This plasma plume emits light, which, when collected and analyzed, reveals the elemental composition of the sample by correlating the intensity of emitted light with specific wavelengths in a LIBS spectrum.
%The LIBS technique enables remote analysis of materials without the need for sample preparation.
%It allows for rapid analysis because of the immediate spectrum collection from the subsequent plasma, while maintaining a high spatial resolution due to its small observation footprints.
%This high resolution is essential for pinpointing and investigating small features. \cite{wiensChemcam2012}


A laser pulses to ablate and remove any surface contaminants, such as dust and weathering layers, to expose the underlying material.
The laser generates a plasma plume from the now-exposed sample material.
This plasma plume emits light, which, when collected and analyzed, reveals the elemental composition of the sample by correlating the intensity of emitted light with specific wavelengths in a LIBS spectrum.
The LIBS technique enables remote analysis of materials without the need for sample preparation.
It allows for rapid analysis because of the immediate spectrum collection from the subsequent plasma, while maintaining a high spatial resolution due to its small observation footprints.
This high resolution is essential for pinpointing and investigating small features. \cite{wiensChemcam2012}

% In 2013, \citet{wiensPreFlight3} published a paper describing the pre-flight calibration and initial data processing for the ChemCam LIBS instrument.
% This paper introduces methods for preprocessing spectra samples and a regression model based on Partial Least Squares (PLS2) used to predict the composition of geological samples on Mars.
@@ -25,5 +27,5 @@ \section{Background}\label{sec:background}
% This model is referred to as the Multivariate Oxide Composition (MOC) model.
% The MOC model is currently used by the ChemCam team to analyze the LIBS data collected by the Curiosity rover.

\input{sections/known_limitations}
%\input{sections/known_limitations}
\input{sections/moc}
8 changes: 5 additions & 3 deletions report_pre_thesis/src/sections/definition.tex
@@ -9,9 +9,11 @@ \section{Definition}\label{sec:definition}
\end{definition}

\begin{definition}\label{def:hypothesis_function}
Given a set of major oxides $O$ where $k=|O|$, define a model $M$ that learns a hypothesis function $f: \Lambda \times \mathbb{R}^m \rightarrow \mathbb{R}^k$ to predict the composition of the $k$ major oxides in geological samples.
The output of the hypothesis function is a vector $\mathbf{\hat{y}} = [\hat{y}_{1}, \hat{y}_{2}, \ldots, \hat{y}_{8}]$ where $\hat{y}_{i}$ is the predicted weight percentage of the major oxide $o_i \in O$.
Given a set of major oxides \(O\) where \(k=|O|\), define a model \(M\) that learns a hypothesis function \(f: \Lambda \times \mathbb{R}^m \rightarrow \mathbb{R}^k\), using the dataset \(D\), defined in Definition \ref{def:dataset}, as input.
This input, comprising wavelengths and intensity values, is used to predict the composition of the \(k\) major oxides in geological samples.
The output of the hypothesis function is a vector \(\mathbf{\hat{y}} = [\hat{y}_{1}, \hat{y}_{2}, \ldots, \hat{y}_{k}]\) where \(\hat{y}_{i}\) is the predicted weight percentage of the major oxide \(o_i \in O\).
\end{definition}

The sum of the predicted weight percentages is not necessarily equal to 100\%, but is not expected to surpass 100\%.
The samples may contain other elements that are not considered major oxides, which would account for the difference.
If the sum of the predicted weight percentages is greater than 100\%, the model is overestimating the weight percentages, and represents a physical impossibility.
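Using the notation of Definition~\ref{def:hypothesis_function}, this expectation can be sketched as a condition on each prediction; it is stated here as a plausibility check and is not a property guaranteed by \(f\):
\begin{equation*}
    \sum_{i=1}^{k} \hat{y}_{i} \leq 100 .
\end{equation*}
As an illustration, a prediction whose components sum to 97 is physically plausible, leaving 3 weight percent for constituents outside \(O\), whereas a prediction summing to 104 is not.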
@@ -34,4 +36,4 @@ \section{Definition}\label{sec:definition}

This leads us to the following challenge:

\textbf{Problem}: Given a series of experiments and the resulting models, identify the components that contribute the most to the overall error $E(M)$.
\textbf{Problem}: Given a series of experiments and the resulting models, identify the components that contribute the most to the overall error $E(M)$.
23 changes: 17 additions & 6 deletions report_pre_thesis/src/sections/introduction.tex
@@ -7,10 +7,16 @@ \section{Introduction}\label{sec:introduction}
Of primary interest are rocks that either formed in water or show indications of organic materials.\cite{chemcamNasaWebsite}

ChemCam is a remote-sensing laser instrument developed by NASA in collaboration with the French national space agency (CNES).
The instrument is used to gather Laser-Induced Breakdown Spectroscopy (LIBS) data from geological samples on Mars. The data itself consists of a series of spectral readings, each forming a spectrum. These spectra represent the emitted light from a plasma created when the laser interacts with the target sample. Captured over a range of wavelengths, each spectrum is composed of various emission lines. Each emission line is associated with a specific element, and its intensity reflects the concentration of that element in the sample. Consequently, the collection of spectra serves as a complex, multi-dimensional fingerprint of the elemental composition of the examined geological formations.
The instrument is used to gather Laser-Induced Breakdown Spectroscopy (LIBS) data from geological samples on Mars.

The laser pulses to ablate and remove any surface contaminants, such as dust and weathering layers, to expose the underlying material. The laser generates a plasma plume from the now-exposed sample material.
This plasma plume emits light, and the data collected from this process consists of a series of spectral readings. Captured over a range of wavelengths, each spectrum is composed of various emission lines. Each emission line is associated with a specific element, and its intensity reflects the concentration of that element in the sample.
Consequently, the collection of spectra serves as a complex, multi-dimensional fingerprint of the elemental composition of the examined geological formations.
This data is then used to determine the elemental composition of these samples.\cite{cleggRecalibrationMarsScience2017}

LIBS is a versatile analytical tool with broad applicability across various fields. In environmental monitoring, its spectral data are effectively used with machine learning and statistical methods like partial least squares and neural networks for detecting and quantifying soil pollutants. In industrial contexts, it is also utilized for quality control processes involving metals and alloys\cite{huang_progress_2023}.
LIBS enables remote analysis of materials without the need for sample preparation. Analysis is rapid because the spectrum is collected immediately from the resulting plasma, and the small observation footprint maintains a high spatial resolution. This high resolution is essential for pinpointing and investigating small features. \cite{wiensChemcam2012}

LIBS is a versatile analytical tool with broad applicability across various other fields. In environmental monitoring, its spectral data are effectively used with machine learning and statistical methods like partial least squares and neural networks for detecting and quantifying soil pollutants. In industrial contexts, it is also utilized for quality control processes involving metals and alloys\cite{huang_progress_2023}.

The ChemCam team uses an analytical system called the \textit{Multivariate Oxide Composition} (MOC) model to predict the composition of major oxides based on LIBS data from geological samples.
The system is comprised of various components, each responsible for a specific task in predicting the composition of major oxides in geological samples.
@@ -19,15 +25,20 @@ \section{Introduction}\label{sec:introduction}
As part of their preprocessing, they use various techniques to remove noise and outliers from the data.\cite{cleggRecalibrationMarsScience2017}
The model is trained on a calibration dataset consisting of LIBS data from 408 terrestrial rock samples, simulated to mimic Martian conditions\cite{cleggRecalibrationMarsScience2017}.

The interpretation of LIBS data poses significant computational challenges.
First, a high degree of multicollinearity exists within the spectral data, rendering traditional linear analysis methods less effective.
The multicollinearity arises due to the correlation among different spectral channels, influenced both by the multi-line emission characteristics of individual elements and by geochemical correlations between elements.
Second, multiple interacting physical processes, collectively referred to as \textit{matrix effects}, increase the complexity of LIBS spectra. Matrix effects are any effects that cause the intensity of an element's emission lines to vary independently of that element's concentration. Such variability complicates the direct interpretation of the spectra and poses challenges for computational models aiming for accurate elemental quantification.
It is possible to partially account for these effects by using multivariate algorithms that make use of the information contained in the entire spectrum, rather than individual lines.\cite{andersonImprovedAccuracyQuantitative2017}
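To illustrate why multicollinearity hampers linear methods, consider ordinary least squares, chosen here purely as a representative example. The symbols below are assumptions introduced for this sketch: \(\mathbf{X} \in \mathbb{R}^{n \times m}\) holds \(n\) spectra with \(m\) spectral channels each, and \(\mathbf{y} \in \mathbb{R}^{n}\) holds the reference concentrations of one oxide. The coefficient estimate is then
\begin{equation*}
    \hat{\mathbf{w}} = \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y} .
\end{equation*}
When spectral channels are strongly correlated, \(\mathbf{X}^{\top}\mathbf{X}\) is close to singular, and it is exactly singular whenever \(m > n\), so small perturbations in the measured intensities can produce large changes in \(\hat{\mathbf{w}}\).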

The Mars Science Laboratory has made notable progress in planetary exploration, largely relying on models like the Multivariate Oxide Composition (MOC) to interpret Laser-Induced Breakdown Spectroscopy (LIBS) data from Martian geological samples.
Despite its utility, a domain expert from the ChemCam team has observed that the existing MOC model exhibits limitations in both predictive accuracy and robustness.
Enhancing the predictive accuracy and robustness of the MOC model is crucial for achieving more reliable composition predictions, thereby furthering the scientific objectives of the Mars Science Laboratory in understanding Martian geology and potential habitability.
Accuracy, in this context, is measured as Root Mean Squared Error (RMSE).
Robustness refers to the model's ability to handle the variations in the data.
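As a sketch of the accuracy measure, assume \(n\) test samples, with \(y_{j}\) the reference weight percentage of a given oxide in sample \(j\) and \(\hat{y}_{j}\) the corresponding prediction; these symbols are introduced here for illustration only. The RMSE for that oxide is then
\begin{equation*}
    \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_{j}-\hat{y}_{j}\right)^{2}} ,
\end{equation*}
so a lower value indicates predictions closer to the reference composition, expressed in the same unit (weight percent) as the targets.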
We use a term 'matrix effects' as a catch-all term for any effect that can cause the intensity of emission lines from an element to vary independent of that element's concentration.
The complexity of LIBS spectra is increased by multiple interacting physical processes.
These interactions introduce variability into the emission line intensities independent of the elements' concentrations.
Such variability complicates the direct interpretation of the spectra and poses challenges for computational models aiming for accurate elemental quantification.\cite{andersonImprovedAccuracyQuantitative2017}

The challenges posed by the inherent complexities in interpreting LIBS spectra underscore the need for refinement of models like the MOC.
Despite the model's current capabilities, the domain expert from the ChemCam team emphasizes that there is room for improvement in how the model handles data variability and predicts elemental compositions.

\textit{In this work, we aim to solve the problem of identifying issues within the components of the current Multivariate Oxide Composition (MOC) model that limit its predictive accuracy and robustness, particularly in relation to matrix effects. Following this, we will propose improvements to the model's components that address these issues, thereby enhancing its overall accuracy and robustness.}

14 changes: 7 additions & 7 deletions report_pre_thesis/src/sections/known_limitations.tex
@@ -4,10 +4,10 @@
% Matrix Effects: These constitute a challenge in data preprocessing and normalization. In machine learning terms, matrix effects introduce a form of 'class imbalance' or 'data skew.' The model can be misled by the dominant features (in this case, spectral lines influenced by matrix effects) and fail to generalize well. Dealing with this requires sophisticated preprocessing techniques or specialized algorithms capable of handling imbalanced or skewed data.

% MULTICOLINEARITY & MATRIX EFFECTS - PROSE
The interpretation of LIBS data poses significant computational challenges.
First, a high degree of multicollinearity exists within the spectral data, rendering traditional linear analysis methods less effective.
The multicollinearity arises due to the correlation among different spectral channels, influenced both by the multi-line emission characteristics of individual elements and by geochemical correlations between elements\cite{andersonImprovedAccuracyQuantitative2017}.
Secondly, the complexity of LIBS spectra is increased by multiple interacting physical processes - the aforementioned \textit{matrix effects}.
These interactions introduce variability into the emission line intensities independent of the elements' concentrations.
It is possible to partially account for these effects by using multivariate algorithms that make use of the information contained in the entire spectrum, rather than individual lines\cite{andersonImprovedAccuracyQuantitative2017}.
Such variability complicates the direct interpretation of the spectra and poses challenges for computational models aiming for accurate elemental quantification.
%The interpretation of LIBS data poses significant computational challenges.
%First, a high degree of multicollinearity exists within the spectral data, rendering traditional linear analysis methods less effective.
%The multicollinearity arises due to the correlation among different spectral channels, influenced both by the multi-line emission characteristics of individual elements and by geochemical correlations between elements\cite{andersonImprovedAccuracyQuantitative2017}.
%Secondly, the complexity of LIBS spectra is increased by multiple interacting physical processes - the aforementioned \textit{matrix effects}.
%These interactions introduce variability into the emission line intensities independent of the elements' concentrations.
%It is possible to partially account for these effects by using multivariate algorithms that make use of the information contained in the entire spectrum, rather than individual lines\cite{andersonImprovedAccuracyQuantitative2017}.
%Such variability complicates the direct interpretation of the spectra and poses challenges for computational models aiming for accurate elemental quantification.