Merge pull request #196 from chhoumann/kb-229-unify-ccs
[KB-229] Moved description of ccs data to background
Ivikhostrup authored Jun 7, 2024
2 parents 5096a17 + caf1073 commit 4890183
Showing 3 changed files with 32 additions and 24 deletions.
25 changes: 25 additions & 0 deletions report_thesis/src/sections/background/data_overview.tex
@@ -0,0 +1,25 @@
\subsection{Data Overview}\label{sec:data-overview}
Similarly to our previous work (\citet{p9_paper}), we used the publicly available \gls{ccs} data from NASA's \gls{pds}~\cite{PDSGeoscienceNode}.
\gls{ccs} refers to \gls{libs} data that has been through a series of preprocessing steps, such as subtracting the ambient light background, removing noise, and removing the electron continuum, to derive data that is more suitable for quantitative analysis.
A comprehensive description of this preprocessing procedure is available in \citet{wiensPreflightCalibrationInitial2013}.

\begin{table*}[h]
\centering
\begin{tabular}{llllllll}
\toprule
wave & shot1 & shot2 & $\cdots$ & shot49 & shot50 & median & mean \\
\midrule
240.81100 & 6.4026649e+15 & 4.0429349e+15 & $\cdots$ & 1.7922483e+15 & 1.7126615e+15 & 1.9892956e+15 & 1.7561699e+15 \\
240.86501 & 3.8557462e+12 & 2.2923680e+12 & $\cdots$ & 1.1355429e+12 & 8.6930379e+11 & 7.8172542e+11 & 7.2805052e+11 \\
$\vdots$ & $\vdots$ & $\vdots$ & $\cdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ \\
905.38062 & 1.8823427e+08 & 58500403. & $\cdots$ & -8449286.6 & 8710775.0 & 4.0513312e+09 & 5.2188327e+09 \\
905.57349 & 1.9864713e+10 & 1.2956832e+10 & $\cdots$ & 1.9785415e+10 & 7.1994239e+09 & 1.1311150e+10 & 1.2201224e+10 \\
\bottomrule
\end{tabular}
\caption{Example of CCS data for a single location (from \citet{p9_paper})}
\label{tab:ccs_data_example}
\end{table*}

While the \gls{ccs} data is in a more suitable form for quantitative analysis, it still requires further preprocessing before subsequent analysis and model training. This includes handling negative values and noise at the edges of the spectrometers, as described in Section~\ref{sec:data-preparation}.
Table~\ref{tab:ccs_data_example} shows an example of the \gls{ccs} data for a single location of a sample. This corresponds to the shot ($s$) and wavelength ($\lambda$) dimensions of the Intensity Tensor~\ref{matrix:intensity} for this location.
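
The mapping from the table layout to a per-shot view can be illustrated with a minimal Python sketch. It is not part of the commit or of the thesis pipeline. The function name `to_shot_major` and the in-memory `rows` structure are hypothetical, and only two rows and the four shot columns printed in the table are reproduced.

```python
# Two rows copied from the example table; only the four printed shot
# columns are included, so this is a truncated view of one location.
rows = [
    {"wave": 240.81100, "shot1": 6.4026649e+15, "shot2": 4.0429349e+15,
     "shot49": 1.7922483e+15, "shot50": 1.7126615e+15},
    {"wave": 905.38062, "shot1": 1.8823427e+08, "shot2": 58500403.0,
     "shot49": -8449286.6, "shot50": 8710775.0},
]


def to_shot_major(rows):
    """Turn wavelength-major table rows into per-shot spectra.

    Returns (wavelengths, intensity), where intensity[shot_name][i] is the
    intensity recorded for that shot at wavelengths[i].
    """
    wavelengths = [row["wave"] for row in rows]
    # Order shot columns numerically (shot1, shot2, ..., shot50).
    shot_names = sorted(
        (key for key in rows[0] if key.startswith("shot")),
        key=lambda key: int(key[4:]),
    )
    intensity = {name: [row[name] for row in rows] for name in shot_names}
    return wavelengths, intensity


wavelengths, intensity = to_shot_major(rows)
```

Each entry of `intensity` is one spectrum indexed by wavelength, which matches the shot and wavelength axes described above. The negative value in `shot49` at 905.38062 is kept as-is here; how such values are handled is described in the data preparation section, not in this sketch.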
6 changes: 4 additions & 2 deletions report_thesis/src/sections/background/index.tex
@@ -1,7 +1,9 @@
\section{Background}\label{sec:background}
In this section, we provide an overview of the preprocessing techniques and machine learning models used in our proposed pipeline.
We outline the various normalization techniques and dimensionality reduction methods, followed by the linear models, regularization models, and ensemble learning models used in this work.
In this section, we provide an overview of the data used in this work, as well as the preprocessing techniques and machine learning models used in our proposed pipeline.
We outline the various normalization techniques and dimensionality reduction methods, followed by the ensemble learning models, linear models, and regularization models used.
Finally, we outline stacked generalization.

\input{sections/background/data_overview.tex}
\input{sections/background/preprocessing/index.tex}
\input{sections/background/linear_and_regularization_models/index.tex}
\input{sections/background/ensemble_learning_models/index.tex}
25 changes: 3 additions & 22 deletions report_thesis/src/sections/methodology.tex
@@ -4,29 +4,10 @@ \section{Experimental Design}\label{sec:methodology}
We first describe the datasets used, including their preparation and the method of splitting for model training. Next, we outline the preprocessing steps and the model selection process, followed by a detailed explanation of the experimental setup and evaluation metrics. Finally, we discuss our validation testing procedures and the approach taken to ensure unbiased final model evaluations.


\subsection{Data Preparation}
Similarly to our previous work \citet{p9_paper}, we used the publicly available \gls{ccs} data from NASA's \gls{pds}~\cite{PDSGeoscienceNode}.
\gls{ccs} refers to \gls{libs} data that has been through a series of preprocessing steps such as subtracting the ambient light background, noise removal and removing the electron continuum to derive data that is more suitable for quantitative analysis.
A comprehensive description of this preprocessing procedure is available in \citet{wiensPreflightCalibrationInitial2013}.
\subsection{Data Preparation}\label{sec:data-preparation}
The first step in our methodology is to prepare the datasets for model training and evaluation.
As mentioned in Section~\ref{sec:data-overview}, the data used in this study was obtained from NASA's \gls{pds} and consists of \gls{ccs} data and major oxide compositions for various samples.

\begin{table*}[h]
\centering
\begin{tabular}{llllllll}
\toprule
wave & shot1 & shot2 & $\cdots$ & shot49 & shot50 & median & mean \\
\midrule
240.81100 & 6.4026649e+15 & 4.0429349e+15 & $\cdots$ & 1.7922483e+15 & 1.7126615e+15 & 1.9892956e+15 & 1.7561699e+15 \\
240.86501 & 3.8557462e+12 & 2.2923680e+12 & $\cdots$ & 1.1355429e+12 & 8.6930379e+11 & 7.8172542e+11 & 7.2805052e+11 \\
$\vdots$ & $\vdots$ & $\vdots$ & $\cdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ \\
905.38062 & 1.8823427e+08 & 58500403. & $\cdots$ & -8449286.6 & 8710775.0 & 4.0513312e+09 & 5.2188327e+09 \\
905.57349 & 1.9864713e+10 & 1.2956832e+10 & $\cdots$ & 1.9785415e+10 & 7.1994239e+09 & 1.1311150e+10 & 1.2201224e+10 \\
\bottomrule
\end{tabular}
\caption{Example of CCS data for a single location (from \citet{p9_paper})}
\label{tab:ccs_data_example}
\end{table*}

While the \gls{ccs} data is in a more suitable form for quantitative analysis, it still requires further preprocessing. Table~\ref{tab:ccs_data_example} shows an example of the \gls{ccs} data for a single location of a sample. This corresponds to shots ($s$) and wavelength ($\lambda$) of the Intensity Tensor \ref{matrix:intensity} for this location.
The initial five shots from each location are excluded because they are usually contaminated by dust covering the sample, which is cleared away by the shock waves produced by the laser~\cite{cleggRecalibrationMarsScience2017}.
The remaining 45 shots from each location are then averaged, yielding a single spectrum $s$ per location $l$ in the Averaged Intensity Tensor~\ref{matrix:averaged_intensity}, resulting in a total of five spectra for each sample.
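
The shot exclusion and averaging step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the thesis. The function name `average_location`, the constant `N_DISCARDED`, and the list-of-lists input layout are hypothetical, and the data is synthetic.

```python
from statistics import fmean

N_DISCARDED = 5  # leading shots assumed to be dust-contaminated, per the text


def average_location(shots, n_discarded=N_DISCARDED):
    """Average per-wavelength intensities over the shots kept for one location.

    `shots` is a list of spectra in firing order; each spectrum is a list of
    intensities with one value per wavelength.
    """
    kept = shots[n_discarded:]
    if not kept:
        raise ValueError("no shots left after discarding the leading shots")
    # zip(*kept) walks the kept shots wavelength by wavelength.
    return [fmean(channel) for channel in zip(*kept)]


# Synthetic location: 50 shots, 3 wavelength channels, and shot k (1-based)
# has intensity k in every channel.
shots = [[float(k + 1)] * 3 for k in range(50)]
averaged = average_location(shots)
```

With 50 shots per location, slicing off the first five leaves 45 spectra, and the result is a single averaged spectrum for that location. Applying the same function to each of a sample's five locations would give the five spectra per sample mentioned above.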
