diff --git a/report_thesis/src/sections/background/data_overview.tex b/report_thesis/src/sections/background/data_overview.tex
new file mode 100644
index 00000000..1daa3854
--- /dev/null
+++ b/report_thesis/src/sections/background/data_overview.tex
@@ -0,0 +1,25 @@
+\subsection{Data Overview}\label{sec:data-overview}
+Similarly to our previous work (\citet{p9_paper}), we used the publicly available \gls{ccs} data from NASA's \gls{pds}~\cite{PDSGeoscienceNode}.
+\gls{ccs} refers to \gls{libs} data that has been through a series of preprocessing steps such as subtracting the ambient light background, noise removal and removing the electron continuum to derive data that is more suitable for quantitative analysis.
+A comprehensive description of this preprocessing procedure is available in \citet{wiensPreflightCalibrationInitial2013}.
+
+\begin{table*}[h]
+\centering
+\begin{tabular}{llllllll}
+\toprule
+ wave & shot1 & shot2 & $\cdots$ & shot49 & shot50 & median & mean \\
+\midrule
+240.81100 & 6.4026649e+15 & 4.0429349e+15 & $\cdots$ & 1.7922483e+15 & 1.7126615e+15 & 1.9892956e+15 & 1.7561699e+15 \\
+240.86501 & 3.8557462e+12 & 2.2923680e+12 & $\cdots$ & 1.1355429e+12 & 8.6930379e+11 & 7.8172542e+11 & 7.2805052e+11 \\
+$\vdots$ & $\vdots$ & $\vdots$ & $\cdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ \\
+905.38062 & 1.8823427e+08 & 58500403. & $\cdots$ & -8449286.6 & 8710775.0 & 4.0513312e+09 & 5.2188327e+09 \\
+905.57349 & 1.9864713e+10 & 1.2956832e+10 & $\cdots$ & 1.9785415e+10 & 7.1994239e+09 & 1.1311150e+10 & 1.2201224e+10 \\
+\bottomrule
+\end{tabular}
+\caption{Example of CCS data for a single location (from \citet{p9_paper})}
+\label{tab:ccs_data_example}
+\end{table*}
+
+While the \gls{ccs} data is in a more suitable form for quantitative analysis, it still requires further preprocessing. This includes handling negative values and noise at the edges of the spectrometers, as we will describe in Section~\ref{sec:data-preparation}.
+Additional preprocessing steps will be necessary to further refine the data for subsequent analysis and model training.
+Table~\ref{tab:ccs_data_example} shows an example of the \gls{ccs} data for a single location of a sample. This corresponds to shots ($s$) and wavelength ($\lambda$) of the Intensity Tensor \ref{matrix:intensity} for this location.
\ No newline at end of file
diff --git a/report_thesis/src/sections/background/index.tex b/report_thesis/src/sections/background/index.tex
index 2f323259..b1c65a45 100644
--- a/report_thesis/src/sections/background/index.tex
+++ b/report_thesis/src/sections/background/index.tex
@@ -1,7 +1,9 @@
 \section{Background}\label{sec:background}
-In this section, we provide an overview of the preprocessing techniques and machine learning models used in our proposed pipeline.
-We outline the various normalization techniques and dimensionality reduction methods, followed by the linear models, regularization models, and ensemble learning models used in this work.
+In this section, we provide an overview of the data used in this work, preprocessing techniques, and machine learning models used in our proposed pipeline.
+We outline the various normalization techniques and dimensionality reduction methods, followed by the ensemble learning, linear models, and regularization models used.
+Finally, we outline stacked generalization.
+\input{sections/background/data_overview.tex}
 \input{sections/background/preprocessing/index.tex}
 \input{sections/background/linear_and_regularization_models/index.tex}
 \input{sections/background/ensemble_learning_models/index.tex}
\ No newline at end of file
diff --git a/report_thesis/src/sections/methodology.tex b/report_thesis/src/sections/methodology.tex
index ae555529..823c2c11 100644
--- a/report_thesis/src/sections/methodology.tex
+++ b/report_thesis/src/sections/methodology.tex
@@ -4,29 +4,10 @@ \section{Experimental Design}\label{sec:methodology}
 We first describe the datasets used, including their preparation and the method of splitting for model training.
 Next, we outline the preprocessing steps and the model selection process, followed by a detailed explanation of the experimental setup and evaluation metrics.
 Finally, we discuss our validation testing procedures and the approach taken to ensure unbiased final model evaluations.
-\subsection{Data Preparation}
-Similarly to our previous work \citet{p9_paper}, we used the publicly available \gls{ccs} data from NASA's \gls{pds}~\cite{PDSGeoscienceNode}.
-\gls{ccs} refers to \gls{libs} data that has been through a series of preprocessing steps such as subtracting the ambient light background, noise removal and removing the electron continuum to derive data that is more suitable for quantitative analysis.
-A comprehensive description of this preprocessing procedure is available in \citet{wiensPreflightCalibrationInitial2013}.
+\subsection{Data Preparation}\label{sec:data-preparation}
+The first step in our methodology is to prepare the datasets for model training and evaluation.
+As mentioned in Section~\ref{sec:data-overview}, the data used in this study was obtained from NASA's \gls{pds} and consists of \gls{ccs} data and major oxide compositions for various samples.
-\begin{table*}[h]
-\centering
-\begin{tabular}{llllllll}
-\toprule
- wave & shot1 & shot2 & $\cdots$ & shot49 & shot50 & median & mean \\
-\midrule
-240.81100 & 6.4026649e+15 & 4.0429349e+15 & $\cdots$ & 1.7922483e+15 & 1.7126615e+15 & 1.9892956e+15 & 1.7561699e+15 \\
-240.86501 & 3.8557462e+12 & 2.2923680e+12 & $\cdots$ & 1.1355429e+12 & 8.6930379e+11 & 7.8172542e+11 & 7.2805052e+11 \\
-$\vdots$ & $\vdots$ & $\vdots$ & $\cdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ \\
-905.38062 & 1.8823427e+08 & 58500403. & $\cdots$ & -8449286.6 & 8710775.0 & 4.0513312e+09 & 5.2188327e+09 \\
-905.57349 & 1.9864713e+10 & 1.2956832e+10 & $\cdots$ & 1.9785415e+10 & 7.1994239e+09 & 1.1311150e+10 & 1.2201224e+10 \\
-\bottomrule
-\end{tabular}
-\caption{Example of CCS data for a single location (from \citet{p9_paper})}
-\label{tab:ccs_data_example}
-\end{table*}
-
-While the \gls{ccs} data is in a more suitable form for quantitative analysis, it still requires further preprocessing.
 Table~\ref{tab:ccs_data_example} shows an example of the \gls{ccs} data for a single location of a sample. This corresponds to shots ($s$) and wavelength ($\lambda$) of the Intensity Tensor \ref{matrix:intensity} for this location.
 The initial five shots from each sample are excluded because they are usually contaminated by dust covering the sample, which is cleared away by the shock waves produced by the laser \cite{cleggRecalibrationMarsScience2017}.
 The remaining 45 shots from each location are then averaged, yielding a single spectrum $s$ per location $l$ in the Averaged Intensity Tensor\ref{matrix:averaged_intensity}, resulting in a total of five spectra for each sample.