Spring 2024 - PHYS 2550 - Final Project
Authors: Feifan Deng, Jade Ducharme, Zacharias Escalante, Soren Helhoski, Shi Yan
The goal of our final project is to perform a regression task on weakly-lensed galaxies located behind nearby massive clusters: based on seven distinct features, we infer the redshift of each galaxy.
We include four data files in the data folder:
- specz_fluxes.csv: the "raw" data, which contains some missing ("NaN") values.
- clean_specz_fluxes.csv: the "clean" data, in which the missing ("NaN") values have been dealt with. This is our training data.
- synthetic_data.csv: synthetic data, generated using the known errors on our flux measurements.
- specz_photoz.csv: the non-ML comparison data. It contains the "true" (spectroscopic) redshifts and the associated non-ML (photo-z) redshifts.
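To compare the ML models against the photo-z baseline in specz_photoz.csv, one option is the set of summary statistics commonly used in the photo-z literature: the normalized residual Δz = (z_phot − z_spec)/(1 + z_spec), its median (bias), the normalized median absolute deviation (σ_NMAD), and the fraction of catastrophic outliers. The sketch below assumes two NumPy arrays of spectroscopic and predicted redshifts; the function name photoz_metrics and the 0.15 outlier threshold are illustrative choices, not necessarily what this project uses, and the CSV column names are not assumed here.

```python
import numpy as np


def photoz_metrics(z_spec, z_phot, outlier_cut=0.15):
    """Hypothetical helper: return (bias, sigma_nmad, outlier_fraction).

    Uses one common definition of the photo-z summary statistics.
    """
    z_spec = np.asarray(z_spec, dtype=float)
    z_phot = np.asarray(z_phot, dtype=float)
    # Normalized residual, scaled by (1 + z) so errors are comparable across redshift
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    bias = np.median(dz)
    # 1.4826 makes the median absolute deviation match sigma for Gaussian residuals
    sigma_nmad = 1.4826 * np.median(np.abs(dz - bias))
    # Fraction of "catastrophic" outliers beyond the chosen threshold (assumed 0.15)
    outlier_fraction = np.mean(np.abs(dz) > outlier_cut)
    return bias, sigma_nmad, outlier_fraction
```

The same function can be applied to the photo-z column and to each model's predictions, so every method is summarized on an identical scale.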
In the src folder, you will find three accompanying Python files:
- model.py: where all model classes and training loops are defined.
- preprocess.py: where all data preprocessing functions (data loading, standardization, and splitting) are defined.
- visualize.py: where all plotting functions (loss curves, histograms, predictions vs. labels, etc.) are defined.
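preprocess.py is described as handling standardization and splitting, but its actual function names and split fractions are not given here, so the following is a hypothetical NumPy sketch of those two steps. It shuffles the data into train/validation/test sets (70/15/15 is an assumed default) and standardizes every feature using only the training split's mean and standard deviation, so no statistics leak from the held-out sets.

```python
import numpy as np


def train_val_test_split(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Hypothetical helper: shuffle once, then carve out test, val, and train sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_frac * len(X)))
    n_val = int(round(val_frac * len(X)))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])


def standardize(X_train, *others):
    """Hypothetical helper: z-score all arrays with statistics from X_train only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Guard against division by zero for a constant feature
    std = np.where(std == 0, 1.0, std)
    return tuple((A - mean) / std for A in (X_train, *others))
```

For example, with a feature matrix X of shape (n_galaxies, 7) and redshifts y, one would split first and then call standardize(X_train, X_val, X_test).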
For a more comfortable user experience, we include two notebooks that rely on the source code in the src folder, which keeps code bloat out of the notebooks themselves.
The first notebook details how to obtain the "clean" data from the "raw" data in data/specz_fluxes.csv. Following a suggestion made by Prof. Gouskos after our final presentation, this notebook can now also be used to generate synthetic data using the errors on the flux measurements.
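The exact resampling scheme is not spelled out here. The sketch below assumes that each feature is a flux with an independent Gaussian error σ, and draws n_copies noisy realizations of every galaxy (flux + σ·N(0, 1)) while keeping its spectroscopic redshift as the label. The function name make_synthetic and its arguments are hypothetical.

```python
import numpy as np


def make_synthetic(fluxes, flux_errs, labels, n_copies=5, seed=0):
    """Hypothetical helper: Gaussian resampling of fluxes within their errors.

    Assumption: errors are independent and Gaussian; labels are copied unchanged.
    """
    rng = np.random.default_rng(seed)
    fluxes = np.asarray(fluxes, dtype=float)
    flux_errs = np.asarray(flux_errs, dtype=float)
    labels = np.asarray(labels)
    # Noise has shape (n_copies, n_galaxies, n_features); flux_errs broadcasts over copies
    noise = rng.standard_normal((n_copies, *fluxes.shape)) * flux_errs
    # Stack the copies: rows 0..n-1 are copy 0, rows n..2n-1 are copy 1, and so on
    synth_X = (fluxes + noise).reshape(-1, fluxes.shape[1])
    # Repeat the redshift labels in the same order as the stacked copies
    synth_y = np.tile(labels, n_copies)
    return synth_X, synth_y
```

Because noisy fluxes can scatter below zero, any later log or magnitude transform would need to handle negative draws.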
The second notebook is where all models are instantiated and trained, and where all training and prediction curves are presented. The models we considered are:
- Fully-Connected Neural Network (FCNN)
- *FCNN on Synthetic Data
- 1D Convolutional Neural Network
- Graph Attention Network
- k-Nearest-Neighbors Regression
*Based on a suggestion from Prof. Gouskos after our final presentation, we generated some synthetic data and used it as training input to an FCNN.
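Of the models listed above, k-Nearest-Neighbors Regression is the simplest to state: a galaxy's redshift is predicted as the mean redshift of the k training galaxies closest to it in feature space. The following is a minimal NumPy sketch assuming Euclidean distance and uniform weights; the function name knn_regress and the default k are illustrative, and whether the project uses this or a library implementation is not stated here.

```python
import numpy as np


def knn_regress(X_train, y_train, X_query, k=5):
    """Hypothetical helper: mean target of the k nearest training points (Euclidean)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Pairwise squared distances, shape (n_query, n_train)
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k smallest distances per row; their internal order does not affect the mean
    nn_idx = np.argpartition(d2, kth=k - 1, axis=1)[:, :k]
    return y_train[nn_idx].mean(axis=1)
```

This matches scikit-learn's KNeighborsRegressor with its default uniform weights and Euclidean metric. The broadcasted distance array has size n_query × n_train × n_features, which is fine for a catalog of this scale but would call for a tree-based search on much larger data. Features should be standardized first so that no single feature dominates the distance.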
We also include our final presentation slides in presentation_slides.pdf. :)