This repository contains a reproduced implementation of scGCL, an imputation method for scRNA-seq data based on graph contrastive learning.
The paper proposes scGCL (single-cell Graph Contrastive Learning). By applying contrastive learning on a cell graph, scGCL captures both global and local semantic information among cells, improving the prediction and reconstruction of missing gene expression values. It also uses an autoencoder based on the zero-inflated negative binomial (ZINB) distribution to model the global probability distribution of gene expression data, which allows dropout events to be measured more accurately. scGCL reduces noise in the data, improves the accuracy of clustering, classification, and differential expression analysis, and aids the integration of data from multiple sources, thereby simplifying the handling of scRNA-seq datasets. For our reproduction, we focus on one hypothesis: that scGCL exhibits better clustering and imputation performance than existing methods.
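The ZINB reconstruction objective can be summarized as a mixture of a point mass at zero (dropout) and a negative binomial distribution. Below is a minimal sketch of a ZINB negative log-likelihood in PyTorch; it illustrates the idea but is not necessarily the exact loss implemented in this repository.

```python
import torch

def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Negative log-likelihood of a zero-inflated negative binomial (ZINB).

    x:     observed counts, shape (cells, genes)
    mu:    predicted NB mean       (> 0)
    theta: predicted NB dispersion (> 0)
    pi:    predicted dropout probability in (0, 1)
    """
    log_theta_mu = torch.log(theta + mu + eps)
    # log NB(x | mu, theta)
    log_nb = (
        torch.lgamma(x + theta) - torch.lgamma(theta) - torch.lgamma(x + 1.0)
        + theta * (torch.log(theta + eps) - log_theta_mu)
        + x * (torch.log(mu + eps) - log_theta_mu)
    )
    # P(x = 0) under the mixture: pi + (1 - pi) * NB(0 | mu, theta)
    nb_zero = torch.exp(theta * (torch.log(theta + eps) - log_theta_mu))
    log_zero = torch.log(pi + (1.0 - pi) * nb_zero + eps)
    # P(x > 0): (1 - pi) * NB(x | mu, theta)
    log_nonzero = torch.log(1.0 - pi + eps) + log_nb
    return -torch.where(x < eps, log_zero, log_nonzero).mean()
```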
- Python: 3.10
- numpy: 1.25.2
- pandas: 2.0.3
- sklearn: 1.2.2
- h5py: 3.9.0
- scipy: 1.11.4
- torch: 2.2.1+cu121
- faiss: 1.8.0
- argparse: 1.1
- anndata: 0.10.7
- scanpy: 1.10.1
- torch_geometric: 2.5.2
- tensorboardX: 2.6.2.2
- matplotlib: 3.7.1
- Learning Rate: 0.001
- k-value: 15
- Distance metric: cosine (see the k-NN graph sketch below)
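The k-value and distance metric above are used to build the cell-cell neighbor graph. The following is a minimal sketch of constructing such a k-NN graph with scikit-learn; the repository lists faiss as a dependency and may build the graph differently in practice.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_cell_graph(X, k=15):
    """Build a k-NN edge list over cells using cosine distance.

    X: (n_cells, n_genes) expression matrix (e.g. after preprocessing).
    Returns a (2, n_cells * k) array of (source, target) edges,
    in the same layout torch_geometric expects for edge_index.
    """
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(X)
    _, idx = nn.kneighbors(X)              # idx[:, 0] is each cell itself
    src = np.repeat(np.arange(X.shape[0]), k)
    dst = idx[:, 1:].reshape(-1)           # drop the self-neighbour
    return np.stack([src, dst])
```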
Parameter | Description |
---|---|
dataset | An h5 file containing a matrix of scRNA-seq expression values, true labels, and other information. |
task | Downstream task. Supported tasks: node, clustering, similarity |
es | Early stopping criterion |
epochs | Number of training epochs |
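As a hedged illustration only, the parameters above could be exposed roughly as follows; the argument names mirror the table, but the actual entry point and default values in the repository may differ (only `epochs: 300` comes from this document).

```python
import argparse

# Sketch of the run parameters in the table above; not the repository's exact CLI.
parser = argparse.ArgumentParser(description="scGCL reproduction")
parser.add_argument("--dataset", type=str, required=True,
                    help="path to an h5 file with the expression matrix and true labels")
parser.add_argument("--task", type=str, default="clustering",
                    choices=["node", "clustering", "similarity"],
                    help="downstream task")
parser.add_argument("--es", type=int,
                    help="early stopping criterion (patience)")
parser.add_argument("--epochs", type=int, default=300,
                    help="number of training epochs")
args = parser.parse_args()
print(args)
```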
- Number of epochs: 300
- Average run time: ~24 seconds per epoch
- Total run time for 300 epochs: 1 hour 59 minutes 46 seconds
- Hardware: GPU (Google Colab)
To execute the code, run all cells in the Google Colab notebook.
Although the original paper compared 14 different datasets, we focused our efforts on the Adam dataset. The Adam dataset is stored in data.tgz and our pre-trained model is stored in model_checkpoints.tgz; both are downloaded by executing the code in section (2) Data.
For efficiency, we pre-trained the model and stored the checkpoint for use in evaluation. Executing the code in section (5) Training trains the model for one epoch for demonstration purposes.
Evaluation is done on the pre-trained model by executing the code in section (6) Evaluation.
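A minimal sketch of restoring the pre-trained checkpoint is shown below; the file name inside model_checkpoints.tgz and the checkpoint's contents are assumptions, so adjust them to the actual archive layout.

```python
import tarfile
import torch

# Extract the downloaded checkpoint archive (see section (2) Data).
with tarfile.open("model_checkpoints.tgz") as tar:
    tar.extractall(".")

# Hypothetical checkpoint file name; inspect the extracted folder for the real one.
state = torch.load("checkpoint.pt", map_location="cpu")
# model.load_state_dict(state)   # `model` is the scGCL network built in section (5) Training
# model.eval()
```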
Our model achieves the following performance on the Adam dataset:
NMI | ARI |
---|---|
0.84 | 0.83 |
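NMI and ARI can be computed with scikit-learn. The sketch below assumes cluster assignments are obtained by KMeans on the learned cell embeddings and compared against the true labels stored in the h5 file; the exact clustering procedure in the repository may differ.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def clustering_scores(embeddings, true_labels, n_clusters):
    """Compute the NMI/ARI reported above from learned cell embeddings."""
    pred = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    nmi = normalized_mutual_info_score(true_labels, pred)
    ari = adjusted_rand_score(true_labels, pred)
    return nmi, ari
```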