MoGCN is a multi-omics integration method based on a graph convolutional network (GCN).
As shown in the figure, the inputs to the model are multi-omics expression matrices, including but not limited to genomics, transcriptomics, and proteomics data. MoGCN uses a GCN model to incorporate and extend two unsupervised multi-omics integration algorithms: an autoencoder (AE) based on the expression matrices and similarity network fusion (SNF) based on a patient similarity network. Feature extraction is not necessary before AE and SNF.
MoGCN is a Python script tool. It requires the following Python environment:
- Python 3.6 or above
- PyTorch 1.4.0 or above
- snfpy 0.2.2
python setup.py install
The whole workflow is divided into three steps:
- Use AE to reduce the dimensionality of the multi-omics data and obtain a multi-omics feature matrix
- Use SNF to construct a patient similarity network
- Input the multi-omics feature matrix and the patient similarity network to GCN
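
To illustrate how the third step combines the two earlier outputs, here is a minimal NumPy sketch of a single graph-convolution step in the standard formulation (symmetric normalization of the adjacency matrix with self-loops, followed by a linear map and ReLU). It is an assumption-laden illustration, not the code in GCN_run.py: the function names, the toy matrix sizes, and the random stand-ins for the AE feature matrix and the SNF similarity network are all hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize A + I (standard GCN preprocessing)."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)              # weighted degree of each patient
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj_norm, features, weight):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

# Toy stand-ins (hypothetical sizes): 4 patients, 3 latent features, 2 hidden units.
rng = np.random.default_rng(0)
n_patients, n_latent, n_hidden = 4, 3, 2
features = rng.normal(size=(n_patients, n_latent))  # stands in for the AE feature matrix
adj = rng.random((n_patients, n_patients))
adj = (adj + adj.T) / 2                             # symmetric, non-negative similarities
np.fill_diagonal(adj, 0.0)                          # stands in for the SNF network
weight = rng.normal(size=(n_latent, n_hidden))      # would be learned during training

adj_norm = normalize_adjacency(adj)
hidden = gcn_layer(adj_norm, features, weight)      # one row of hidden features per patient
```

Each row of `hidden` mixes a patient's own latent features with those of similar patients, weighted by the similarity network, which is the sense in which the GCN uses both inputs at once.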
The sample data is in the data folder, which contains the CNV, mRNA and RPPA data of BRCA.
python AE_run.py -p1 data/fpkm_data.csv -p2 data/gistic_data.csv -p3 data/rppa_data.csv -m 0 -s 0 -d cpu
python SNF.py -p data/fpkm_data.csv data/gistic_data.csv data/rppa_data.csv -m sqeuclidean
python GCN_run.py -fd result/latent_data.csv -ad result/SNF_fused_matrix.csv -ld data/sample_classes.csv -ts data/test_sample.csv -m 1 -d gpu -p 20
The meaning of each parameter can be viewed with -h/--help.
- Each omics input file must be in .csv format, with rows representing samples and columns representing features (genes). In each expression matrix, the first column must contain the sample names, and the remaining columns are features. Samples must be consistent across all omics data. AE and SNF are unsupervised models and do not require sample labels.
- GCN is a semi-supervised classification model, so it requires a sample label file (.csv format) during training. The first column of the label file is the sample name and the second column is the digitized sample label; any remaining columns are optional.
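
The format rules above can be checked before running the pipeline. The following is a small sketch using only the standard library. It is not part of MoGCN, and it makes several assumptions: each file has a header row, "consistent" means the same sample names in the same order, and digitized labels are integers. The helper names and the toy CSV contents are hypothetical.

```python
import csv
import io

def read_rows(csv_text):
    """Parse CSV text and return its data rows, skipping the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # assumption: the first row is a header
    return [row for row in reader if row]

def sample_ids(csv_text):
    """Return the sample names, taken from the first column."""
    return [row[0] for row in read_rows(csv_text)]

def samples_consistent(*csv_texts):
    """True if every file lists the same samples in the same order (assumed meaning)."""
    id_lists = [sample_ids(text) for text in csv_texts]
    return all(ids == id_lists[0] for ids in id_lists[1:])

def read_labels(csv_text):
    """Map sample name (column 1) to integer label (column 2); extra columns are ignored."""
    return {row[0]: int(row[1]) for row in read_rows(csv_text)}

# Toy in-memory files (hypothetical contents) instead of real paths.
mrna_csv = "Sample,GeneA,GeneB\nS1,1.2,0.4\nS2,0.7,2.1\n"
cnv_csv = "Sample,GeneA,GeneB\nS1,0,1\nS2,-1,0\n"
labels_csv = "Sample,Label,Note\nS1,0,x\nS2,1,y\n"

consistent = samples_consistent(mrna_csv, cnv_csv)
labels = read_labels(labels_csv)
```

For real files, the same helpers can be given the result of `open(path).read()` in place of the toy strings.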
For any questions please contact Dr. Xiao Li (Email: [email protected]).
MIT License
Li X, Ma J, Leng L, Han M, Li M, He F and Zhu Y (2022) MoGCN: A Multi-Omics Integration Method Based on Graph Convolutional Network for Cancer Subtype Analysis. Front. Genet. 13:806842. doi: 10.3389/fgene.2022.806842.