CNAPE

Copy Number Alteration Prediction from gene Expression in human cancers

Ownership

Status

Active development

Introduction

Copy number alterations (CNAs) are important features of human cancers. Standard methods for CNA detection (CGH arrays, SNP arrays, DNA sequencing) rely on DNA, but DNA data are not always available, particularly in cancer studies (e.g., biopsies, legacy data). CNAPE addresses this by predicting CNAs from RNA-seq gene expression data.

How to run

1. Installation

Before installing CNAPE, make sure R is installed and that Rscript is available in your system path ($PATH).

Cloning the repository is all that is needed for installation; the required R packages are installed automatically when you run CNAPE.

git clone https://github.com/WangLabHKUST/CNAPE
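After cloning, the prerequisites above can be checked from the shell. The sketch below assumes a POSIX-compatible shell; check_cnape_prereqs is a hypothetical helper, not part of CNAPE, and it only verifies that Rscript is on $PATH and that cnape.R exists in the checkout.

```shell
# Hypothetical helper (not part of CNAPE): verify that Rscript is on $PATH
# and that a CNAPE checkout contains the main script cnape.R.
check_cnape_prereqs() {
  cnape_dir="${1:-CNAPE}"   # path to the cloned repository (default: ./CNAPE)

  # command -v prints the resolved path and fails if Rscript is not on $PATH
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found in \$PATH; install R first." >&2
    return 1
  fi

  # cnape.R should sit at the top level of the cloned repository
  if [ ! -f "$cnape_dir/cnape.R" ]; then
    echo "cnape.R not found in $cnape_dir; did the git clone succeed?" >&2
    return 1
  fi

  echo "OK: Rscript found at $(command -v Rscript)"
}
```

For example, run check_cnape_prereqs CNAPE from the directory where you ran git clone. It does not install or check any R packages, since CNAPE handles those itself.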

2. Preparing the input files

cnape.R takes the gene expression matrix of the human cancer samples as input. You can process RNA-seq data with TCGA's RNA-seq processing pipeline, i.e., align reads to the human genome with MapSplice and quantify and normalize expression with RSEM against UCSC genes.

An example input file demonstrating the format of the input gene expression matrix can be found in the example/ folder.
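Before running CNAPE on your own data, a quick shape check can catch truncated or misaligned rows. This is a minimal sketch under stated assumptions: the matrix is tab-delimited with a single header line, and the header may have one field fewer than the data lines (the layout R's write.table() produces with row names). check_matrix_shape is hypothetical; compare its assumptions against the file in example/ before relying on it.

```shell
# Hypothetical helper (not part of CNAPE): sanity-check a tab-delimited
# expression matrix. Assumes one header line followed by data lines; the
# header may have one field fewer than the data lines (R row-name style).
check_matrix_shape() {
  awk -F'\t' '
    NR == 1 { header = NF; next }        # remember the header width
    NR == 2 { width = NF }               # first data line sets the expected width
    NF != width {
      printf "line %d has %d fields, expected %d\n", NR, NF, width > "/dev/stderr"
      bad = 1
    }
    END {
      if (NR < 2) { print "no data lines found" > "/dev/stderr"; exit 1 }
      if (header != width && header != width - 1) {
        printf "header has %d fields but data lines have %d\n", header, width > "/dev/stderr"
        bad = 1
      }
      if (!bad) printf "%d data lines, %d fields each\n", NR - 1, width
      exit bad
    }' "$1"
}
```

The function exits with status 0 and prints the matrix dimensions if every data line has the same width; otherwise it reports the offending lines on stderr and exits with status 1.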

3. Running CNAPE

The main function of CNAPE is in cnape.R. Once your gene expression matrix is ready, run:

Rscript cnape.R expressionMatrix outputPrefix

The output consists of two files, outputPrefix.chromosome_level.cna.txt and outputPrefix.arm_level.cna.txt, in which 1 denotes amplification, -1 denotes deletion, and 0 denotes no copy number change.
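To get a quick overview of the predictions, the calls in an output file can be tallied. This sketch assumes the output table is tab-delimited, with a header line and a first column of row labels; verify this against the files produced by the example run. summarize_cna_calls is a hypothetical helper, not part of CNAPE.

```shell
# Hypothetical helper (not part of CNAPE): tally the CNA calls in a CNAPE
# output file. Assumes a tab-delimited table whose first line is a header
# and whose first column holds row labels; other cells are 1, -1 or 0.
summarize_cna_calls() {
  awk -F'\t' '
    NR == 1 { next }                     # skip the header line
    {
      for (i = 2; i <= NF; i++) {        # skip the row-label column
        if ($i == 1) amp++
        else if ($i == -1) del++
        else if ($i == 0) neutral++
      }
    }
    END { printf "amplified=%d deleted=%d no_change=%d\n", amp, del, neutral }
  ' "$1"
}
```

For example: summarize_cna_calls outputPrefix.arm_level.cna.txt. Cells that are not 1, -1 or 0 (e.g., missing values) are not counted.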

4. Examples

Large-scale CNAs

For chromosome- and arm-level CNAs, models trained on TCGA pan-cancer data are provided. After cloning CNAPE, go to the CNAPE folder and run:

./run_example.sh

The result files, example.chromosome_level.cna.txt and example.arm_level.cna.txt, should appear in the example folder. You can compare them with the provided reference files example.chromosome_level.cna.origional.txt and example.arm_level.cna.origional.txt.
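The comparison with the reference files can be automated with diff. This sketch assumes the file names listed above (including the "origional" spelling) and treats any byte-level difference as a mismatch, so formatting differences such as line endings are also reported. compare_example_results is a hypothetical helper, not part of CNAPE.

```shell
# Hypothetical helper (not part of CNAPE): compare the files written by
# ./run_example.sh with the reference files shipped in example/.
compare_example_results() {
  dir="${1:-example}"   # folder containing both result and reference files
  rc=0
  for level in chromosome_level arm_level; do
    result="$dir/example.$level.cna.txt"
    reference="$dir/example.$level.cna.origional.txt"
    # diff -q only reports whether the files differ; a missing file also fails
    if diff -q "$result" "$reference" >/dev/null 2>&1; then
      echo "$level: identical to reference"
    else
      echo "$level: differs from reference (or a file is missing)"
      rc=1
    fi
  done
  return "$rc"
}
```

Run compare_example_results example from the CNAPE folder after ./run_example.sh; it returns a non-zero status if either file differs.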

Gene-level CNAs

A more detailed example of gene-level CNA prediction is provided using the open-access TCGA pan-glioma data. It shows how the models are formulated and trained, how they perform on test data, and how to extract the feature genes used in the models.

Dependencies

The models were trained on TCGA Pan-Cancer Atlas data using the glmnet package in R. Dependencies are installed automatically when the program runs.

Contact

For technical issues please send an email to [email protected] or [email protected].
