CPTAC3-RNA-related-pipeline

####################

Transcript

####################

Reference

The genome sequence and annotation were downloaded from Ensembl (GRCh37.75).

Tools

MapSplice is downloaded from http://www.netlab.uky.edu/p/bioinfo/MapSplice2

Cufflinks is downloaded from http://cole-trapnell-lab.github.io/cufflinks/install/

Processing

First, run

perl get_link.pl /gscuser/mwyczalk/projects/CPTAC3/import.CPTAC3b1/BamMap/CPTAC3.b1.RNA-Seq.BamMap.dat

to generate the 'to_run.sh' file.

Then run 'bash to_run.sh', which 1) maps the FASTQ files to the human genome using MapSplice; 2) processes the BAM file from MapSplice to estimate transcript expression using Cufflinks; and 3) converts the GTF file from Cufflinks to BED12 using the custom Perl script Convert_GTF_To_Bed12.pl.
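The GTF-to-BED12 conversion in step 3 can be illustrated with a minimal Python sketch. This is not the repository's Convert_GTF_To_Bed12.pl; it assumes Cufflinks-style GTF input with 'transcript' and 'exon' records carrying 'transcript_id' and 'FPKM' attributes, and the function name and example records are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Matches GTF attributes of the form: key "value";
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def gtf_to_bed12(gtf_text: str) -> List[str]:
    """Convert Cufflinks-style GTF text to BED12 lines named 'TranscriptID|FPKM'."""
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    info: Dict[str, Tuple[str, str, str]] = {}  # transcript_id -> (chrom, strand, FPKM)
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, _source, feature, start, end, _score, strand, _frame, attributes = line.split("\t")
        attrs = dict(ATTRIBUTE_PATTERN.findall(attributes))
        transcript_id = attrs["transcript_id"]
        if feature == "transcript":
            # FPKM is kept as text so the value is not reformatted.
            info[transcript_id] = (chrom, strand, attrs.get("FPKM", "NA"))
        elif feature == "exon":
            # GTF is 1-based inclusive; BED is 0-based half-open.
            exons[transcript_id].append((int(start) - 1, int(end)))

    bed_lines = []
    for transcript_id, (chrom, strand, fpkm) in info.items():
        blocks = sorted(exons[transcript_id])
        if not blocks:
            continue  # skip transcripts without exon records
        tx_start, tx_end = blocks[0][0], blocks[-1][1]
        sizes = ",".join(str(e - s) for s, e in blocks)
        starts = ",".join(str(s - tx_start) for s, _ in blocks)
        bed_lines.append("\t".join([
            chrom, str(tx_start), str(tx_end), f"{transcript_id}|{fpkm}", "0", strand,
            str(tx_start), str(tx_end), "0", str(len(blocks)), sizes, starts,
        ]))
    return bed_lines


# Hypothetical two-exon transcript (values made up for illustration).
example_gtf = "\n".join([
    '1\tCufflinks\ttranscript\t101\t500\t1000\t+\t.\tgene_id "G1"; transcript_id "T1"; FPKM "2.5";',
    '1\tCufflinks\texon\t101\t200\t1000\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "1";',
    '1\tCufflinks\texon\t401\t500\t1000\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "2";',
])
bed_lines = gtf_to_bed12(example_gtf)
print(bed_lines[0])
```

The sketch sets thickStart and thickEnd to the transcript bounds and the score and itemRgb fields to 0; the actual script may choose differently.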

Output

Output files can be found in the 'TRANSCRIPT_BED/' directory of each sample folder. They are in standard BED12 format, with transcript expression incorporated into the 4th column (TranscriptID|FPKM).

Details about the BED12 format can be found at https://genome.ucsc.edu/FAQ/FAQformat.html.
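For downstream analysis, the 4th column can be split back into a transcript ID and an FPKM value. The following is a minimal Python sketch, assuming tab-separated BED12 lines whose name field is exactly 'TranscriptID|FPKM'; the class name, function name, and example line are hypothetical.

```python
from typing import NamedTuple


class TranscriptRecord(NamedTuple):
    chrom: str
    start: int  # 0-based start, as in BED
    end: int    # exclusive end, as in BED
    transcript_id: str
    fpkm: float
    strand: str


def parse_bed12_line(line: str) -> TranscriptRecord:
    """Parse one BED12 line whose 4th column is 'TranscriptID|FPKM'."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 BED columns, got {len(fields)}")
    # Split on the last '|' so transcript IDs containing '|' still parse.
    transcript_id, fpkm_text = fields[3].rsplit("|", 1)
    return TranscriptRecord(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        transcript_id=transcript_id,
        fpkm=float(fpkm_text),
        strand=fields[5],
    )


# Hypothetical example line (values made up for illustration).
example = "\t".join([
    "1", "11868", "14409", "ENST00000456328|1.25", "0", "+",
    "11868", "14409", "0", "3", "359,109,1189,", "0,744,1352,",
])
record = parse_bed12_line(example)
print(record.transcript_id, record.fpkm)
```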

####################

Fusion

####################

Three tools were used for fusion calling.

Their references (GRCh37/hg19) were also downloaded from each tool's corresponding database.

STAR-Fusion is downloaded from https://github.com/STAR-Fusion/STAR-Fusion/wiki

EricScript is downloaded from https://sites.google.com/site/bioericscript

Integrate is downloaded from https://sourceforge.net/p/integrate-fusion/wiki/Home/

Fusion calling

First, run

perl get_link.pl /gscuser/mwyczalk/projects/CPTAC3/import.CPTAC3b1/BamMap/CPTAC3.b1.RNA-Seq.BamMap.dat

to generate the 'to_run.sh' file.

Then run 'bash to_run.sh', which runs the three tools individually on each sample.

Finally, run 'perl combine_call.pl DIR', where DIR is a directory such as /gscmnt/gc2521/dinglab/qgao/RNA/Batch_20171110, to merge all the raw fusion calls into one file, 'Total_Fusions.tsv'.

Fusion filtering

Because raw fusion calls contain many false positives, extensive filtering was performed.

The filtering proceeds in two steps.

First, keep the fusions that are 1) reported by at least 2 callers, or 2) reported by STAR-Fusion (which shows higher sensitivity) with stronger supporting evidence, defined as fusion fragments per million total reads (FFPM) > 0.1.

Then remove the fusions that match the filtering database 'FilterDatabase', which includes:

uncharacterized genes, immunoglobulin genes, mitochondrial genes, etc.
fusions within the same gene or between paralogous genes (downloaded from https://www.genenames.org/cgi-bin/statistics)
fusions reported in normal samples: TCGA normals (from the pan-cancer fusion analysis, under review), GTEx tissues (reported in the STAR-Fusion output), and a non-cancer cell study (PMID: 26837576)
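The two filtering steps can be sketched in Python as follows. This is an illustration of the rules described above, not the logic of filter.pl itself. It assumes the filter database has been loaded into in-memory sets and that fusion names follow the 'GENEA--GENEB' convention; all function names and placeholder entries are hypothetical.

```python
from typing import FrozenSet, Optional, Set

FFPM_THRESHOLD = 0.1  # cutoff stated in the filtering description


def passes_evidence_filter(callers: Set[str], ffpm: Optional[float]) -> bool:
    """Keep fusions seen by >= 2 callers, or by STAR-Fusion with FFPM > 0.1."""
    if len(callers) >= 2:
        return True
    return "STAR-Fusion" in callers and ffpm is not None and ffpm > FFPM_THRESHOLD


def passes_database_filter(
    gene_a: str,
    gene_b: str,
    excluded_genes: Set[str],
    paralog_pairs: Set[FrozenSet[str]],
    normal_fusions: Set[str],
) -> bool:
    """Reject fusions that match any category in the filter database."""
    if gene_a == gene_b:
        return False  # fusion within the same gene
    if gene_a in excluded_genes or gene_b in excluded_genes:
        return False  # e.g. uncharacterized, immunoglobulin, mitochondrial genes
    if frozenset((gene_a, gene_b)) in paralog_pairs:
        return False  # fusion between paralogous genes
    # Assumed naming convention for fusions seen in normal samples.
    return f"{gene_a}--{gene_b}" not in normal_fusions


# Hypothetical filter database contents (placeholders, not real entries).
excluded_genes = {"GENE_X"}
paralog_pairs = {frozenset(("GENE_A", "GENE_A2"))}
normal_fusions = {"GENE_C--GENE_D"}

keep = (
    passes_evidence_filter({"STAR-Fusion"}, 0.5)
    and passes_database_filter("GENE_A", "GENE_B", excluded_genes, paralog_pairs, normal_fusions)
)
print(keep)
```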

To apply these filters, run 'perl filter.pl', which generates the filtered fusion calls in 'Filtered_Fusions.tsv'.

Optionally, run 'bash generate_fusion_per_sample.sh' to split the filtered fusion calls into one file per sample.
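The per-sample split amounts to grouping rows by the Cancer__Sample column. The following is a minimal Python sketch of that idea, not the contents of generate_fusion_per_sample.sh. It assumes tab-separated rows without a header line and Cancer__Sample in the 4th column, as listed in the Output section; the function name and example rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

SAMPLE_COLUMN_INDEX = 3  # Cancer__Sample is the 4th of the 9 output columns


def split_by_sample(lines: List[str]) -> Dict[str, List[str]]:
    """Group fusion rows by the value of their Cancer__Sample column."""
    per_sample: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        sample = line.split("\t")[SAMPLE_COLUMN_INDEX]
        per_sample[sample].append(line)
    return dict(per_sample)


# Hypothetical rows (all values made up for illustration).
rows = [
    "GENE_A--GENE_B\tchr1:1000:+\tchr2:2000:-\tCANCER__SAMPLE01\t12\t5\t0.35\tINFRAME\t3",
    "GENE_C--GENE_E\tchr3:3000:+\tchr4:4000:+\tCANCER__SAMPLE02\t8\t2\tNA\t.\t2",
    "GENE_F--GENE_G\tchr5:5000:-\tchr6:6000:+\tCANCER__SAMPLE01\t20\t9\t0.80\tFRAMESHIFT\t1",
]
groups = split_by_sample(rows)
for sample, sample_rows in groups.items():
    print(sample, len(sample_rows))
```

Each group could then be written to its own file, for example one named after the sample.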

Output

In the output file, each row represents one fusion. There are 9 columns for each fusion:

FusionName
LeftBreakpoint
RightBreakpoint
Cancer__Sample
JunctionReadCount
SpanningFragCount
FFPM - fusion fragments per million total reads; 'NA' means the fusion was found by both EricScript and Integrate but not by STAR-Fusion
PROT_FUSION_TYPE - INFRAME, FRAMESHIFT or '.'
CallerN - number of callers that reported the fusion
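A row in this format can be read into a typed record, mapping 'NA' in FFPM and '.' in PROT_FUSION_TYPE to missing values. The following is a minimal Python sketch, assuming tab-separated rows in the column order above; the class name, function name, and example row are hypothetical, and the read counts are kept as text because their exact encoding is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FusionCall:
    fusion_name: str
    left_breakpoint: str
    right_breakpoint: str
    cancer_sample: str
    junction_read_count: str   # kept as text; encoding not specified
    spanning_frag_count: str   # kept as text; encoding not specified
    ffpm: Optional[float]      # None when the column is 'NA'
    prot_fusion_type: Optional[str]  # None when the column is '.'
    caller_n: int


def parse_fusion_row(line: str) -> FusionCall:
    """Parse one 9-column, tab-separated fusion row."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    return FusionCall(
        fusion_name=fields[0],
        left_breakpoint=fields[1],
        right_breakpoint=fields[2],
        cancer_sample=fields[3],
        junction_read_count=fields[4],
        spanning_frag_count=fields[5],
        ffpm=None if fields[6] == "NA" else float(fields[6]),
        prot_fusion_type=None if fields[7] == "." else fields[7],
        caller_n=int(fields[8]),
    )


# Hypothetical row (all values made up for illustration).
row = "GENE_C--GENE_E\tchr3:3000:+\tchr4:4000:+\tCANCER__SAMPLE02\t8\t2\tNA\t.\t2"
call = parse_fusion_row(row)
print(call.fusion_name, call.ffpm, call.caller_n)
```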

ding-lab/CPTAC3-RNA-related-pipeline
