-
Notifications
You must be signed in to change notification settings - Fork 1
/
README
127 lines (110 loc) · 6.83 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
--------------------
NeoProMo README File
--------------------
Creator: Elizabeth Borden
Contact: [email protected]
Last Updated: 7/24/2020
------------
Dependencies
------------
VarScan.v2.3.9 https://sourceforge.net/projects/varscan/files/
fpfilter.pl https://github.com/ckandoth/variant-filter/blob/master/fpfilter.pl
gatk-4.1.7.0 https://gatk.broadinstitute.org/hc/en-us/sections/360008763551-4-1-7-0
strelka-2.9.2 https://github.com/Illumina/strelka/releases
FP filter from GATK https://github.com/yuxiangtan/FPfilter
arriba_v1.2.0 https://github.com/suhrig/arriba/releases
blast/2.10.1 ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
---------------------
SNV Calling - Varscan
---------------------
Program name: varscan-mutation-calling.py
Varscan version: VarScan.v2.3.9
Varscan Citation: Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568576 (2012) https://genome.cshlp.org/content/22/3/568.full.pdf
Filters employed: Perl FP filter
--------------------------
SNV Calling - GATK Mutect2
--------------------------
Program name: gatk-mutation-calling.py
GATK version: gatk-4.1.7.0
GATK citation: Benjamin, D. et al. Calling Somatic SNVs and Indels with Mutect2. doi:10.1101/861054. https://www.biorxiv.org/content/10.1101/861054v1
Filters employed: FilterMutectCall
Errors encountered:
1) Program requires a header in a bam file, can be done with steps detailed here: https://gatk.broadinstitute.org/hc/en-us/articles/360037226472-AddOrReplaceReadGroups-Picard
-------------------
SNV Calling Strelka
-------------------
Program name: strelka-mutation-calling.py
Strelka version: strelka-2.9.2
Strelka citation: Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591594 (2018). https://www.nature.com/articles/s41592-018-0051-x
Filters employed: GATK false positive filter
GATK FP Citation: Tan, Yuxiang, Yu Zhang, Hengwen Yang, and Zhinan Yin. n.d. FPfilter: A False-Positive-Specific Filter for Whole-Genome Sequencing Variant Calling from GATK. https://doi.org/10.1101/2020.03.23.003525.https://www.biorxiv.org/content/10.1101/2020.03.23.003525v1.full.pdf
Errors encountered:
1) Manta not currently working - less important for SNVs, more important for indels, in the process of fixing this.
---------------
Getting Overlap
---------------
Program name: compare_mutations.py
Inputs: Peptide files for Gatk, Varscan and Strelka
Ouputs: Data for Venn diagrams and Lists of overlaps
To do:
1) Integrate into a snakemake program
2) Add the following processing steps into a pipeline:
Prep for Comparison program
cat YM-1_VarScan_vep.21.peptide | awk 'NR%2{printf "%s ",$0;next;}1' | sed '/>WT/d' > YM-1_VarScan_vep.21.collapsed
Prep for NetMHCpan
cat YM-1_VarScan_vep.21.peptide |sed 's/._ENSMUST0000/ /' > 21_peptides_forNETMHC
grep -A 1 '>M' 21_peptides_forNETMHC | sed 's/--//' > 21_peptides_forNETMHC_MT
---------------------
Gene Fusions - Arriba
---------------------
Program: arriba-fusions.py
Arriba version: arriba_v1.2.0
Arriba citation: Uhrig, S., Fröhlich, M., Hutter, B. & Brors, B. PO-400 Arriba fast and accurate gene fusion detection from rna-seq data. Epigenetic Mechanisms (2018) doi:10.1136/esmoopen-2018-eacr25.427.https://esmoopen.bmj.com/content/esmoopen/3/Suppl_2/A179.2.full.pdf
Filters employed: Filtered for high and medium confidence in snakemake
---------------------
MHC Binding NetCTLpan
---------------------
Program: netCTLpan-snakemake.py
NetCTLpan version: netCTLpan-1.1
NetCTLpan citation: NetCTLpan - Pan-specific MHC class I epitope predictions Stranzl T., Larsen M. V., Lundegaard C., Nielsen M. Immunogenetics. 2010 Apr 9. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2875469/
Filters employed: filter_MHCrank.py written by Elizabeth Borden - no longer in use
Alternatives: netMHCpan-snakemake.py
MetMHCpan version: netMHCpan-4.0
NetMHCpan citation: NetMHCpan-4.0: Improved PeptideMHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data Vanessa Jurtz, Sinu Paul, Massimo Andreatta, Paolo Marcatili, Bjoern Peters and Morten Nielsen The Journal of Immunology (2017) ji1700893; DOI: 10.4049/jimmunol.1700893 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5679736/
---------------------
MHC Binding Stability
---------------------
Program: netMHCstabpan-snakemake.py
NetMHCstabpan version: NetMHCstabpan-1.0
NetMHCstabpan citation: Pan-specific prediction of peptide-MHC-I complex stability; a correlate of T cell immunogenicity Michael Rasmussen, Emilio Fenoy, Mikkel Harndahl, Anne Bregnballe Kristensen, Ida Kallehauge Nielsen, Morten Nielsen, Soren Buus. Accepted for publication Journal of Immunology, June 2016 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4976001/
------------------------------
Sequence Similarity with Blast
------------------------------
Program: blast_netCTL.py (runs sequence_similarity.py)
Based off of sequence similarity code from Wood et al.
Citation: Wood MA, Paralkar M, Paralkar MP, Nguyen A, Struck AJ, Ellrott K, Margolin A, Nellore A, Thompson RF. Population-level distribution and putative immunogenicity of cancer neoepitopes. BMC Cancer. 2018;18(1):414. doi:10.1186/s12885-018-4325-6
Blast version: blast/2.10.1
Blast citation: Camacho C, Coulouris G, Avagyan V, et al. BLAST+: Architecture and applications. BMC Bioinformatics. 2009;10(1):421. doi:10.1186/1471-2105-10-421
Database: Ensembl's set of all GRCh38 peptides (ftp://ftp.ensembl.org/pub/release-88/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz)
Input file: Peptides with peptides as labels
---------------
Prep for Luksza
---------------
Program: Penalized_Regression_Fit.R
Part of this program preps files for Luksza program, another part prepares it for the penalized regression model.
To do:
Integrate and make run-able from the command line environment.
------------------------------
Luksza TCR Recognition Program
------------------------------
Program: luksza-snakemake.py
Runs:
main.py
Aligner.py
Neoantigen.py
All adapted and updated from Luksza et al.
Changes:
- Updated a few functions that were not compatable with a newer version of python
- Updated hydrophobicity to have functions that can either do the original binding residue hydrophobicity or the fraction hydrophobicity calculated using Luksza's definition of hydrophobic residues or the fraction hydrophobicity calculated using the consortium's definition of hydrophobic residues. (https://doi.org/10.1016/j.cell.2020.09.015)
Citation: Łuksza M, Riaz N, Makarov V, Balachandran VP, Hellmann MD, Solovyov A, Rizvi NA, Merghoub T, Levine AJ, Chan TA, et al. A neoantigen fitness model predicts tumor response to checkpoint blockade immunotherapy. Nature. 2017;551(7681):517-520. doi:10.1038/nature24473
Requires the Luksza program obtained from here: https://www.nature.com/articles/nature24473