Multiregional sampling analysis
- We used this pipeline (https://github.com/thw17/Mayo_breast_cancer_heterogeneity_assembly) to assemble and clean our data. We used BAM files output from this pipeline for downstream analyses.
- Use VarScan to genotype somatic mutations
  - Snakefile: MAYO_breastcancer.Snakefile
- HLA typing using HLA-LA (https://github.com/DiltheyLab/HLA-LA)
  - Because HLA-LA requires the reads to be mapped to the 1000 Genomes version of GRCh38, the first step is to strip the reads from the BAM files using XYalign and map them to the 1000 Genomes version of GRCh38.
    - Snakefile: strip_rempa.snakefile
    - Config file: strip_remap_config.json
  - Run HLA-LA:
    - Snakefile: hla.snakefile
    - Config file: hla_config.json
  - After HLA typing, rename the hla directory by running the Python script rename_hla_directory.py
- Use the program pvacseq to generate peptides (21-mers)
  - Snakefile: generate_peptides.snakefile
- Use the Python script generate_config.py to generate the config file run_mhcpan_config.json
  - Snakefile: run_mhcpan.snakefile
- Use the Python script generate_hla_list.py to generate a list of HLA types for each sample
- Use the Bash script filter_peptides.sh for filtering peptides
- Plot the number of somatic mutations per sample:
  - Use the R script plot_num_variants_per_individual.R (under 04_mutational_landscape/tabulate_num_variants)
- Plot an UpSet plot for all of the somatic mutations
  - Scripts are under all_mutations_overlap
  - Use the Python script make_database_for_upsetr
  - Use the R script plot_upset.R
- Plot an UpSet plot for fixed mutations
  - Scripts are under fixed_mutations_overlap
  - Use the Python script find_fixed_variants.py (Snakefile fixed_variants_analysis.snakefile) to find fixed variants.
  - Use the Python script make_database_for_upsetr_fixed_variants.py to generate the data for the UpSet plot.
  - Use the R script plot_upset_fixed_mutations.R for plotting.
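
For orientation, the overlap steps above can be sketched in a few lines of Python. This is only an illustration, not the code in find_fixed_variants.py or the make_database_for_upsetr scripts. It assumes that a "fixed" variant is one called in every sampled region of a tumor, and that a variant is keyed by (chromosome, position, ref, alt). The function names and the example data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a variant is identified by (chromosome, position, ref, alt).
# The repository's scripts may use a different key.
Variant = Tuple[str, int, str, str]


def membership_table(regions: Dict[str, Set[Variant]]) -> List[dict]:
    """Build one row per variant with a 0/1 column per region.

    A binary table of this shape is the kind of input an UpSet plot needs.
    """
    all_variants = sorted(set().union(*regions.values()))
    rows = []
    for var in all_variants:
        row = {"variant": "{}:{}:{}>{}".format(*var)}
        for name in sorted(regions):
            row[name] = int(var in regions[name])
        rows.append(row)
    return rows


def fixed_variants(regions: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Return variants present in every region (assumed meaning of 'fixed')."""
    if not regions:
        return set()
    return set.intersection(*regions.values())


# Hypothetical calls for three regions of one tumor.
example_regions = {
    "region_A": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "region_B": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
    "region_C": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
}

for row in membership_table(example_regions):
    print(row)
print(sorted(fixed_variants(example_regions)))  # [('chr1', 100, 'A', 'T')]
```

The rows returned by `membership_table` can be written out with `csv.DictWriter` and read into R for plotting. Only the variant shared by all three regions is reported as fixed in this example.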