Single Sample SVs

Introduction

Tutorials

Command Line

Sample commands:

run_cnvkit -> cnvkit_main (docker: etal/cnvkit):
/usr/bin/python /usr/local/bin/cnvkit.py batch -r <cnvkit_reference_cnn> --method wgs

run_cnvkit -> cns_to_vcf (docker: etal/cnvkit):
/usr/bin/python /usr/local/bin/cnvkit.py call <cnvkit_main output cns> -o adjusted.tumor.cns && /usr/bin/python /usr/local/bin/cnvkit.py export vcf adjusted.tumor.cns --cnr <cnvkit_main output cnr> -o cnvkit.vcf

run_manta (docker: mgibio/manta_somatic-cwl):
/usr/bin/python /usr/bin/manta/bin/configManta.py --referenceFasta --tumorBam --runDir && /usr/bin/python runWorkflow.py -m local -j 12

run_smoove (docker: brentp/smoove):
/usr/local/bin/smoove call --processes 4 -F --genotype --name SV --fasta --exclude <smoove_exclude_regions>
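
The run_manta command above omits the values passed to --referenceFasta, --tumorBam, and --runDir. As a rough sketch of how the two chained commands could be assembled once those values are known, the Python helper below builds them as argument lists. The function names and the example file names (ref.fa, tumor.bam, manta_run) are hypothetical, and locating runWorkflow.py inside the run directory is an assumption about where configManta.py writes it; the workflow's CWL tool may invoke it differently.

```python
import shlex
from pathlib import PurePosixPath


def manta_commands(reference_fasta, tumor_bam, run_dir, jobs=12):
    """Build the configure and run steps for Manta as argument lists.

    The flags mirror the run_manta sample command; the values are
    supplied by the caller.
    """
    config = [
        "/usr/bin/python", "/usr/bin/manta/bin/configManta.py",
        "--referenceFasta", reference_fasta,
        "--tumorBam", tumor_bam,
        "--runDir", run_dir,
    ]
    # Assumption: runWorkflow.py is generated inside run_dir by the configure step.
    run = [
        "/usr/bin/python", str(PurePosixPath(run_dir) / "runWorkflow.py"),
        "-m", "local", "-j", str(jobs),
    ]
    return config, run


def render(*commands):
    """Join argument lists into one shell line, quoting each argument safely."""
    return " && ".join(shlex.join(command) for command in commands)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(render(*manta_commands("ref.fa", "tumor.bam", "manta_run")))
```

Keeping the commands as lists until the last moment avoids quoting mistakes when paths contain spaces; the lists can also be passed directly to subprocess.run without going through a shell.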

CWL Workflow

INSERT LINK TO DOCKER IMAGES/REPOS AND CWL

Steps

INSERT PROCESS DIAGRAM

Inputs

Name	Description	Example	Required
bam	Aligned sequencing results to be analyzed for SVs		✓
cnvkit_diagram	Create an ideogram of copy ratios on chromosomes as a pdf	false
cnvkit_drop_low_coverage	Helps avoid false positive deletions in low quality tumor samples	false
cnvkit_male_reference	Use/assume a male reference	false
cnvkit_method	Sequencing protocol used	wgs
cnvkit_reference_cnn	A copy number reference file against which potential copy number variants will be evaluated	/gscmnt/gc2560/core/cnvkit_pon/v1/reference.cnn	✓
cnvkit_scatter_plot	Create a whole genome copy ratio profile as a pdf scatter plot	false
cnvkit_vcf_name	Custom name to use for the cnvkit output vcf	cnvkit_output
manta_call_regions	bgzip-compressed, tabix-indexed BED file specifying regions to which variant analysis will be restricted
manta_non_wgs	When true, activates settings appropriate for whole exome sequencing	false
manta_output_contigs	If true, outputs assembled contig sequences in the INFO field CONTIG of the final VCF files	true
maximum_sv_pop_freq	Population frequency above which variants will be filtered out
merge_estimate_sv_distance	When evaluating variants to be merged, estimate distance based on the size of the SV	true	✓
merge_max_distance	Maximum distance of variants to consider for merging	1000	✓
merge_min_sv_size	Minimum size of SVs to merge	1	✓
merge_min_svs	Minimum number of SV calls needed to be merged	1	✓
merge_same_strand	Require merged SVs to be on the same strand	true	✓
merge_same_type	Require merged SVs to be of the same type	true	✓
merge_sv_pop_freq_db	bed file containing allele frequencies for a population	/gscmnt/gc2560/core/cwl/inputs/hall_lab_B38_SV_public_callset/sv.bedpe.gz	✓
reference	Reference sequence	example_data/exome_workflow/chr17_test.fa	✓
smoove_exclude_regions	Regions to be ignored when calling SVs through smoove (a wrapper for lumpy)
sv_filter_interval_lists	One or more interval lists defining regions to keep in the output vcf, labeled with the source of the intervals	/gscmnt/gc2560/core/model_data/interval-list/db8c25932fd94d2a8a073a2e20449878/a35b64d628b94df194040032d53b5616.interval_list, /gscmnt/gc2560/core/model_data/interval-list/1eea27120d294db49826cef2e79b618c/3a61ffd42f074fe1b8a20742f6dfb32e.interval_list, /gscmnt/gc2560/core/model_data/interval-list/86494a288c3c4d7a89842ed2f1d6e36a/f54639200d364231bd5e1c39266ccfac.interval_list	✓
variants_to_table_fields	One or more of any standard VCF column (CHROM, ID, QUAL) or any binding in the INFO field (e.g., AC=10) to add to the tsv report
variants_to_table_genotype_fields	One or more of any binding in the FORMAT field (e.g., GQ, PL) to add to the tsv report
vep_cache_dir	Location of a local ensembl cache to be used by vep	example_data/exome_workflow/	✓
vep_ensembl_assembly	Which (species) assembly vep should use	GRCh38	✓
vep_ensembl_species	Which species vep should use	homo_sapiens	✓
vep_ensembl_version	Which ensembl release vep should use	95	✓
vep_to_table_fields	VEP CSQ annotation fields to add to the tsv report
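
Because many inputs in the table are marked required, it can help to check a job description before submitting it. The Python sketch below lists the inputs marked ✓ above and reports any that are missing. The function name, the job dictionary, and the tumor.bam value are hypothetical. How each value must be encoded (for example as a CWL File or Directory object rather than a plain string) depends on the workflow's declared input types, which this page does not show, so the plain values here are placeholders only.

```python
# Inputs marked as required in the table above, in table order.
REQUIRED_INPUTS = (
    "bam",
    "cnvkit_reference_cnn",
    "merge_estimate_sv_distance",
    "merge_max_distance",
    "merge_min_sv_size",
    "merge_min_svs",
    "merge_same_strand",
    "merge_same_type",
    "merge_sv_pop_freq_db",
    "reference",
    "sv_filter_interval_lists",
    "vep_cache_dir",
    "vep_ensembl_assembly",
    "vep_ensembl_species",
    "vep_ensembl_version",
)


def missing_inputs(job):
    """Return the required input names that are absent or None, in table order."""
    return [name for name in REQUIRED_INPUTS if job.get(name) is None]


# A deliberately incomplete job using example values from the table.
# Value types are assumptions; check them against the CWL definition.
job = {
    "bam": "tumor.bam",  # hypothetical path, the table gives no example
    "cnvkit_reference_cnn": "/gscmnt/gc2560/core/cnvkit_pon/v1/reference.cnn",
    "merge_estimate_sv_distance": True,
    "merge_max_distance": 1000,
    "merge_min_sv_size": 1,
    "merge_min_svs": 1,
    "merge_same_strand": True,
    "merge_same_type": True,
    "reference": "example_data/exome_workflow/chr17_test.fa",
    "vep_cache_dir": "example_data/exome_workflow/",
    "vep_ensembl_assembly": "GRCh38",
    "vep_ensembl_species": "homo_sapiens",
    "vep_ensembl_version": "95",
}

if __name__ == "__main__":
    print(missing_inputs(job))
    # prints ['merge_sv_pop_freq_db', 'sv_filter_interval_lists']
```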

Outputs

Name	Source	Description
annotated_tsvs	GATK VariantsToTable	tsv files containing specified SV fields and annotations
cn_diagram	CNVkit	ideogram of copy ratios on chromosomes
cn_scatter_plot	CNVkit	whole genome copy ratio profile
cnvkit_vcf	CNVkit	final cnvkit output, converted to vcf format
filtered_vcfs	Various filters	SV VCF, filtered by variant population frequency and by the interval lists given in sv_filter_interval_lists
manta_all_candidates	Manta	Unscored SV and indel candidates
manta_diploid_variants	Manta	SVs and indels scored and genotyped under a diploid model
manta_small_candidates	Manta	simple insertion and deletion variants less than the minimum scored variant size (50 by default)
manta_somatic_variants	Manta	SVs and indels scored under a somatic variant model
manta_tumor_only_variants	Manta	Subset of the candidateSV.vcf.gz file after removing redundant candidates and small indels less than the minimum scored variant size (50 by default)
merged_annotated_svs	Survivor, VEP	SV calls from Manta, CNVkit, and Smoove (Lumpy), merged by Survivor and annotated by VEP
smoove_output_variants	Smoove (Lumpy)	SV calls from Smoove, a wrapper for Lumpy
sv_pop_filtered_vcf	Various filters	SV VCF, filtered by variant population frequency
tumor_antitarget_coverage	CNVkit	Coverage in the antitarget regions from bam read depths
tumor_bin_level_ratios	CNVkit	table of copy number ratios
tumor_segmented_ratios	CNVkit	discrete copy number segments derived from tumor_bin_level_ratios
tumor_target_coverage	CNVkit	Coverage in the target regions from bam read depths
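
The merged_annotated_svs output depends on the merge_* inputs described earlier. To illustrate how those settings interact, the Python sketch below tests whether two calls from different callers would be merge candidates. It is a simplified illustration under stated assumptions, not Survivor's actual algorithm: the SVCall type and the mergeable function are hypothetical, calls on different chromosomes are never merged, and merge_estimate_sv_distance and merge_min_svs are not modeled. The defaults mirror the example values in the Inputs table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    """A minimal, hypothetical representation of one SV call."""
    caller: str
    chrom: str
    start: int
    end: int
    sv_type: str   # e.g. "DEL", "DUP", "INV"
    strands: str   # e.g. "+-"


def mergeable(a, b, max_distance=1000, same_type=True,
              same_strand=True, min_sv_size=1):
    """Return True if two calls satisfy the simplified merge criteria."""
    if a.chrom != b.chrom:
        return False
    if same_type and a.sv_type != b.sv_type:
        return False
    if same_strand and a.strands != b.strands:
        return False
    # Both calls must be at least min_sv_size long.
    if min(a.end - a.start, b.end - b.start) < min_sv_size:
        return False
    # Both breakpoints must lie within max_distance of each other.
    return (abs(a.start - b.start) <= max_distance
            and abs(a.end - b.end) <= max_distance)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    manta_del = SVCall("manta", "chr17", 1000, 5000, "DEL", "+-")
    smoove_del = SVCall("smoove", "chr17", 1400, 5300, "DEL", "+-")
    print(mergeable(manta_del, smoove_del))                    # breakpoints 400 and 300 apart
    print(mergeable(manta_del, smoove_del, max_distance=100))  # stricter distance
```

With the default maximum distance of 1000 the two deletions qualify; tightening max_distance to 100 excludes them because the start breakpoints are 400 bases apart.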

Want to contribute to this Wiki?

Fork it and send a pull request.

