Workflow for the comprehensive detection and prioritization of variants in human genomes with PacBio HiFi reads
Note: Workflow is committed. Web app code to come.
- William Rowell (@williamrowell)
- Aaron Wenger (@amwenger)
This repo consists of three Snakemake workflows:
- find new HiFi BAMs or FASTQs under smrtcells/ready/*/
- align HiFi reads to reference (GRCh38 by default) with pbmm2
- calculate aligned coverage depth with mosdepth
- calculate depth ratios (chrX:chrY, chrX:chr2) from mosdepth summary to check for sample swaps
- calculate depth ratio (chrM:chr2) from mosdepth summary to check for consistency between runs
- count kmers in HiFi reads using jellyfish, dump and export modimers
- launched once the sample has been sequenced to sufficient depth
- discover and call structural variants with pbsv
- call small variants with DeepVariant
- phase small variants with WhatsHap
- merge per-SMRT Cell BAMs and tag the merged BAM with haplotype based on WhatsHap-phased DeepVariant variant calls
- merge jellyfish kmer counts
- assemble reads with hifiasm and calculate stats with calN50.js
- align assembly to reference with minimap2
- check for sample swaps by calculating consistency of kmers between sequencing runs
- launched once all samples in cohort have been processed
- if multi-sample cohort
  - jointly call structural variants with pbsv
  - jointly call small variants with GLnexus
- using slivar
  - annotate variant calls with population frequency from gnomAD and HPRC variant databases
  - filter variant calls according to population frequency and inheritance patterns
  - detect possible compound heterozygotes, and filter to remove cis-combinations
  - assign a phenotype rank (Phrank) score, based on Jagadeesh KA, et al. 2019. Genet Med.
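
The depth-ratio checks listed above (chrX:chrY, chrX:chr2, chrM:chr2) can be sketched in a few lines of Python. This is only an illustration, not the code used by the workflow: it assumes the mosdepth summary is a tab-separated table with a header containing `chrom` and `mean` columns, and the function names and example values are hypothetical.

```python
import csv
import io


def mean_depths(summary_text):
    """Map chromosome name -> mean depth from a mosdepth-style summary table.

    Assumes a tab-separated table whose header includes 'chrom' and 'mean'.
    """
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    return {row["chrom"]: float(row["mean"]) for row in reader}


def depth_ratio(depths, numerator, denominator):
    """Return depths[numerator] / depths[denominator], or None if undefined."""
    num = depths.get(numerator)
    den = depths.get(denominator)
    if num is None or not den:
        # Missing chromosome or zero depth in the denominator.
        return None
    return num / den


# Made-up values for illustration only, not real sequencing data.
EXAMPLE_SUMMARY = (
    "chrom\tlength\tbases\tmean\tmin\tmax\n"
    "chr2\t1000\t30000\t30.00\t0\t60\n"
    "chrX\t1000\t15000\t15.00\t0\t40\n"
    "chrY\t1000\t15000\t15.00\t0\t40\n"
    "chrM\t100\t300000\t3000.00\t0\t5000\n"
)

depths = mean_depths(EXAMPLE_SUMMARY)
ratios = {
    "chrX:chrY": depth_ratio(depths, "chrX", "chrY"),
    "chrX:chr2": depth_ratio(depths, "chrX", "chr2"),
    "chrM:chr2": depth_ratio(depths, "chrM", "chr2"),
}
for name, value in ratios.items():
    print(name, value)
```

Comparing the sex-chromosome ratios against the expected sex of the sample is one way to flag a possible swap, and comparing chrM:chr2 across SMRT Cells of the same sample gives a rough consistency check between runs. Any thresholds would need to be chosen for the data at hand.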
- some tools (e.g. pbsv) require linux
- conda
- singularity >= 3.5.3 installed by root
environment.yaml
config.yaml contains file paths and version numbers for docker images
reference.yaml contains file paths and names related to reference
*.cluster.yaml contains example cluster configuration for a slurm cluster with a compute queue for general compute and an ml queue for GPU.