amwenger/pbRUGD-workflow

pbRUGD-workflow

Workflow for the comprehensive detection and prioritization of variants in human genomes with PacBio HiFi reads

Note: Workflow is committed. Web app code to come.

Authors

Description

This repo consists of three Snakemake workflows:

  1. process_smrtcells
  2. process_sample
  3. process_cohort

process_smrtcells

  • find new HiFi BAMs or FASTQs under smrtcells/ready/*/
  • align HiFi reads to reference (GRCh38 by default) with pbmm2
  • calculate aligned coverage depth with mosdepth
  • calculate depth ratios (chrX:chrY, chrX:chr2) from the mosdepth summary to check for sample swaps
  • calculate depth ratio (chrM:chr2) from the mosdepth summary to check for consistency between runs
  • count k-mers in HiFi reads with jellyfish, then dump and export modimers
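The depth-ratio checks above can be sketched as follows. This is a minimal illustration, not the workflow's own rule: it assumes the mosdepth summary is a tab-separated table with a header and the columns chrom, length, bases, mean, min, max, and the function names are hypothetical. For a typical XY sample chrX:chr2 is roughly 0.5, and for an XX sample roughly 1.0, so a mismatch against the expected sex can flag a swap.

```python
import csv
import io
from typing import Dict, Optional


def read_mean_depths(summary_text: str) -> Dict[str, float]:
    """Map region name to mean depth from a mosdepth summary table.

    Assumes a header row followed by tab-separated columns:
    chrom, length, bases, mean, min, max.
    """
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    return {row["chrom"]: float(row["mean"]) for row in reader}


def ratio(depths: Dict[str, float], num: str, den: str) -> Optional[float]:
    """Return depth(num) / depth(den), or None if num is missing or den is zero/missing."""
    if num not in depths or depths.get(den, 0.0) == 0.0:
        return None
    return depths[num] / depths[den]


def depth_ratios(depths: Dict[str, float]) -> Dict[str, Optional[float]]:
    # chrX:chrY and chrX:chr2 compare inferred sex against the expected sex;
    # chrM:chr2 tracks relative mitochondrial depth across runs of one sample.
    return {
        "chrX:chrY": ratio(depths, "chrX", "chrY"),
        "chrX:chr2": ratio(depths, "chrX", "chr2"),
        "chrM:chr2": ratio(depths, "chrM", "chr2"),
    }


# Toy summary with made-up numbers, for illustration only.
summary = (
    "chrom\tlength\tbases\tmean\tmin\tmax\n"
    "chr2\t1000\t30000\t30.00\t0\t60\n"
    "chrX\t1000\t15000\t15.00\t0\t30\n"
    "chrY\t1000\t15000\t15.00\t0\t30\n"
    "chrM\t100\t300000\t3000.00\t0\t4000\n"
)
print(depth_ratios(read_mean_depths(summary)))
# {'chrX:chrY': 1.0, 'chrX:chr2': 0.5, 'chrM:chr2': 100.0}
```

Returning None rather than raising keeps a zero-depth chrY (as in an XX sample) from aborting the check.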

process_sample

  • launched once the sample has been sequenced to sufficient depth
  • discover and call structural variants with pbsv
  • call small variants with DeepVariant
  • phase small variants with WhatsHap
  • merge per-SMRT Cell BAMs and tag the merged BAM with haplotypes based on the WhatsHap-phased DeepVariant calls
  • merge jellyfish kmer counts
  • assemble reads with hifiasm and calculate stats with calN50.js
  • align assembly to reference with minimap2
  • check for sample swaps by calculating the consistency of k-mers between sequencing runs
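One way to compare k-mer content between sequencing runs is sketched below. The README does not spell out the exact comparison, so this assumes that modimers are canonical k-mers whose hash is divisible by a fixed modulus, and it scores two runs by the Jaccard similarity of their modimer sets. The values of k and the modulus, the BLAKE2b hash, and all function names are illustrative assumptions, not the workflow's parameters.

```python
import hashlib
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def stable_hash(kmer: str) -> int:
    # Python's built-in hash() is salted per process, so use a fixed digest instead.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def modimers(reads: Iterable[str], k: int = 21, modulus: int = 101) -> Set[str]:
    """Collect canonical k-mers whose hash is divisible by `modulus` (hypothetical defaults)."""
    selected = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            if set(kmer) <= _BASES:  # skip k-mers containing N or other symbols
                ckmer = canonical(kmer)
                if stable_hash(ckmer) % modulus == 0:
                    selected.add(ckmer)
    return selected


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared fraction of the combined modimer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Tiny example: modulus=1 keeps every canonical k-mer so the result is easy to check.
run1 = modimers(["ACGTAC"], k=3, modulus=1)
run2 = modimers(["ACGTTT"], k=3, modulus=1)
print(sorted(run1), sorted(run2), jaccard(run1, run2))
# ['ACG', 'GTA'] ['AAA', 'AAC', 'ACG'] 0.25
```

Under these assumptions, runs from the same sample should score markedly higher than runs from different samples; a larger modulus keeps fewer k-mers, trading resolution for memory.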

process_cohort

  • launched once all samples in cohort have been processed
  • if multi-sample cohort
    • jointly call structural variants with pbsv
    • jointly call small variants with GLnexus
  • using slivar
    • annotate variant calls with population frequency from gnomAD and HPRC variant databases
    • filter variant calls according to population frequency and inheritance patterns
    • detect possible compound heterozygotes and filter out cis combinations
    • assign a phenotype rank (Phrank) score, based on Jagadeesh KA, et al. 2019. Genet Med.
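The cis-filtering step can be illustrated with phased genotypes. This sketch is not slivar's implementation: the `Het` record and its fields are hypothetical, and it assumes a pair of heterozygous variants in the same gene is cis only when both are phased in the same phase set (the PS tag WhatsHap writes) with the alternate allele on the same haplotype. Pairs that are trans, or whose phase cannot be resolved, are kept as candidates.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Het:
    """Hypothetical record for one heterozygous, biallelic variant call."""
    gene: str
    pos: int
    gt: str                   # "0|1" or "1|0" when phased, "0/1" when not
    phase_set: Optional[int]  # WhatsHap PS tag; None if unphased


def alt_haplotype(v: Het) -> Optional[int]:
    """Index (0 or 1) of the haplotype carrying the ALT allele, or None if unphased."""
    if v.phase_set is None or "|" not in v.gt:
        return None
    alleles = v.gt.split("|")
    return alleles.index("1") if "1" in alleles else None


def is_cis(a: Het, b: Het) -> bool:
    """True only when both variants are phased together with ALT on the same haplotype."""
    ha, hb = alt_haplotype(a), alt_haplotype(b)
    return ha is not None and ha == hb and a.phase_set == b.phase_set


def candidate_compound_hets(variants: List[Het]) -> List[Tuple[Het, Het]]:
    """Pair het variants within each gene, dropping pairs known to be in cis."""
    by_gene = defaultdict(list)
    for v in variants:
        by_gene[v.gene].append(v)
    pairs = []
    for gene_vars in by_gene.values():
        for a, b in combinations(sorted(gene_vars, key=lambda v: v.pos), 2):
            if not is_cis(a, b):
                pairs.append((a, b))
    return pairs


calls = [
    Het("G", 100, "0|1", 1),
    Het("G", 200, "0|1", 1),  # same haplotype as pos 100 -> cis, pair dropped
    Het("G", 300, "1|0", 1),  # opposite haplotype -> trans with both
    Het("H", 50, "0/1", None),
]
print([(a.pos, b.pos) for a, b in candidate_compound_hets(calls)])
# [(100, 300), (200, 300)]
```

Keeping unresolved pairs errs toward sensitivity; a stricter variant of this sketch could also require positive trans evidence, for example from parental genotypes.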

Dependencies

  • some tools (e.g. pbsv) require Linux
  • conda
  • singularity >= 3.5.3 installed by root
  • the conda environment defined in environment.yaml

Configuration

  • config.yaml contains file paths and version numbers for Docker images
  • reference.yaml contains file paths and names related to reference
  • *.cluster.yaml contains an example cluster configuration for a Slurm cluster with a compute queue for general jobs and an ml queue for GPU jobs.
