rtsundby/cfdna-wgs

 
 


Introduction

This repository hosts a Snakemake workflow for basic processing of whole-genome sequencing reads from cell-free DNA.

Organization

The master branch contains the most recent developments, while stable versions are kept as terminal branches (e.g. stable.1.0.0).

The workflow directory contains two types of snakefiles: process-focused snakefiles (reads.smk, cna.smk, frag.smk), which are suitable for integration into another Snakemake pipeline using the include directive, and the _int_test snakefile, which gives examples of such integration using the repository test data.
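
As a rough illustration of that integration pattern, a parent pipeline's Snakefile might pull in the process-focused snakefiles as sketched below. The cfdna-wgs/ path prefix (for example, a clone or submodule inside the parent project) is an assumption, not something this repository prescribes. The _int_test snakefile remains the reference for how the integration is actually done.

```snakemake
# Snakefile of a hypothetical parent pipeline.
# ASSUMPTION: this repository is available at ./cfdna-wgs relative to
# this Snakefile; adjust the paths to match your own layout.

# Each include makes the rules of that snakefile available to the parent workflow.
include: "cfdna-wgs/workflow/reads.smk"
include: "cfdna-wgs/workflow/cna.smk"
include: "cfdna-wgs/workflow/frag.smk"
```

Snakemake resolves include paths relative to the snakefile that contains the include statement. The parent pipeline still has to define its own targets and supply whatever configuration values the included rules expect.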

Use

Changelog

  • [2023-01-26 Thu] - Version 9.1.0: Repo cleanup
  • [2023-01-26 Thu] - Version 9.0.0: Removed the -f 3 flag (properly paired reads) from samtools filtering, because the proper-pair flag set by BWA excludes some fragments above a set maximum length. Added a framework for benchmark analysis. Added conditional execution of downsampling. Temporarily removed the final wig and ichor commands of CNA, as these do not currently run correctly without full genome alignment and so cannot be validated on test data. Added local documentation of the cfdna-wgs dockerfile.
  • [2023-01-21 Sat] - Version 8.0.0: Corrected rule filt_bam_to_frag_bed to fix mates of inputs, which seems to prevent errors in the bamtobed call. Frag_window_count now uses windows of consistent 5 Mb size, which are generated from rule make_gc_map_bind. Added a merged fragment counts file and zero-centered unit SD counts.
  • [2022-12-07 Wed] - Version 7.0.0: Added copy number alteration and DELFI fragmentomics.
  • [2022-10-17 Mon] - Version 6.0.0: Using fastp for read trimming (replaces trimmomatic). Simplified naming schema. Removed downsampling (will reinstate in later version).
  • [2022-09-08 Thu] - Version 5.3.0: Minor name changes.
  • [2022-08-19 Fri] - Version 5.2.0 validated: Adds bamCoverage and plotCoverage from deeptools. Benchmarks BWA.
  • [2022-08-09 Tue] - Version 5.1.0 validated: Added a cfdna-wgs-specific container for each rule, referenced from the config.
  • [2022-08-05 Fri] - Version 5.0.0 validated: Added a symlink rule based on python dictionary. Added repo-specific output naming, added checks for sequence type and file readability to input tsv.
  • [2022-06-27 Mon] - Version 4 validated. Further expanded read_qc.tsv table. Removed bam post-processing step and added a more expansive bam filtering step. Updated downsampling to work off filtered alignments.
  • [2022-06-26 Sun] - Version 3.2 validated. Expanded the qc aggregate table and added some comments.
  • [2022-06-24 Fri] - Validated version 3.1, which includes the genome index build as a snakefile rule.
  • [2022-06-24 Fri] - Validated version 3 with read number checkpoint for down-sampling.
  • [2022-05-31 Tue] - Conforms to current biotools best practices.
  • [2022-04-29 Fri] - Moved multiqc to integration testing as inputs are dependent on final sample labels. Integration testing works per this commit.
