This repository hosts a snakemake workflow for basic processing of whole-genome sequencing reads from cell-free DNA.
Master branch of the repository contains most recent developments while stable versions are saved as terminal branches (e.g. stable.1.0.0).
Directory workflow
contains two types of workflows- process-focused snakefiles (reads.smk, cna.smk, frag.smk) suitable for integration into another snakemake pipeline using the :include command, and the _int_test snakefile with examples of such integration using the repository test data.
- All software needed for the pipeline is present within the associated docker container (see
docker
and https://hub.docker.com/repository/docker/jeszyman/cfdna_wgs/general). - See the example configuration yaml
config/int_test.yaml
and wrapper workflowworkflow/int_test.smk
for necessary run conditions.
- [2023-01-26 Thu] - Version 9.1.0: Repo cleanup
- [2023-01-26 Thu] - Version 9.0.0: Removed -f 3 flag for perfectly matched pairs in samtools filtering as the flag from BWA removes some fragments at a set max length. Added framework for benchmark analysis. Added conditional execution of downsampling. Removed (temporarily) final wig and ichor commands of CNA as these don't currently run correctly without full genome alignment, so can't be validated on test data. Added local documentation of cfdna-wgs dockerfile.
- [2023-01-21 Sat] - Version 8.0.0: Corrected rule filt_bam_to_frag_bed to fix mates of inputs, which seems to prevent errors in the bamtobed call. Frag_window_count now uses windows of consistent 5 Mb size, which are generated from rule make_gc_map_bind. Added a merged fragment counts file and zero-centered unit SD counts.
- [2022-12-07 Wed] - Version 7.0.0: Added copy number alteration and DELFI fragmentomics.
- [2022-10-17 Mon] - Version 6.0.0: Using fastp for read trimming (replaces trimmomatic). Simplified naming schema. Removed downsampling (will reinstate in later version).
- [2022-09-08 Thu] - Version 5.3.0: some minor name changes
- [2022-08-19 Fri] - Version 5.2.0 validated: Adds bamCoverage and plotCoverage from deeptools. Benchmarks BWA.
- [2022-08-09 Tue] - Version 5.1.0 validated: Added cfdna wgs-specific container for each rule, referenced to config
- [2022-08-05 Fri] - Version 5.0.0 validated: Added a symlink rule based on python dictionary. Added repo-specific output naming, added checks for sequence type and file readability to input tsv.
- [2022-06-27 Mon] - Version 4 validated. Further expanded read_qc.tsv table. Removed bam post-processing step and added a more expansive bam filtering step. Updated downsampling to work off filtered alignments.
- [2022-06-26 Sun] - Version 3.2 validated. Expanded the qc aggregate table and added some comments.
- [2022-06-24 Fri] - Validate version 3.1 which includes genome index build as a snakefile rule.
- [2022-06-24 Fri] - Validated version 3 with read number checkpoint for down-sampling.
- [2022-05-31 Tue] - Conforms to current biotools best practices.
- [2022-04-29 Fri] - Moved multiqc to integration testing as inputs are dependent on final sample labels. Integration testing works per this commit.