Data pre-processing for variant discovery with GATK4

An HPC workflow that pre-processes 50 matched tumor-normal whole genome sequencing (WGS) FASTQ files from 25 childhood acute lymphoblastic leukemia cases. The WGS data were generated on an Illumina NovaSeq 6000. See the GATK documentation for more details.

This workflow was written according to the following GATK official WDLs:

The main steps in the preprocessing workflow. A figure from GITC.

Step 0: QC

Step 1: Fastqs to Unmapped BAMs
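
As a rough illustration of this step, the sketch below assembles a `gatk FastqToSam` command line for one paired-end sample. It is not taken from this repository's scripts: the function name, file names, read group, and library naming are hypothetical placeholders, and only the tool and its Picard-style arguments come from GATK4.

```python
from typing import List


def fastq_to_ubam_cmd(fastq1: str, fastq2: str, sample: str,
                      read_group: str, out_bam: str) -> List[str]:
    """Build an argv list for GATK4 FastqToSam (paired-end input)."""
    return [
        "gatk", "FastqToSam",
        "--FASTQ", fastq1,
        "--FASTQ2", fastq2,
        "--OUTPUT", out_bam,
        "--READ_GROUP_NAME", read_group,
        "--SAMPLE_NAME", sample,
        "--LIBRARY_NAME", f"{sample}_lib",  # hypothetical library naming
        "--PLATFORM", "ILLUMINA",
    ]


# Hypothetical file names, for illustration only.
ubam_cmd = fastq_to_ubam_cmd(
    "case01_tumor_R1.fastq.gz", "case01_tumor_R2.fastq.gz",
    sample="case01_tumor", read_group="case01_tumor_rg1",
    out_bam="case01_tumor.unmapped.bam",
)
print(" ".join(ubam_cmd))
```

The list can be passed to `subprocess.run(ubam_cmd, check=True)` inside a cluster job once `gatk` is on the `PATH`.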

Step 2: Unmapped BAMs to Mapped BAMs
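
One common way to do this step, following the pattern used in the GATK pre-processing WDLs, is to stream reads from the unmapped BAM into `bwa mem` and compress the result with `samtools`. The sketch below only builds that shell pipeline as a string. The function name, file names, and thread count are assumptions, and the exact options used in this workflow may differ.

```python
import shlex


def align_pipeline(ubam: str, ref_fasta: str, out_bam: str, threads: int = 16) -> str:
    """Return a shell pipeline: SamToFastq | bwa mem | samtools view."""
    q = shlex.quote
    # Emit interleaved FASTQ on stdout, keeping reads that fail vendor QC.
    sam_to_fastq = (
        f"gatk SamToFastq --INPUT {q(ubam)} --FASTQ /dev/stdout "
        "--INTERLEAVE true --NON_PF true"
    )
    # -p: interleaved pairs on stdin; -Y: soft-clip supplementary alignments;
    # -K: fixed batch size so results do not depend on the thread count.
    bwa = f"bwa mem -K 100000000 -p -v 3 -t {threads} -Y {q(ref_fasta)} /dev/stdin"
    # -1: write BAM with fast compression.
    to_bam = f"samtools view -1 - > {q(out_bam)}"
    return " | ".join([sam_to_fastq, bwa, to_bam])


# Hypothetical file names, for illustration only.
pipeline = align_pipeline("case01_tumor.unmapped.bam", "ref.fasta",
                          "case01_tumor.aligned.bam", threads=8)
print(pipeline)
```

Because the pipeline uses pipes and a redirect, it would be run through a shell, for example `subprocess.run(["bash", "-o", "pipefail", "-c", pipeline], check=True)`.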

Step 3: Merge Unmapped BAMs and Mapped BAMs
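
The merge restores read-level metadata (such as read groups and original base qualities) that is lost when reads pass through the aligner. A minimal sketch of the corresponding `gatk MergeBamAlignment` call is shown below, using only its core arguments. The function and file names are hypothetical, and any additional options this workflow sets are not reproduced here.

```python
from typing import List


def merge_bam_alignment_cmd(aligned_bam: str, unmapped_bam: str,
                            ref_fasta: str, out_bam: str) -> List[str]:
    """Build an argv list for GATK4 MergeBamAlignment."""
    return [
        "gatk", "MergeBamAlignment",
        "--ALIGNED_BAM", aligned_bam,
        # The unmapped BAM is expected to be queryname-sorted.
        "--UNMAPPED_BAM", unmapped_bam,
        # The reference needs a sequence dictionary (.dict) next to it.
        "--REFERENCE_SEQUENCE", ref_fasta,
        "--OUTPUT", out_bam,
    ]


# Hypothetical file names, for illustration only.
merge_cmd = merge_bam_alignment_cmd(
    "case01_tumor.aligned.bam", "case01_tumor.unmapped.bam",
    "ref.fasta", "case01_tumor.merged.bam",
)
print(" ".join(merge_cmd))
```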

Step 4: MarkDuplicates
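
For orientation, a minimal `gatk MarkDuplicates` invocation needs an input BAM, an output BAM, and a metrics file. The sketch below builds such a command. The function and file names are hypothetical, and optional settings (for example optical duplicate distance or sort-order assumptions) are deliberately left out because the source does not state them.

```python
from typing import List


def mark_duplicates_cmd(in_bam: str, out_bam: str, metrics: str) -> List[str]:
    """Build an argv list for GATK4 MarkDuplicates."""
    return [
        "gatk", "MarkDuplicates",
        "--INPUT", in_bam,
        "--OUTPUT", out_bam,
        # Text file summarizing duplication rates per library.
        "--METRICS_FILE", metrics,
    ]


# Hypothetical file names, for illustration only.
markdup_cmd = mark_duplicates_cmd(
    "case01_tumor.merged.bam",
    "case01_tumor.markdup.bam",
    "case01_tumor.markdup.metrics.txt",
)
print(" ".join(markdup_cmd))
```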

Step 5: Base Quality Score Recalibration
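
BQSR in GATK4 is a two-pass procedure: `BaseRecalibrator` builds a recalibration table from the reads and a set of known variant sites, and `ApplyBQSR` rewrites the base qualities using that table. The sketch below builds both commands. The function name, file names, and the known-sites VCFs are placeholders, not the resources used in this workflow.

```python
from typing import List, Tuple


def bqsr_cmds(in_bam: str, ref_fasta: str, known_sites: List[str],
              recal_table: str, out_bam: str) -> Tuple[List[str], List[str]]:
    """Build argv lists for BaseRecalibrator and ApplyBQSR (run in that order)."""
    base_recal = ["gatk", "BaseRecalibrator",
                  "-R", ref_fasta, "-I", in_bam, "-O", recal_table]
    # --known-sites may be repeated, once per VCF of known variants.
    for vcf in known_sites:
        base_recal += ["--known-sites", vcf]
    apply_bqsr = ["gatk", "ApplyBQSR",
                  "-R", ref_fasta, "-I", in_bam,
                  "--bqsr-recal-file", recal_table, "-O", out_bam]
    return base_recal, apply_bqsr


# Hypothetical file names, for illustration only.
recal_cmd, apply_cmd = bqsr_cmds(
    "case01_tumor.markdup.sorted.bam", "ref.fasta",
    ["known_snps.vcf.gz", "known_indels.vcf.gz"],
    "case01_tumor.recal.table", "case01_tumor.recal.bam",
)
print(" ".join(recal_cmd))
print(" ".join(apply_cmd))
```

Both tools expect a coordinate-sorted, indexed BAM, so a sorting step normally sits between duplicate marking and recalibration.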