- Author: Byoungnam Min
- Last updated: 2023-10-27
This is an example pipeline for tumor detection from DNA sequencing data. The pipeline consists of:
- Dockerfile
- WDL (Workflow Description Language)
- Python scripts used for running programs
  - run_fastp.py (for read QC)
  - run_bwa.py (for read alignment)
  - run_mutect2.py (for finding tumor-related mutations)
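As an illustration of what a wrapper script such as run_fastp.py might do, the sketch below builds a fastp command line for paired-end reads. The function name, its arguments, and the output file names are hypothetical and are not taken from the repository; only the fastp flags themselves (`-i`, `-I`, `-o`, `-O`, `--json`, `--html`, `--thread`) are real.

```python
import shlex


def build_fastp_cmd(read1, read2, prefix, threads=4):
    """Return a fastp command (as an argument list) for paired-end read QC.

    Hypothetical helper: the naming scheme for outputs is an assumption,
    not the repository's actual run_fastp.py.
    """
    return [
        "fastp",
        "-i", read1,                            # input read 1
        "-I", read2,                            # input read 2
        "-o", f"{prefix}_1.trimmed.fastq.gz",   # filtered read 1
        "-O", f"{prefix}_2.trimmed.fastq.gz",   # filtered read 2
        "--json", f"{prefix}.fastp.json",       # machine-readable QC report
        "--html", f"{prefix}.fastp.html",       # human-readable QC report
        "--thread", str(threads),
    ]


cmd = build_fastp_cmd("ERR1008135_1.fastq.gz", "ERR1008135_2.fastq.gz", "ERR1008135")
print(shlex.join(cmd))
```

A wrapper like this would typically hand the list to `subprocess.run(cmd, check=True)` so that a non-zero exit status from the tool stops the workflow task.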
This test was performed on Ubuntu 22.04.
# Docker
sudo apt-get update
sudo apt-get remove docker docker-engine docker.io
sudo apt-get install -y docker.io
sudo systemctl start docker
sudo systemctl enable docker
# Cromwell
wget https://github.com/broadinstitute/cromwell/releases/download/86/cromwell-86.jar
git clone https://github.com/mbnmbn00/tumor_detect_wdl.git
cd tumor_detect_wdl/scripts
docker build --platform linux/amd64 --tag tumor_img .
We will use mouse tumor samples reported in McCreery et al., 2015. To keep the run fast, we will focus on chromosome 7, where known mutations are reported.
# Mouse genome GRCm39 chromosome 7
# https://www.ncbi.nlm.nih.gov/nuccore/CM001000.3/
export NCBI_SEQ_ID=CM001000.3
python3 tumor_detect_wdl/scripts/download_ncbi_seq.py \
--ncbi_seq_id ${NCBI_SEQ_ID} \
--output_file chromosome_7.fasta
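For readers curious how a sequence can be fetched by accession, here is a minimal sketch that uses the NCBI E-utilities `efetch` endpoint. It is not necessarily how download_ncbi_seq.py works; the function names `efetch_fasta_url` and `download_fasta` are hypothetical.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(ncbi_seq_id):
    """Build an efetch URL that returns a nucleotide record as plain-text FASTA."""
    params = {
        "db": "nuccore",
        "id": ncbi_seq_id,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def download_fasta(ncbi_seq_id, output_file):
    """Stream the FASTA record to disk (needs network access; not called here)."""
    with urlopen(efetch_fasta_url(ncbi_seq_id)) as response, open(output_file, "wb") as out:
        shutil.copyfileobj(response, out)


url = efetch_fasta_url("CM001000.3")
print(url)
```

Calling `download_fasta("CM001000.3", "chromosome_7.fasta")` would then write the chromosome 7 sequence to the same file name used above.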
We will use only one sample for testing, but you can download the full list of samples from ENA.
curl --remote-name ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR100/005/ERR1008135/ERR1008135_1.fastq.gz
curl --remote-name ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR100/005/ERR1008135/ERR1008135_2.fastq.gz
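After downloading, it can be worth checking that both mate files are complete before starting the workflow, since a truncated FTP transfer is easy to miss. The sketch below is an optional sanity check and not part of the repository; `count_fastq_reads` and `mates_match` are hypothetical helper names. It assumes standard four-line FASTQ records.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def mates_match(read1_path, read2_path):
    """Return True if both paired-end files contain the same number of reads."""
    return count_fastq_reads(read1_path) == count_fastq_reads(read2_path)
```

For the sample above, `mates_match("ERR1008135_1.fastq.gz", "ERR1008135_2.fastq.gz")` should return `True` when both downloads finished; a truncated gzip file will usually raise an error while being read.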
java -jar cromwell-86.jar run mutect2_pipeline.wdl --inputs my_inputs.json
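Cromwell reads workflow inputs from a JSON file whose keys have the form `<workflow name>.<input name>`. The sketch below shows one way to generate such a file for the data downloaded above. The workflow name and every input name here are assumptions made for illustration; check the `workflow` and `input` declarations in mutect2_pipeline.wdl for the real ones.

```python
import json

# Hypothetical workflow name; it must match the `workflow` block in the WDL.
WORKFLOW = "mutect2_pipeline"


def make_inputs(workflow, reference_fasta, read1, read2, sample_name):
    """Build a Cromwell inputs mapping; all input names are illustrative only."""
    return {
        f"{workflow}.reference_fasta": reference_fasta,
        f"{workflow}.read1": read1,
        f"{workflow}.read2": read2,
        f"{workflow}.sample_name": sample_name,
    }


inputs = make_inputs(
    WORKFLOW,
    "chromosome_7.fasta",
    "ERR1008135_1.fastq.gz",
    "ERR1008135_2.fastq.gz",
    "ERR1008135",
)
print(json.dumps(inputs, indent=2))
```

Saving the printed JSON as my_inputs.json gives a file in the shape Cromwell expects. Womtool, which is released alongside Cromwell, can also print a template with the actual input names via its `inputs` subcommand.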