
07 Remove PCR Duplicates using PICARD

Neranjan Perera edited this page Dec 6, 2018 · 2 revisions

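PCR amplification during library preparation can produce several copies of the same original DNA fragment. The same molecule is then sequenced more than once and shows up as duplicate reads. These reads carry no independent information, so they should not be counted as separate evidence in variant detection. In this step we mark the duplicate reads and remove them.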
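To make the idea concrete, here is a deliberately simplified sketch. It treats two mapped reads as duplicates when they lie on the same reference sequence and strand and share the same 5' alignment position, and it keeps only the first one it sees. This is a toy illustration written for this page, not how Picard works: MarkDuplicates also takes read pairs and clipping into account and keeps the read with the highest sum of base qualities, so use Picard for real data. The function name `naive_dedup_sam` is hypothetical; it reads SAM text (for example the output of `samtools view -h`) on standard input.

```shell
#!/bin/sh
# Toy illustration only, NOT Picard's algorithm.
# Reads SAM on stdin; drops a mapped read when an earlier read had the same
# reference, strand and 5' position. Pairs, soft clipping and base qualities
# are ignored. naive_dedup_sam is a hypothetical name.
naive_dedup_sam() {
    awk -F'\t' '
        /^@/ { print; next }                              # keep header lines
        {
            flag = $2 + 0
            if (int(flag / 4) % 2 == 1) { print; next }   # 0x4 unmapped: keep as is
            reverse = int(flag / 16) % 2                  # 0x10: reverse strand
            five_prime = $4
            if (reverse) {
                # 5-prime end of a reverse read = POS + reference span - 1
                span = 0; cigar = $6
                while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                    len = substr(cigar, 1, RLENGTH - 1) + 0
                    op = substr(cigar, RLENGTH, 1)
                    if (op ~ /[MDN=X]/) span += len       # ops that consume the reference
                    cigar = substr(cigar, RLENGTH + 1)
                }
                five_prime = $4 + span - 1
            }
            key = $3 SUBSEP reverse SUBSEP five_prime
            if (!(key in seen)) { seen[key] = 1; print }  # first copy wins
        }'
}
```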

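The following command removes the duplicate reads from one sample's BAM file. It is run once per sample, with ${INPUT_FILE_NAME} set to the sample name (for example SRR1517848).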

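# Load Picard and write Java temporary files to /scratch
module load picard/2.9.2
export _JAVA_OPTIONS=-Djava.io.tmpdir=/scratch

# Work in this step's output directory (${d3}); the filtered, sorted BAMs are in ${d2}
cd ../${d3}/

# Mark duplicates, leave them out of the output BAM, write the metrics file and a .bai index
java -jar $PICARD MarkDuplicates \
        INPUT=../${d2}/${INPUT_FILE_NAME}_filtered_sort.bam \
        OUTPUT=${INPUT_FILE_NAME}_nodup.bam \
        REMOVE_DUPLICATES=True \
        METRICS_FILE=${INPUT_FILE_NAME}_metrics.txt \
        CREATE_INDEX=True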
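Because the command handles a single sample, one way to process every sample is to loop over the filtered BAM files. The sketch below is only an example under assumptions: the function names `run_or_print` and `dedup_all_samples`, their directory arguments and the `DRY_RUN` switch (which prints the commands instead of running them) are hypothetical, and `PICARD` is assumed to hold the path to the Picard jar, as in the command above.

```shell
#!/bin/sh
# Sketch only: run Picard MarkDuplicates on every <sample>_filtered_sort.bam in a directory.
# Hypothetical (not part of the original pipeline): the function names, their
# arguments and the DRY_RUN switch. PICARD must hold the path to the Picard jar.

# Print the command when DRY_RUN=1, otherwise execute it.
run_or_print() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

dedup_all_samples() {
    local in_dir="$1"   # holds <sample>_filtered_sort.bam (../${d2} in the command above)
    local out_dir="$2"  # receives <sample>_nodup.bam, <sample>_nodup.bai, <sample>_metrics.txt
    local bam sample

    mkdir -p "$out_dir"
    for bam in "$in_dir"/*_filtered_sort.bam; do
        [ -e "$bam" ] || continue                      # glob matched nothing
        sample=$(basename "$bam" _filtered_sort.bam)   # e.g. SRR1517848
        run_or_print java -jar "$PICARD" MarkDuplicates \
            INPUT="$bam" \
            OUTPUT="$out_dir/${sample}_nodup.bam" \
            REMOVE_DUPLICATES=True \
            METRICS_FILE="$out_dir/${sample}_metrics.txt" \
            CREATE_INDEX=True || return 1              # stop at the first failing sample
    done
}
```

For example, `DRY_RUN=1 dedup_all_samples ../${d2} ../${d3}` previews the Picard calls, and the same call without `DRY_RUN=1` runs them. Processing samples one after another keeps memory use predictable; on a cluster, submitting one job per sample may be faster.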

For each sample this produces a BAM file with the duplicates removed, its index (.bai) and a duplication metrics file:

noduplicates/
├── SRR1517848_metrics.txt
├── SRR1517848_nodup.bai
├── SRR1517848_nodup.bam
├── SRR1517878_metrics.txt
├── SRR1517878_nodup.bai
├── SRR1517878_nodup.bam
├── SRR1517884_metrics.txt
├── SRR1517884_nodup.bai
├── SRR1517884_nodup.bam
├── SRR1517906_metrics.txt
├── SRR1517906_nodup.bai
├── SRR1517906_nodup.bam
├── SRR1517991_metrics.txt
├── SRR1517991_nodup.bai
├── SRR1517991_nodup.bam
├── SRR1518011_metrics.txt
├── SRR1518011_nodup.bai
├── SRR1518011_nodup.bam
├── SRR1518158_metrics.txt
├── SRR1518158_nodup.bai
├── SRR1518158_nodup.bam
├── SRR1518253_metrics.txt
├── SRR1518253_nodup.bai
└── SRR1518253_nodup.bam
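Each *_metrics.txt file reports, per library, the PERCENT_DUPLICATION column (a fraction between 0 and 1, despite its name). A minimal sketch for collecting that value across samples, assuming the usual Picard metrics layout: a `## METRICS CLASS` line, then a tab-separated header row and one row per library, ended by a blank line. The function name `dup_summary` is hypothetical.

```shell
#!/bin/sh
# Sketch: print sample, LIBRARY and PERCENT_DUPLICATION from Picard metrics files.
# Assumes the standard metrics layout described above; dup_summary is a hypothetical name.
dup_summary() {
    local f sample
    for f in "$@"; do
        sample=$(basename "$f" _metrics.txt)    # e.g. SRR1517848
        awk -F'\t' -v sample="$sample" '
            /^## METRICS CLASS/ {
                in_block = 1
                getline                         # header row follows the class line
                for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
                next
            }
            in_block && NF == 0 { exit }        # blank line ends the metrics table
            in_block && col { print sample "\t" $1 "\t" $col }
        ' "$f"
    done
}
```

For example, `dup_summary noduplicates/*_metrics.txt` prints one tab-separated line per sample and library, which makes it easy to spot a sample with unusually high duplication.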