
02 Data Download

Neranjan Perera edited this page Dec 14, 2018 · 5 revisions

For this pipeline we use the NA12878 test sample from SRA experiment SRX655430: Illumina random exon sequencing of genomic DNA paired-end library 'Pond-314378' containing sample 'NA12878' [link]

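# SRA run accessions to download
INPUT_FILES=(SRR1517848 SRR1517878 SRR1517884 SRR1517906 SRR1517991 SRR1518011 SRR1518158 SRR1518253)

# Output directory, relative to the working directory
d1="raw_data"

module load sratoolkit/2.8.2

# mkdir -p does nothing if the directory already exists
mkdir -p "../${d1}"
cd "../${d1}" || exit 1

# Download each run, writing mate 1 and mate 2 to separate files
for INPUT_FILE_NAME in "${INPUT_FILES[@]}"; do
        fastq-dump --split-files "${INPUT_FILE_NAME}"
done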

In this step we download the paired-end reads from the NCBI SRA database as FASTQ files. By default fastq-dump writes both mates of each spot concatenated into a single read; the --split-files option instead writes them to two FASTQ files per run, forward (R1, _1.fastq) and reverse (R2, _2.fastq), which gives us:

raw_data/
├── SRR1517848_1.fastq
├── SRR1517848_2.fastq
├── SRR1517878_1.fastq
├── SRR1517878_2.fastq
├── SRR1517884_1.fastq
├── SRR1517884_2.fastq
├── SRR1517906_1.fastq
├── SRR1517906_2.fastq
├── SRR1517991_1.fastq
├── SRR1517991_2.fastq
├── SRR1518011_1.fastq
├── SRR1518011_2.fastq
├── SRR1518158_1.fastq
├── SRR1518158_2.fastq
├── SRR1518253_1.fastq
└── SRR1518253_2.fastq
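
Because the two files of a run hold the two mates of the same spots, _1.fastq and _2.fastq would normally contain the same number of records. A minimal sanity check is sketched below, assuming uncompressed FASTQ with the usual four lines per record and no wrapped sequence lines; fastq_records and check_pair are hypothetical helper names, not part of SRA Toolkit:

```shell
# fastq_records FILE
# Prints the number of records in an uncompressed FASTQ file, assuming the
# four-line layout (header, sequence, '+', qualities), no wrapped sequence
# lines and a trailing newline on the last line.
fastq_records() {
    [ -r "$1" ] || return 1
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# check_pair DIR ACC
# Hypothetical helper: prints the record counts of DIR/ACC_1.fastq and
# DIR/ACC_2.fastq and returns non-zero if either file is unreadable or
# the counts differ.
check_pair() {
    local dir=$1 acc=$2 r1 r2
    r1=$(fastq_records "${dir}/${acc}_1.fastq") || return 1
    r2=$(fastq_records "${dir}/${acc}_2.fastq") || return 1
    echo "${acc} R1=${r1} R2=${r2}"
    [ "$r1" -eq "$r2" ]
}
```

For example: for acc in "${INPUT_FILES[@]}"; do check_pair ../raw_data "$acc" || echo "count mismatch: $acc"; done. A mismatch is a reason to inspect the run rather than proof of a failed download: if some spots lack a mate, the counts may legitimately differ, and fastq-dump's --split-3 option, which writes such unpaired reads to a separate file, may suit that run better.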
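
The same step can also be written as a loop over a whitespace-separated string of accessions, as in this variant with a different set of accessions:

d1="raw_data"

# Create the output directory if needed, then move into it
mkdir -p "../${d1}"
cd "../${d1}" || exit 1

module load sratoolkit/2.8.2

# $list is left unquoted in the for loop so the shell splits it
# on spaces and newlines into individual accessions
list="SRR796868 SRR796869 SRR796870 SRR796871 SRR796872
SRR796873 SRR796874 SRR796875 SRR796876 SRR796877
SRR796878 SRR796879 SRR796880 SRR796881 SRX265476"

for file in $list; do
        # Print the accession being downloaded
        echo "$file"
        fastq-dump --split-files "$file"
done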
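
Rerunning the download commands above fetches every run again, even those already in raw_data. A minimal sketch for restarting only the missing runs is shown below, assuming the <accession>_1.fastq / <accession>_2.fastq names produced by --split-files; pending_accessions is a hypothetical helper name:

```shell
# pending_accessions DIR ACC...
# Hypothetical helper: prints each accession whose paired files
# DIR/ACC_1.fastq and DIR/ACC_2.fastq are not both present and non-empty.
pending_accessions() {
    local dir=$1
    shift
    local acc
    for acc in "$@"; do
        if [ ! -s "${dir}/${acc}_1.fastq" ] || [ ! -s "${dir}/${acc}_2.fastq" ]; then
            echo "${acc}"
        fi
    done
}
```

Run from inside raw_data, for example: for acc in $(pending_accessions . "${INPUT_FILES[@]}"); do fastq-dump --split-files "$acc"; done. The check only tests that both files exist and are non-empty, so a file left truncated by an interrupted download still counts as present; deleting partial files before restarting is the safer choice.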