02 Data Download
Neranjan Perera edited this page Dec 14, 2018
For this pipeline we use the NA12878 test sample, taken from SRA experiment SRX655430: Illumina random exon sequencing of genomic DNA paired-end library 'Pond-314378' containing sample 'NA12878' [link].
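    INPUT_FILES=(SRR1517848 SRR1517878 SRR1517884 SRR1517906 SRR1517991 SRR1518011 SRR1518158 SRR1518253)
    d1="raw_data"

    module load sratoolkit/2.8.2

    # Create the output directory if it does not exist yet
    if [ ! -d ../${d1} ]; then
        mkdir -p ../${d1}
    fi
    cd ../${d1}

    # INPUT_FILE_NAME is never assigned in the snippet as shown; looping over
    # INPUT_FILES is one way to set it (an array job could index the array instead)
    for INPUT_FILE_NAME in "${INPUT_FILES[@]}"; do
        fastq-dump --split-files ${INPUT_FILE_NAME}
    done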
In this step we download the paired-end interleaved FASTQ files from the NCBI SRA database. While downloading, we split each run into two FASTQ files, one for the forward reads (R1) and one for the reverse reads (R2), using the --split-files option, which gives us:
    raw_data/
    ├── SRR1517848_1.fastq
    ├── SRR1517848_2.fastq
    ├── SRR1517878_1.fastq
    ├── SRR1517878_2.fastq
    ├── SRR1517884_1.fastq
    ├── SRR1517884_2.fastq
    ├── SRR1517906_1.fastq
    ├── SRR1517906_2.fastq
    ├── SRR1517991_1.fastq
    ├── SRR1517991_2.fastq
    ├── SRR1518011_1.fastq
    ├── SRR1518011_2.fastq
    ├── SRR1518158_1.fastq
    ├── SRR1518158_2.fastq
    ├── SRR1518253_1.fastq
    └── SRR1518253_2.fastq
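Before moving on, it can help to confirm that every run produced both of its split files. The sketch below is one possible check, not part of the original pipeline: `check_pairs` is a hypothetical helper name, and it assumes the `<accession>_1.fastq` / `<accession>_2.fastq` naming shown in the listing above.

```shell
# Report accessions that lack a non-empty _1.fastq or _2.fastq file.
# Usage: check_pairs DIR ACCESSION...
# Prints one line per missing or empty file and returns 1 if any was found.
# (check_pairs is a hypothetical helper, not part of the SRA Toolkit.)
check_pairs() {
    local dir="$1"
    shift
    local rc=0
    local acc mate
    for acc in "$@"; do
        for mate in 1 2; do
            # -s is true only when the file exists and is larger than zero bytes
            if [ ! -s "${dir}/${acc}_${mate}.fastq" ]; then
                echo "missing or empty: ${acc}_${mate}.fastq"
                rc=1
            fi
        done
    done
    return "$rc"
}
```

For example, `check_pairs ../raw_data SRR1517848 SRR1517878` prints nothing and returns 0 when all four files are present. The check only looks at existence and size, so it would not notice a file that was cut short partway through a download.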
    d1="raw_data"
    cd ../${d1}

    module load sratoolkit/2.8.2

    list="SRR796868 SRR796869 SRR796870 SRR796871 SRR796872 SRR796873 SRR796874 SRR796875 SRR796876 SRR796877 SRR796878 SRR796879 SRR796880 SRR796881 SRX265476"

    for file in $list; do
        echo $file
        fastq-dump --split-files $file
    done
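When a long download loop like the one above is interrupted, rerunning it fetches every run again. The following is a minimal sketch of a loop that skips runs whose split files are already present in the current directory. `needs_download`, `download_missing`, and the `DUMP_CMD` variable are hypothetical names introduced here for illustration; only `fastq-dump --split-files` comes from the original script.

```shell
# Return 0 (true) when a run still needs downloading, i.e. when either
# of its split files is absent or empty in the current directory.
needs_download() {
    local acc="$1"
    [ ! -s "${acc}_1.fastq" ] || [ ! -s "${acc}_2.fastq" ]
}

# Usage: download_missing ACCESSION...
# DUMP_CMD is a hypothetical override that allows a dry run with another
# command; when it is unset, fastq-dump is called as in the original loop.
download_missing() {
    local acc
    for acc in "$@"; do
        if needs_download "$acc"; then
            echo "downloading $acc"
            ${DUMP_CMD:-fastq-dump} --split-files "$acc"
        else
            echo "skipping $acc (already present)"
        fi
    done
}
```

With the variables from the script above, this would be called as `download_missing $list` after `cd ../${d1}`. A partially written file from an interrupted download would still count as present, so it may be safer to delete the files of the run that was in progress before rerunning.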