title: "RNAseq Kallisto(bash) + sleuth(R)"
author: 'Marius: [email protected]'
date: "07/06/2019"
output:
  pdf_document: default
  html_notebook: default
  html_document:
    df_print: paged
- The workflow consists of the following steps:
- Indexing
- Quantification
- DEG analysis
To start, we need to create the index for the pseudoalignment. This requires the cDNA of the organism being studied.
Make sure to download everything into the same folder.
Download the cDNA file from Ensembl and copy the scripts from GitHub into your folder.
cd ~/Desktop
mkdir working_with_kallisto
cd working_with_kallisto
wget --no-verbose ftp://ftp.ensembl.org/pub/release-96/fasta/danio_rerio/cdna/Danio_rerio.GRCz11.cdna.all.fa.gz
git clone
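Before indexing, it can be worth confirming that the download is intact. The following is a minimal sketch and not part of the original workflow; the function name count_fasta_records is made up for illustration. It tests the gzip archive and counts the FASTA header lines, which corresponds to the number of transcripts in the file.

```shell
# Hypothetical helper (not from the original scripts): check a gzipped FASTA
# file and print how many records (lines starting with ">") it contains.
count_fasta_records() {
    local fasta_gz="$1"
    # gzip -t exits non-zero if the archive is truncated or corrupt.
    gzip -t "$fasta_gz" || return 1
    # grep -c prints the number of matching lines; "|| true" keeps a file
    # with zero matches (grep exit status 1) from being reported as an error.
    gzip -cd "$fasta_gz" | grep -c '^>' || true
}
```

With the file from this tutorial it would be called as count_fasta_records Danio_rerio.GRCz11.cdna.all.fa.gz.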
Next, index the cDNA using kallisto.
cd ~/Desktop/working_with_kallisto
kallisto index --make-unique --index Danio_rerio_ensembl_cdna_fa_GRCz11.idx Danio_rerio.GRCz11.cdna.all.fa.gz
The index is written to Danio_rerio_ensembl_cdna_fa_GRCz11.idx. Now move on to the next step.
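Building the index takes a while, so when re-running the tutorial it may help to skip this step if an up-to-date index already exists. The sketch below is an assumption-based convenience, not something the original scripts do; index_needs_build is a hypothetical name.

```shell
# Hypothetical helper: succeed (exit status 0) when the index has to be
# (re)built, i.e. when it is missing, empty, or older than the FASTA file.
index_needs_build() {
    local index="$1" fasta="$2"
    [ ! -s "$index" ] || [ "$fasta" -nt "$index" ]
}

# Usage sketch with the file names from this tutorial (commented out so the
# block can be sourced without running kallisto):
# if index_needs_build Danio_rerio_ensembl_cdna_fa_GRCz11.idx Danio_rerio.GRCz11.cdna.all.fa.gz; then
#     kallisto index --make-unique --index Danio_rerio_ensembl_cdna_fa_GRCz11.idx Danio_rerio.GRCz11.cdna.all.fa.gz
# fi
```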
Quantification
For this step, first create the files that the script needs in order to process the samples in parallel on multiple cores.
1. Create an ordered list of the paths to the samples. Move to the folder where your samples are and run the following.
cd ~/Desktop/working_with_kallisto
# ">" overwrites the file, so re-running the command does not duplicate entries
ls "$PWD"/Sample_*/*.gz | sort -V > fastq_files_paths.txt
2. Add the sample number at the beginning of each line to identify the files.
cd ~/Desktop/working_with_kallisto
# ">" overwrites the file, so re-running the command does not duplicate entries
awk 'BEGIN{count=1; OFS="\t"}{print "sample"count++, $0}' fastq_files_paths.txt > fastq_files_paths_samples.txt
- These two commands produce two new files:
- fastq_files_paths.txt Paths to the fastq.gz files of all the samples
- fastq_files_paths_samples.txt Paths to the fastq.gz files of all the samples, with the sample number as reference
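The two steps above can be sketched as a single function, which also shows why sort -V is used. This is only an illustration: the function name make_sample_table and its arguments are hypothetical, while the ls, sort and awk calls are the ones from the steps above. sort -V is a GNU coreutils option.

```shell
# Sketch of steps 1 and 2 as one function (hypothetical name and arguments).
# $1: folder that contains the Sample_* directories
# $2: output file with "<sample name><TAB><path>" on each line
make_sample_table() {
    local samples_dir="$1" out="$2"
    # sort -V ("version sort") puts Sample_2 before Sample_10;
    # a plain lexical sort would put Sample_10 first.
    ls "$samples_dir"/Sample_*/*.gz | sort -V |
        awk 'BEGIN { count = 1; OFS = "\t" } { print "sample" count++, $0 }' > "$out"
}
```

Called as make_sample_table "$PWD" fastq_files_paths_samples.txt, it writes the numbered table directly and skips the intermediate fastq_files_paths.txt.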
fastq_files_paths_samples.txt is used to determine the paths and to build the commands that process the data on multiple cores.
Run the command generator to prepare the kallisto jobs for parallel execution.
cd ~/Desktop/working_with_kallisto
bash kallisto_test.sh fastq_files_paths_samples.txt job_list_for_parallel.txt
- The parts of this command are:
1. [bash kallisto_test.sh] → runs the script that creates the list of jobs.
2. [fastq_files_paths_samples.txt] → the previously generated file with the paths to the samples.
3. [job_list_for_parallel.txt] → the output of the script.
- bash kallisto_test.sh → creates a new file, job_list_for_parallel.txt, from its first argument, fastq_files_paths_samples.txt.
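The contents of kallisto_test.sh are not shown here, so the following is only a guess at what a job-list generator of this kind might look like. It is not the author's script. It assumes single-end reads with one FASTQ file per sample, paths without spaces, and placeholder fragment-length values (-l 200 -s 20) that would need to be replaced with the values for your own library. The function name make_job_list and the third argument (the index path) are hypothetical.

```shell
# Hypothetical stand-in for a job-list generator; NOT the author's
# kallisto_test.sh. Assumptions: single-end reads, one FASTQ per sample,
# placeholder fragment length (-l 200) and standard deviation (-s 20).
# $1: table with "<sample name><TAB><path to fastq.gz>" per line
# $2: output file, one kallisto command per line
# $3: path to the kallisto index
make_job_list() {
    local table="$1" jobs="$2" index="$3"
    local tab
    tab=$(printf '\t')
    # Read the two tab-separated columns and emit one command per sample;
    # the sample name is used as the output folder.
    while IFS="$tab" read -r sample fastq; do
        printf 'kallisto quant -i %s -o %s --single -l 200 -s 20 %s\n' \
            "$index" "$sample" "$fastq"
    done < "$table" > "$jobs"
}
```

Each line of the resulting file is an independent command, which is the form GNU parallel expects when it reads jobs from standard input.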
To complete the first part, run the kallisto quantification in parallel, using half of the processors in your computer (recommended).
Run the last command; the results are written to separate folders named samples1..n.
Note that 2>&1 | tee final_report.txt produces an extra report: all output (both stdout and stderr) is shown on screen and also saved to final_report.txt.
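As a small illustration of that redirection, the toy example below (the function names noisy and capture_all are made up) writes one line to each stream and captures both in a report file.

```shell
# A toy command that writes one line to stdout and one to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# 2>&1 merges stderr into stdout before the pipe, so tee receives both
# streams, prints them to the terminal and writes them to the report file.
capture_all() {
    local report="$1"
    noisy 2>&1 | tee "$report"
}
```

Without 2>&1, only the stdout line would reach tee and the error messages would be missing from the report.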
cd ~/Desktop/working_with_kallisto
parallel --progress --jobs 4 --joblog kallisto_joblog.txt < job_list_for_parallel.txt 2>&1 | tee final_report.txt
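Two small helpers can make this step easier to repeat. They are sketches under stated assumptions rather than part of the original scripts, and both function names are hypothetical. The first computes half of the available processors with nproc (GNU coreutils). The second reads the job log, which GNU parallel writes as a tab-separated table with a header line where column 7 is Exitval and column 9 is Command, and prints the commands that did not finish successfully.

```shell
# Hypothetical helper: half of the available processors, but at least 1.
# Falls back to assuming 2 processors when nproc is not available.
half_cores() {
    local n half
    n=$(nproc 2>/dev/null || echo 2)
    half=$(( n / 2 ))
    [ "$half" -ge 1 ] || half=1
    echo "$half"
}

# Hypothetical helper: print the command of every job in a GNU parallel
# joblog whose exit value (column 7) is not 0. The header line is skipped.
failed_jobs() {
    awk -F '\t' 'NR > 1 && $7 != 0 { print $9 }' "$1"
}
```

For example, --jobs "$(half_cores)" could replace the fixed --jobs 4, and failed_jobs kallisto_joblog.txt lists the samples that need to be run again.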
Run sleuth in R using the following code.
"TODO"