A Nextflow pipeline to perform quality control, alignment, and quantification of RNA sequencing data.
The pipeline was created to run on the ETH Euler cluster and relies on the genome files stored on that server. It therefore needs to be adapted before it can be run on a different HPC cluster.
The pipeline runs the following steps:
- FastQC
- FastQ Screen
- Trim Galore
- FastQC
- STAR or HISAT2
- Samtools sort
- Samtools index
- featureCounts
- MultiQC
Path to the input FASTQ files, given as the FASTQ folder followed by a pattern that matches the files.
--input /cluster/work/nme/data/josousa/project/fastq/*fastq.gz
Output directory where the files will be saved.
--outdir /cluster/work/nme/data/josousa/project
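As a minimal sketch of how the two required arguments fit together, the snippet below assembles and prints a launch command. The script name `main.nf` is a hypothetical placeholder, since the text does not name the pipeline's entry point; the paths are the examples given above.

```shell
# Hypothetical entry point: replace with the actual pipeline script or repository.
PIPELINE="main.nf"

# Example values taken from the argument descriptions above.
INPUT='/cluster/work/nme/data/josousa/project/fastq/*fastq.gz'
OUTDIR='/cluster/work/nme/data/josousa/project'

# Assemble the command as a string and print it for inspection before running it.
CMD="nextflow run $PIPELINE --input $INPUT --outdir $OUTDIR"
echo "$CMD"
```

Printing the command first makes it easy to check the paths before submitting the job on the cluster.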
-
Option to force the pipeline to treat the input as single-end.
--single_end
By default, the pipeline detects whether the input files are single-end or paired-end.
-
Option to select the RNA-Seq library strandedness. This option only affects quantification.
--strandness 'smartseq2' # Default (same as 'unstranded')
--strandness 'forward'
--strandness 'reverse'
--strandness 'unstranded'
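Since strandedness only affects quantification, one plausible reading is that each value is translated into the `-s` argument of featureCounts (0 = unstranded, 1 = stranded, 2 = reversely stranded). The sketch below shows such a mapping; the function name `strandness_to_s` is hypothetical, and the mapping itself is an assumption rather than the pipeline's confirmed behaviour.

```shell
# Map a --strandness value to a featureCounts -s value.
# Assumed mapping; 'smartseq2' is treated like 'unstranded', as stated in the text.
strandness_to_s() {
  case "$1" in
    smartseq2|unstranded) echo 0 ;;
    forward)              echo 1 ;;
    reverse)              echo 2 ;;
    *) echo "unknown strandness: $1" >&2; return 1 ;;
  esac
}

strandness_to_s reverse
```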
-
Reference genome used for alignment.
--genome
Available genomes:
Mus_musculus_GRCm39 # Default
Mus_musculus_GRCm38_p6
Homo_sapiens_GRCh38_p14
Rattus_norvegicus_mRatBN7_2
Bos_taurus_ARS-UCD1_2
Bos_taurus_ARS-UCD1_3
Caenorhabditis_elegans_WBcel235
Callithrix_jacchus_mCalJac1_pat_X
Capra_hircus_ARS1
Capreolus_capreolus_GCA_951849835_1
Drosophila_melanogaster_BDGP6_46
Escherichia_coli_ASM160652v1
Macaca_fascicularis_Macaca_fascicularis_6_0
Macaca_mulatta_Mmul_10
Monodelphis_domestica_ASM229v1
Pan_troglodytes_Pan_tro_3_0
Saccharomyces_cerevisiae_R64-1-1
Sus_scrofa_Sscrofa11_1
-
Option to use a custom genome for alignment by providing an absolute path to a custom genome file.
--custom_genome_file '/cluster/work/nme/data/josousa/project/genome/GRCm39.genome'
Example of a genome file:
name GRCm39
species Mouse
star /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/STARIndex/
hisat2 /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/Hisat2Index/genome
hisat2_splices /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/Hisat2Index/splice_sites.txt
gtf /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Annotation/Genes/genes.gtf
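Because a custom genome file points to indexes and annotations on disk, it can help to verify the paths before launching the pipeline. The sketch below assumes the file holds one whitespace-separated "key value" pair per line, as in the example above; the function name `check_genome_file` is hypothetical and is not part of the pipeline.

```shell
# Report genome-file entries whose path does not exist.
# Assumes one "key value" pair per line, separated by whitespace.
check_genome_file() {
  gf_missing=0
  while read -r gf_key gf_value || [ -n "$gf_key" ]; do
    case "$gf_key" in
      ''|name|species) continue ;;  # blank lines and descriptive fields carry no path
    esac
    # The hisat2 entry is an index prefix rather than a file, so any path that
    # starts with the value is accepted; a directory ending in "/" must be non-empty.
    if ! ls -d "$gf_value"* > /dev/null 2>&1; then
      echo "missing: $gf_key -> $gf_value"
      gf_missing=1
    fi
  done < "$1"
  return "$gf_missing"
}
```

Calling `check_genome_file /path/to/custom.genome` prints one line per missing entry and returns a non-zero status if any entry is missing.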
-
Option to choose the aligner.
--aligner 'star' # Default
--aligner 'hisat2'
-
Option to disable soft-clipping in HISAT2 alignments.
--hisat2_no_softclip
Default: true
-
Option to suppress unpaired alignments for paired reads.
--hisat2_no_mixed
Default: true
-
Option to suppress discordant alignments for paired reads.
--hisat2_no_discordant
Default: true
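The three HISAT2 options above share their names with the HISAT2 command-line flags `--no-softclip`, `--no-mixed`, and `--no-discordant`. The sketch below illustrates how three true/false settings could be turned into a flag string; the function name `hisat2_flags` is hypothetical, and the one-to-one mapping is an assumption about the pipeline, not something the text confirms.

```shell
# Translate three true/false switches into HISAT2 flags (assumed one-to-one mapping).
# Arguments: no_softclip no_mixed no_discordant
hisat2_flags() {
  hf=""
  if [ "$1" = "true" ]; then hf="$hf --no-softclip"; fi
  if [ "$2" = "true" ]; then hf="$hf --no-mixed"; fi
  if [ "$3" = "true" ]; then hf="$hf --no-discordant"; fi
  # Strip the leading space left by the first appended flag.
  echo "${hf# }"
}

# With the documented defaults (all true), all three flags are emitted.
hisat2_flags true true true
```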
-
Option to provide a custom FastQ Screen config file.
--fastq_screen_conf '/cluster/work/nme/software/config/fastq_screen.conf' # Default
-
Option to pass the flag --bisulfite to FastQ Screen.
--bisulfite
Default: false
-
Option to only count read pairs that have both ends aligned.
--featurecounts_B_flag
Default: true
-
Option to not count read pairs whose two ends map to different chromosomes, or to the same chromosome but on different strands.
--featurecounts_C_flag
Default: true
-
Option to skip FastQC, Trim Galore, and FastQ Screen. The first step of the pipeline will then be the alignment (STAR or HISAT2).
--skip_qc
-
Option to skip FastQ Screen.
--skip_fastq_screen
-
Option to skip quantification.
--skip_quantification
-
Option to add extra arguments to FastQC.
--fastqc_args
-
Option to add extra arguments to FastQ Screen.
--fastq_screen_args
-
Option to add extra arguments to Trim Galore.
--trim_galore_args
-
Option to add extra arguments to the STAR aligner.
--star_align_args
-
Option to add extra arguments to the HISAT2 aligner.
--hisat2_align_args
-
Option to add extra arguments to Samtools sort.
--samtools_sort_args
-
Option to add extra arguments to Samtools index.
--samtools_index_args
-
Option to add extra arguments to featureCounts.
--featurecounts_args
-
Option to add extra arguments to MultiQC.
--multiqc_args
This pipeline was adapted from the Nextflow pipelines created by the Babraham Institute Bioinformatics Group and from the nf-core pipelines. We thank all the contributors to both projects. We also thank the Nextflow and nf-core communities for their help and support.