RNA Sequencing Pipeline

A Nextflow pipeline to perform quality control, alignment, and quantification of RNA sequencing data.

The pipeline was created to run on the ETH Euler cluster and relies on that cluster's genome files. It therefore needs to be adapted before it can run on a different HPC cluster.

Pipeline steps

Required parameters

Path to the input FASTQ files, given as a glob pattern that matches the files inside the FASTQ folder.

--input /cluster/work/nme/data/josousa/project/fastq/*fastq.gz

Output directory where the files will be saved.

--outdir /cluster/work/nme/data/josousa/project

Input optional parameters

Option to force the pipeline to assign input as single-end.

--single_end

By default, the pipeline detects whether the input files are single-end or paired-end.
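The README does not say how the automatic detection works. The sketch below shows one common way to do it, by grouping files on their mate suffix. The naming convention (`_R1`/`_R2` or `_1`/`_2`, optionally followed by a chunk such as `_001`), the `MATE_PATTERN` regular expression, and the `group_reads` function are all assumptions made for illustration and are not taken from the pipeline.

```python
import re
from collections import defaultdict

# Assumed naming convention (not taken from the pipeline): mates end in
# _R1/_R2 or _1/_2, optionally followed by a numeric chunk such as _001.
MATE_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R?(?P<mate>[12])(?:_\d+)?\.(?:fastq|fq)\.gz$"
)

def group_reads(filenames, single_end=False):
    """Group FASTQ file names by sample.

    A sample with two files is treated as paired-end, a sample with one
    file as single-end. With single_end=True (mirroring --single_end),
    every file becomes its own single-end sample.
    """
    samples = defaultdict(list)
    for name in sorted(filenames):
        match = MATE_PATTERN.match(name)
        if single_end or match is None:
            samples[name].append(name)
        else:
            samples[match.group("sample")].append(name)
    return dict(samples)

files = ["sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz", "sampleB.fastq.gz"]
for sample, reads in group_reads(files).items():
    layout = "paired-end" if len(reads) == 2 else "single-end"
    print(sample, layout, reads)
```

Passing `single_end=True` skips the grouping entirely, which is the behaviour one would expect from forcing single-end mode.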

Option to select the RNA-Seq library strandedness.

--strandness 'smartseq2' # Default (same as 'unstranded')
--strandness 'forward'
--strandness 'reverse'
--strandness 'unstranded'

This option will only affect quantification.
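To make the four accepted values concrete, the sketch below maps each one to the featureCounts `-s` option (0 = unstranded, 1 = stranded, 2 = reversely stranded). Only the accepted values and the equivalence of 'smartseq2' and 'unstranded' come from this README. The `strandness_to_featurecounts` helper is hypothetical, and it is an assumption that the pipeline translates the values in this way.

```python
# Hypothetical mapping from --strandness values to the featureCounts -s
# option (0 = unstranded, 1 = stranded, 2 = reversely stranded).
STRAND_TO_S = {
    "smartseq2": 0,   # documented above as the same as 'unstranded'
    "unstranded": 0,
    "forward": 1,
    "reverse": 2,
}

def strandness_to_featurecounts(strandness="smartseq2"):
    """Return the featureCounts arguments for a given --strandness value."""
    if strandness not in STRAND_TO_S:
        raise ValueError(f"Unknown strandness: {strandness!r}")
    return ["-s", str(STRAND_TO_S[strandness])]

for value in STRAND_TO_S:
    print(value, strandness_to_featurecounts(value))
```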

Genomes

Reference genome used for alignment.

--genome

Available genomes:

    Mus_musculus_GRCm39 # Default
    Mus_musculus_GRCm38_p6
    Homo_sapiens_GRCh38_p14
    Rattus_norvegicus_mRatBN7_2
    Bos_taurus_ARS-UCD1_2
    Bos_taurus_ARS-UCD1_3
    Caenorhabditis_elegans_WBcel235
    Callithrix_jacchus_mCalJac1_pat_X
    Capra_hircus_ARS1
    Capreolus_capreolus_GCA_951849835_1
    Drosophila_melanogaster_BDGP6_46
    Escherichia_coli_ASM160652v1
    Macaca_fascicularis_Macaca_fascicularis_6_0
    Macaca_mulatta_Mmul_10
    Monodelphis_domestica_ASM229v1
    Pan_troglodytes_Pan_tro_3_0
    Saccharomyces_cerevisiae_R64-1-1
    Sus_scrofa_Sscrofa11_1

Option to use a custom genome for alignment by providing an absolute path to a custom genome file.

--custom_genome_file '/cluster/work/nme/data/josousa/project/genome/GRCm39.genome'

Example of a genome file:

name           GRCm39
species        Mouse
star           /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/STARIndex/
hisat2         /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/Hisat2Index/genome
hisat2_splices /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/Hisat2Index/splice_sites.txt
gtf            /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Annotation/Genes/genes.gtf
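The example above reads as one key and one value per line, separated by whitespace. The sketch below parses that layout into a dictionary, which can be handy for checking a custom genome file before a run. The `parse_genome_file` function is hypothetical, and the exact rules the pipeline applies when reading the file are an assumption.

```python
def parse_genome_file(text):
    """Parse whitespace-separated 'key value' lines into a dict.

    Blank lines are skipped; a key without a value maps to an empty string.
    """
    entries = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts:
            continue
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        entries[key] = value
    return entries

example = """\
name           GRCm39
species        Mouse
star           /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/STARIndex/
hisat2         /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/Hisat2Index/genome
hisat2_splices /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/Hisat2Index/splice_sites.txt
gtf            /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Annotation/Genes/genes.gtf
"""

genome = parse_genome_file(example)
# Check that every key shown in the example file is present.
missing = {"name", "species", "star", "hisat2", "hisat2_splices", "gtf"} - genome.keys()
print("missing keys:", sorted(missing))
```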

Aligner options

Option to choose the aligner.

--aligner 'star' # Default
--aligner 'hisat2'

HISAT2 parameters

Option to disable soft-clipping.

--hisat2_no_softclip # Default: true

Option to suppress unpaired alignments for paired reads.

--hisat2_no_mixed # Default: true

Option to suppress discordant alignments for paired reads.

--hisat2_no_discordant # Default: true
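These three switches correspond to the HISAT2 command-line options `--no-softclip`, `--no-mixed`, and `--no-discordant`. The sketch below shows how the pipeline parameters could be turned into HISAT2 arguments. The `hisat2_flags` helper is hypothetical, and it is an assumption that the pipeline forwards the options in exactly this way.

```python
def hisat2_flags(no_softclip=True, no_mixed=True, no_discordant=True):
    """Translate the pipeline's HISAT2 switches into HISAT2 arguments.

    Defaults mirror the documented pipeline defaults (all true).
    """
    flags = []
    if no_softclip:
        flags.append("--no-softclip")     # disallow soft-clipping
    if no_mixed:
        flags.append("--no-mixed")        # no unpaired alignments for paired reads
    if no_discordant:
        flags.append("--no-discordant")   # no discordant alignments for paired reads
    return flags

print(hisat2_flags())
print(hisat2_flags(no_softclip=False))
```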

FastQ Screen optional parameters

Option to provide a custom FastQ Screen config file.

--fastq_screen_conf '/cluster/work/nme/software/config/fastq_screen.conf' # Default

Option to pass the flag --bisulfite to FastQ Screen.

--bisulfite # Default: false

featureCounts optional parameters

Option to only count read pairs that have both ends aligned.

--featurecounts_B_flag # Default: true

Option to not count read pairs that have their two ends mapping to different chromosomes or mapping to the same chromosome but on different strands.

--featurecounts_C_flag # Default: true
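Both parameters mirror featureCounts options of the same letter (`-B` and `-C`), which only make sense for paired-end data. A minimal sketch of that relationship follows. The `featurecounts_pair_flags` helper and the choice to drop both flags for single-end input are assumptions, not a description of the pipeline's code.

```python
def featurecounts_pair_flags(paired_end, b_flag=True, c_flag=True):
    """Return the read-pair filters to pass to featureCounts.

    Defaults mirror the documented pipeline defaults (both true).
    Assumption: the filters are omitted for single-end libraries.
    """
    if not paired_end:
        return []
    flags = []
    if b_flag:
        flags.append("-B")  # count only pairs with both ends aligned
    if c_flag:
        flags.append("-C")  # skip pairs spanning chromosomes or strands
    return flags

print(featurecounts_pair_flags(paired_end=True))
print(featurecounts_pair_flags(paired_end=False))
```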

Skipping options

Option to skip FastQC, Trim Galore, and FastQ Screen. The first step of the pipeline will then be the alignment.

--skip_qc

Option to skip FastQ Screen.

--skip_fastq_screen

Option to skip quantification.

--skip_quantification

Extra arguments

Option to add extra arguments to FastQC.

--fastqc_args

Option to add extra arguments to FastQ Screen.

--fastq_screen_args

Option to add extra arguments to Trim Galore.

--trim_galore_args

Option to add extra arguments to the STAR aligner.

--star_align_args

Option to add extra arguments to the HISAT2 aligner.

--hisat2_align_args

Option to add extra arguments to Samtools sort.

--samtools_sort_args

Option to add extra arguments to Samtools index.

--samtools_index_args

Option to add extra arguments to featureCounts.

--featurecounts_args

Option to add extra arguments to MultiQC.

--multiqc_args
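To show how the parameters fit together, the sketch below assembles a complete command line and quotes it for the shell. Several things here are assumptions: that the pipeline is launched as `nextflow run main.nf`, that extra arguments are passed as a single quoted string, and that `--nogroup` (a FastQC option) is a sensible value for `--fastqc_args`. The `build_command` helper is hypothetical. Quoting matters for `--input`, because an unquoted glob would be expanded by the shell before Nextflow sees it.

```python
import shlex

def build_command(input_glob, outdir, **params):
    """Assemble a shell-quoted pipeline command line.

    True becomes a bare switch, False becomes '--name false',
    None is skipped, anything else is passed as '--name value'.
    """
    cmd = ["nextflow", "run", "main.nf", "--input", input_glob, "--outdir", outdir]
    for name, value in params.items():
        if value is None:
            continue
        if value is True:
            cmd.append(f"--{name}")
        elif value is False:
            cmd += [f"--{name}", "false"]
        else:
            cmd += [f"--{name}", str(value)]
    return shlex.join(cmd)

print(build_command(
    "/cluster/work/nme/data/josousa/project/fastq/*fastq.gz",
    "/cluster/work/nme/data/josousa/project",
    genome="Mus_musculus_GRCm39",
    aligner="hisat2",
    strandness="reverse",
    skip_fastq_screen=True,
    fastqc_args="--nogroup",  # illustrative value only
))
```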

Acknowledgements

This pipeline was adapted from the Nextflow pipelines created by the Babraham Institute Bioinformatics Group and from the nf-core pipelines. We thank all the contributors to both projects. We also thank the Nextflow and nf-core communities for their help and support.

Repository: vonMeyennLab/nf_rnaseq