These two custom scripts analyze data generated by Oxford Nanopore Technologies (ONT) direct RNA sequencing.
The first script, script1.sh, processes reads stored in FASTQ files using a provided reference transcript file. The script must be executed in a directory containing only the relevant input files. The run command is as follows:
./script1.sh transcript_file.fasta
This script requires the following tools:
NanoPlot for quality control (https://github.com/wdecoster/NanoPlot)
minimap2 for read alignment (https://github.com/lh3/minimap2)
samtools for SAM/BAM file processing (https://github.com/samtools/samtools)
NanoCount for transcript quantification (https://github.com/a-slide/NanoCount)
The script performs the following steps:
Read Alignment: Reads are aligned to the reference transcripts using minimap2.
File Conversion: The resulting SAM file is converted into a sorted and indexed BAM file using samtools.
Transcript Quantification: The alignment is analyzed with NanoCount to quantify transcripts, and genes are identified through a series of bash commands.
Quality Control: NanoPlot automatically assesses the quality of each read file; however, no quality threshold is applied, and all reads are included in the analysis.
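The steps above can be sketched for a single read file roughly as follows. This is a minimal illustration, not the contents of script1.sh: the function name process_sample, the run helper, the DRY_RUN variable, the output file names, and the choice of the map-ont preset are all assumptions.

```shell
# Sketch only: flags and file names are assumptions, not taken from script1.sh.

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# process_sample <transcripts.fasta> <reads.fastq>  (hypothetical function name)
process_sample() {
    local ref="$1" reads="$2"
    local sample="${reads%.fastq}"

    # Read alignment: -a emits SAM, map-ont is a minimap2 preset for ONT reads.
    run minimap2 -a -x map-ont -o "${sample}.sam" "$ref" "$reads"

    # File conversion: sort into a BAM file, then index it.
    run samtools sort -o "${sample}.sorted.bam" "${sample}.sam"
    run samtools index "${sample}.sorted.bam"

    # Transcript quantification from the alignment.
    run NanoCount -i "${sample}.sorted.bam" -o "${sample}.counts.tsv"

    # Quality control: report only, no reads are filtered out.
    run NanoPlot --fastq "$reads" -o "${sample}_nanoplot"
}
```

Calling `DRY_RUN=1 process_sample transcript_file.fasta sample1.fastq` prints the five commands without running them, which is a convenient way to check file names before a real run.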
The script produces a results table named Results.csv with the following columns:
Column 1: Name of the read file
Column 2: Number of identified transcripts
Column 3: Number of identified genes
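One way such a row could be derived is sketched below. It assumes, without confirmation from script1.sh, that the counts table has a header row with the transcript ID in its first column, that every data row counts as an identified transcript, and that the reference uses FlyBase-style FASTA headers carrying a "parent=FBgn...;" field. The function name count_row is hypothetical.

```shell
# count_row <reads_name> <counts.tsv> <transcripts.fasta>  (hypothetical function)
# Assumptions: counts.tsv has one header row and the transcript ID in column 1;
# FASTA headers look like ">FBtr0000001 type=mRNA; parent=FBgn0000010; ...".
count_row() {
    local name="$1" counts="$2" fasta="$3"
    awk -v name="$name" '
        # First file (FASTA): remember the parent gene of each transcript.
        FNR == NR {
            if (/^>/ && match($0, /parent=FBgn[0-9]+/)) {
                id = substr($1, 2)                              # drop the leading ">"
                gene[id] = substr($0, RSTART + 7, RLENGTH - 7)  # drop "parent="
            }
            next
        }
        # Second file (counts): skip the header, tally transcripts and distinct genes.
        FNR > 1 && $1 != "" {
            transcripts++
            if (($1 in gene) && !seen[gene[$1]]++) genes++
        }
        END { printf "%s,%d,%d\n", name, transcripts, genes }
    ' "$fasta" "$counts"
}
```

Appending the output of `count_row sample1.fastq sample1.counts.tsv transcript_file.fasta` to Results.csv would add one line per read file in the three-column layout described above.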
The second script, script2.sh, compares transcript data between two user-provided files with the .gene.txt extension, both generated by script1.sh. The run command is as follows:
./script2.sh groupA.gene.txt groupB.gene.txt
This script generates three output files:
Transcripts unique to the first group (groupA).
Transcripts unique to the second group (groupB).
Transcripts common to both groups.
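A three-way split like this can be sketched with sort and comm. The example assumes each .gene.txt file holds one identifier per line, which the description above does not state; the function name compare_groups and the output file names are hypothetical and may differ from what script2.sh writes.

```shell
# compare_groups <groupA.gene.txt> <groupB.gene.txt> <out_prefix>  (hypothetical)
# Assumption: one identifier per line in each input file.
compare_groups() {
    local a="$1" b="$2" prefix="$3"
    local sa sb
    sa="$(mktemp)"
    sb="$(mktemp)"

    # comm requires sorted input; -u also drops duplicate lines.
    LC_ALL=C sort -u "$a" > "$sa"
    LC_ALL=C sort -u "$b" > "$sb"

    LC_ALL=C comm -23 "$sa" "$sb" > "${prefix}.only_A.txt"   # unique to group A
    LC_ALL=C comm -13 "$sa" "$sb" > "${prefix}.only_B.txt"   # unique to group B
    LC_ALL=C comm -12 "$sa" "$sb" > "${prefix}.common.txt"   # present in both

    rm -f "$sa" "$sb"
}
```

Sorting and comparing under the same locale (LC_ALL=C here) matters, because comm reports spurious differences when its inputs were sorted with a different collation order.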
Currently, both scripts are designed to work exclusively with reference transcript files downloaded from FlyBase.