The Python dependencies of the scripts are listed in requirements.txt and can be installed with the following pip command:
pip install -r requirements.txt
This tool uses cd-hit-est to cluster nucleotide sequences, so make sure that cd-hit is installed as well. cd-hit is available, for example, from Bioconda or through the Debian package manager.
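Before running the workflow, it can help to confirm that the clustering program is on your PATH. The following is a minimal sketch that uses only Python's standard `shutil.which`; the helper name `missing_tools` is hypothetical and is not part of this repository.

```python
import shutil


def missing_tools(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # cd-hit-est is the executable called in the RT clustering step.
    missing = missing_tools(["cd-hit-est"])
    print("Missing tools:", ", ".join(missing) if missing else "none")
```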
Move all RT sample files you want to analyze into one folder, <INPUT_FOLDER_RT>.
Extract sequences:
./readtag.py <INPUT_FOLDER_RT> <OUTPUT_FOLDER_RT>
This creates <OUTPUT_FOLDER_RT> and writes the extracted sequences there as TSV files.
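Convert the extracted sequences to FASTA. The example uses the sample 341-RT-T1_S341_L001; substitute your own file names:
cd <OUTPUT_FOLDER_RT>
grep -v '#' 341-RT-T1_S341_L001.15N.tsv | awk '{print ">"$1"\n"$2}' > 341-RT-T1_S341_L001.15N.fasta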
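If you prefer to do this conversion from Python, the same filtering can be sketched as follows. This is a minimal illustration, assuming (as the awk command implies) that column 1 of the TSV holds the read name and column 2 the sequence; `tsv_to_fasta` is a hypothetical helper, not part of this repository.

```python
def tsv_to_fasta(tsv_path, fasta_path):
    """Write columns 1 and 2 of a TSV file as FASTA records.

    Mirrors the grep/awk one-liner: any line containing '#' is skipped,
    column 1 becomes the FASTA header and column 2 the sequence.
    Unlike the one-liner, blank or single-column lines are skipped too.
    Returns the number of records written.
    """
    count = 0
    with open(tsv_path) as src, open(fasta_path, "w") as dst:
        for line in src:
            if "#" in line:
                continue  # same filter as grep -v '#'
            fields = line.split()  # split on whitespace, like awk's default
            if len(fields) < 2:
                continue  # nothing usable on this line
            dst.write(f">{fields[0]}\n{fields[1]}\n")
            count += 1
    return count
```

Run from inside <OUTPUT_FOLDER_RT>, `tsv_to_fasta("341-RT-T1_S341_L001.15N.tsv", "341-RT-T1_S341_L001.15N.fasta")` should produce the same FASTA file as the shell command for well-formed input.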
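Cluster the sequences with cd-hit-est. The -d 0 option keeps the full sequence names (up to the first space) in the resulting .clstr file:
cd-hit-est -d 0 -i 341-RT-T1_S341_L001.15N.fasta -o 341-RT-T1_S341_L001.15N.cdhit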
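cd-hit-est writes the representative sequences to the -o file and the cluster membership to the same name with .clstr appended. A .clstr file lists each cluster as a `>Cluster N` header followed by one line per member, such as `1 150nt, >read7... at +/99.33%`, with `*` marking the representative. The sketch below reads that layout into dictionaries. It only illustrates the file format and is not the repository's parse_cdhit_clusters.py; `read_clstr` is a hypothetical name.

```python
import re
from collections import defaultdict

# Captures the sequence name in a member line like "0\t150nt, >read1... *".
MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def read_clstr(path):
    """Parse a CD-HIT .clstr file.

    Returns (clusters, representatives): clusters maps each cluster id to
    its member names; representatives maps each cluster id to the member
    marked with '*'.
    """
    clusters = defaultdict(list)
    representatives = {}
    cluster_id = None
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">Cluster"):
                cluster_id = int(line.split()[1])
                continue
            match = MEMBER_RE.search(line)
            if match is None or cluster_id is None:
                continue  # blank or unexpected line
            name = match.group(1)
            clusters[cluster_id].append(name)
            if line.endswith("*"):
                representatives[cluster_id] = name
    return dict(clusters), representatives
```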
Return to the directory containing the scripts and parse the clusters:
./parse_cdhit_clusters.py <OUTPUT_FOLDER_RT>/341-RT-T1_S341_L001.15N.tsv <OUTPUT_FOLDER_RT>/341-RT-T1_S341_L001.15N.cdhit <OUTPUT_FOLDER_RT>/341-RT-T1_S341_L001.15N.cdhit.clstr
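The commands above handle a single sample. To cluster and parse every RT sample, the two steps can be looped from Python. This sketch assumes that every extracted file in <OUTPUT_FOLDER_RT> ends in `.15N.tsv`, that the matching `.fasta` files were already created with the conversion step, and that it is run from the directory containing parse_cdhit_clusters.py; `rt_commands` and `run_all` are hypothetical names.

```python
import subprocess
from pathlib import Path


def rt_commands(tsv_path):
    """Return the cd-hit-est and parse_cdhit_clusters.py argument lists
    for one RT sample, following the file naming used above."""
    tsv = str(tsv_path)
    if not tsv.endswith(".tsv"):
        raise ValueError(f"expected a .tsv file, got {tsv}")
    base = tsv[: -len(".tsv")]  # e.g. out/341-RT-T1_S341_L001.15N
    cdhit = base + ".cdhit"
    return [
        # Cluster the sample's FASTA file (assumed to exist already).
        ["cd-hit-est", "-d", "0", "-i", base + ".fasta", "-o", cdhit],
        # cd-hit-est writes cluster membership to <output>.clstr.
        ["./parse_cdhit_clusters.py", tsv, cdhit, cdhit + ".clstr"],
    ]


def run_all(output_folder, dry_run=True):
    """Print (dry_run=True) or execute the commands for every *.15N.tsv file."""
    for tsv in sorted(Path(output_folder).glob("*.15N.tsv")):
        for cmd in rt_commands(tsv):
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)  # stop at the first failure
```

Calling `run_all("<OUTPUT_FOLDER_RT>")` with your real folder path only prints the commands, which is a cheap way to check file names before running them with `dry_run=False`.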
Move all LT sample files you want to analyze into one folder, <INPUT_FOLDER_LT>, and process them with linktag.py:
./linktag.py <INPUT_FOLDER_LT> <OUTPUT_FOLDER_LT>
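Finally, aggregate the LT pairs (here for the sample 337-LT-T1_S337_L001) with the RT clusters into <OUTPUT_FOLDER_AGGREGATE>:
./aggregate.py <OUTPUT_FOLDER_LT>/337-LT-T1_S337_L001.15Npairs.tsv <INPUT_FOLDER_RT> <OUTPUT_FOLDER_RT>/341-RT-T1_S341_L001.15N.clusters.tsv <OUTPUT_FOLDER_AGGREGATE>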