Using k-mer deconvolution for rapid clustering of sequences into groups of highly similar reads, and collapsing each group into tRNA families.
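The README does not state the k-mer length or distance metric the pipeline uses. The following is a minimal sketch of the general idea. It assumes overlapping k-mer counts compared by cosine distance, and both of those are illustrative choices, as is the arbitrary default `k=3`. Under those assumptions, a pairwise distance matrix over reads could be computed like this:

```python
from collections import Counter
from itertools import combinations
import math


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in an upper-cased sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity between two k-mer count vectors (assumed metric)."""
    # Counter returns 0 for k-mers absent from b, so only shared k-mers contribute.
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # a read shorter than k has no k-mers
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def distance_matrix(seqs: list[str], k: int = 3) -> list[list[float]]:
    """Symmetric pairwise k-mer distance matrix with zeros on the diagonal."""
    profiles = [kmer_profile(s, k) for s in seqs]
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = cosine_distance(profiles[i], profiles[j])
    return dist
```

Cosine distance on count vectors tolerates small length differences between reads. Other common choices, such as Jaccard distance on k-mer sets or MinHash sketches (as popularized by Mash), trade accuracy for speed differently.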
The metadata file resides in the working directory and lists the required information for each sample. For example:
Sample.name | R1 | R2 |
---|---|---|
Ecoli | 05_E-coli_S5_L001_R1_001.fastq.gz | 05_E-coli_S5_L001_R2_001.fastq.gz |
Efaecalis | 07_E-faecalis_S7_L001_R1_001.fastq.gz | 07_E-faecalis_S7_L001_R2_001.fastq.gz |
Etarda | 06_E-tarda_S6_L001_R1_001.fastq.gz | 06_E-tarda_S6_L001_R2_001.fastq.gz |
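The README shows the metadata as a table but does not give the file name or delimiter. The sketch below is a minimal reader under three assumptions: the file is tab-separated with a `Sample.name`, `R1`, `R2` header, the file name `metadata.tsv` is hypothetical, and the FASTQ names are relative to the working directory.

```python
import csv
from pathlib import Path

# Column names as shown in the README's metadata table.
REQUIRED_COLUMNS = ("Sample.name", "R1", "R2")


def parse_metadata(text: str) -> dict[str, tuple[str, str]]:
    """Map each sample name to its (R1, R2) FASTQ file names.

    Assumes a tab-separated table with a header row; the delimiter is an
    assumption, not something the README specifies.
    """
    reader = csv.DictReader(text.splitlines(), delimiter="\t")
    missing = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"metadata is missing column(s): {', '.join(missing)}")

    samples: dict[str, tuple[str, str]] = {}
    for row in reader:
        # Short rows yield None for absent fields, so fall back to "".
        name = (row["Sample.name"] or "").strip()
        r1 = (row["R1"] or "").strip()
        r2 = (row["R2"] or "").strip()
        if not (name and r1 and r2):
            raise ValueError(f"incomplete metadata row: {row}")
        if name in samples:
            raise ValueError(f"duplicate sample name: {name}")
        samples[name] = (r1, r2)
    return samples


def load_metadata(workdir: str, filename: str = "metadata.tsv") -> dict[str, tuple[Path, Path]]:
    """Read the metadata file from the working directory and resolve FASTQ paths.

    `filename` is a hypothetical default; FASTQ names are assumed to be
    relative to `workdir`.
    """
    base = Path(workdir)
    entries = parse_metadata((base / filename).read_text())
    return {name: (base / r1, base / r2) for name, (r1, r2) in entries.items()}
```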
Set `PWD` to the path of the working directory (required), then run the full pipeline, here with 20 threads:

`python main.py $PWD -p -mk -lc -cls -th 20`
Run `python main.py` for usage.
The following options are available:

Option | Description |
---|---|
`-p`, `--process_fastqs` | Preprocess the raw FASTQ files: UMI extraction, adapter removal, merging and quality filtering |
`-mk`, `--make_kmer_distance_mat` | Build the k-mer distance matrix for the samples |
`-lc`, `--leiden_clustering` | Cluster reads based on the k-mer space of the samples |
`-cls`, `--Collabse_clusters` | Collapse each read cluster into further similar groups |
`-th`, `--Number_of_Threads` | Number of CPU threads to use for clustering and collapsing (default: 4) |
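The README names Leiden clustering on the k-mer space but does not say how the read graph is built. The sketch below is illustrative only. It assumes a graph that links reads whose k-mer distance is at most a hypothetical cutoff `max_dist`. For the community-detection step it swaps in simple label propagation, because Leiden itself is best taken from a dedicated library such as `leidenalg` on an `igraph` graph. The code shows the shape of the step, not what `main.py` actually does.

```python
from collections import Counter


def threshold_graph(dist: list[list[float]], max_dist: float) -> dict[int, set[int]]:
    """Link every pair of reads whose distance is <= max_dist (hypothetical cutoff)."""
    n = len(dist)
    graph: dict[int, set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= max_dist:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def label_propagation(graph: dict[int, set[int]], max_iter: int = 100) -> dict[int, int]:
    """Asynchronous label propagation with deterministic tie-breaking.

    NOTE: a simple stand-in for community detection, NOT the Leiden algorithm.
    Each node adopts the most common label among its neighbours (smallest
    label wins ties) until no label changes or max_iter is reached.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated reads keep their own label (singleton cluster)
            counts = Counter(labels[nb] for nb in graph[node])
            best = max(counts.values())
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def clusters_from_labels(labels: dict[int, int]) -> list[list[int]]:
    """Group node indices by label and return the clusters in sorted order."""
    groups: dict[int, list[int]] = {}
    for node, lab in labels.items():
        groups.setdefault(lab, []).append(node)
    return sorted(groups.values())
```

Leiden optimizes a quality function such as modularity and guarantees well-connected communities. Label propagation does neither, which is why it appears here only to illustrate where clustering fits after the distance matrix is built.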