strange TSS distance for very similar genomic identity sequences #50

jianshu93 · 2022-01-04T18:28:26Z

Dear TSS team,

I am testing TSS on real-world bacterial genomes. There are 8 of them and all are somewhat fragmented (e.g., 1-30 fragments). This is a good test of whether TSS is robust to global mutation, because we do not get complete circular genomes from our sequencing of each bacterial isolate. They are eight Shewanella baltica isolates, and the average nucleotide identity (ANI, https://www.microbiologyresearch.org/content/journal/ijsem/10.1099/ijsem.0.000760?crawler=true) among them is very high, 95%-99%, meaning they are very similar. I want to see how TSS performs on closely related sequences under global mutation, compared to a k-mer based method. I get a very high Spearman rank correlation coefficient (1, actually) between the k-mer based distance and ANI, but a very poor one between the TSS distance and ANI. I attached the figure of TSS vs ANI and all the concatenated Shewanella baltica genomes (each in one piece) for you to test.

The command I use for running TSS is:

./sketch -i S_Baltica_new -m TSS -f fasta -o S_Baltica_new_TSS_triangle

From the figure, it seems the TSS distance varies a lot even where the ANI and k-mer based distances are quite consistent. Do you have any explanation for this? Will TSS lose resolution for closely related sequences (where k-mer based methods work very well) and only work for divergent ones? If so, how can I benefit from both?

Thanks,

Jianshu

S_Baltica_new.zip

ANI_TSS_S_Baltica.pdf.zip

ajoudaki · 2022-01-14T17:39:39Z

Dear Jianshu,

I can make two general comments about your results.

1. TSS outperforms other sketching methods on distantly related genomes rather than on very similar pairs, so it is not expected to have an advantage at very high ANI.
2. The TSS distance is not a linear function of ANI; therefore, differences in TSS distance do not reflect a proportional drop or increase in ANI.

You can increase the sketch dimension with --embed_dim=30 to get more accurate sketches. I've attached the results I get with this modification: S_Baltica_new_TSS_triangle.zip

That being said, if you can share the script to reproduce the figure, I might be able to add a non-linear transformation that will suit your problem.

jianshu93 · 2022-01-14T18:58:09Z

Hello Amir,

See the attached files. I plotted your TSS distances against ANI, but the relationship is still not monotonic (I was not expecting it to be linear). It is still poor for similar genomes. Maybe an even larger embedding dimension would help?
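For anyone who wants to check the monotonicity claim on their own data, here is a minimal sketch of a Spearman rank correlation in plain Python. The numbers are made-up placeholders, not values from the attached files, and the function names are my own; in practice `scipy.stats.spearmanr` does the same job.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pairwise values, for illustration only.
ani = [99.0, 98.2, 97.5, 96.1, 95.3]
kmer_dist = [0.010, 0.018, 0.026, 0.040, 0.049]  # falls steadily as ANI rises
tss_dist = [0.30, 0.12, 0.45, 0.20, 0.38]        # no consistent ordering

rho_kmer = spearman(ani, kmer_dist)
rho_tss = spearman(ani, tss_dist)
print(f"k-mer vs ANI: {rho_kmer:.2f}")
print(f"TSS vs ANI:   {rho_tss:.2f}")
```

A value close to -1 (or +1, if comparing against a similarity rather than a distance) means the ordering of pairs agrees with ANI, even when the relationship is far from linear.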

Archive.zip

Thanks,

Jianshu

jianshu93 · 2022-02-27T05:52:58Z

Hello team,

Is there any update on the question above, namely how to handle closely related and distantly related genomes at the same time?

Thanks,

Jianshu
