I am testing TSS on real-world bacterial genomes. There are eight of them, and all are somewhat fragmented (e.g., 1-30 fragments). This is a good test of whether TSS is robust to global mutation, because our sequencing experiments do not yield complete circular genomes for each bacterial isolate. All eight are Shewanella baltica, and the average nucleotide identity (ANI; https://www.microbiologyresearch.org/content/journal/ijsem/10.1099/ijsem.0.000760?crawler=true) among them is very high, 95%-99%, meaning they are very similar. I want to see how TSS performs on closely related sequences with global mutation, compared with a k-mer-based method. The Spearman rank correlation coefficient between the k-mer-based distance and ANI is very high (1, actually), but the correlation between the TSS distance and ANI is very poor. I have attached the figure of TSS vs. ANI and all the concatenated Shewanella baltica genomes (each in one piece) for you to test.
The command I use for running TSS is:
./sketch -i S_Baltica_new -m TSS -f fasta -o S_Baltica_new_TSS_triangle
From the figure, the TSS distance seems to vary a lot even where the ANI and k-mer-based distances are quite consistent. Do you have any explanation for this? Does TSS lose resolution for closely related sequences (where k-mer-based methods work very well) and only work for divergent ones? If so, how can I benefit from both?
I can make two general comments about your results.
TSS outperforms other sketching methods for distantly related genomes rather than for very similar pairs, so that advantage does not apply to genomes with very high ANI.
The TSS distance is not a linear function of ANI; therefore, differences in TSS distance do not reflect a proportional drop or increase in ANI.
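To illustrate the second comment: a rank correlation such as Spearman's is unaffected by a non-linear relationship as long as that relationship is monotonic, whereas Pearson's correlation is not. The sketch below uses made-up ANI values and a hypothetical monotone function `toy_distance` (it is not the TSS distance) purely to show the difference.

```python
import math

def ranks(values):
    """Return 1-based average ranks (tied values share the mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up ANI values (percent), for illustration only.
ani = [95.0, 96.0, 97.0, 98.0, 99.0]

def toy_distance(a):
    # Hypothetical monotone decreasing, non-linear function of ANI.
    return (100.0 - a) ** 3

dist = [toy_distance(a) for a in ani]

rho = spearman(ani, dist)  # perfect negative rank correlation
r = pearson(ani, dist)     # weaker, because the relation is not linear
print(f"Spearman: {rho:.3f}, Pearson: {r:.3f}")
```

Under these assumptions the rank correlation stays perfect while the linear correlation does not, so a poor Spearman coefficient points to a loss of monotonicity rather than to non-linearity alone.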
You can increase the sketch dimension with --embed_dim=30 to get more accurate sketches. I've attached the results I get with this modification: S_Baltica_new_TSS_triangle.zip
That being said, if you can share the script to reproduce the figure, I might be able to add a non-linear transformation that will suit your problem.
See the attached files. I plotted your TSS distances against ANI, but the relationship is still not monotonic (I was not expecting it to be linear). It is still bad for similar genomes. Maybe an even larger embedding dimension would help?
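One way to see exactly where monotonicity breaks is to list the pairs of observations in which a higher ANI goes together with a larger distance. This is a minimal sketch with made-up numbers; `discordant_pairs` is a hypothetical helper, and it assumes the ANI and distance values have already been extracted into two lists in the same genome-pair order.

```python
from itertools import combinations

def discordant_pairs(ani, dist):
    """Return index pairs (i, j) where ANI and distance move in the same direction.

    For a distance that decreases monotonically with ANI, this list is empty.
    """
    bad = []
    for i, j in combinations(range(len(ani)), 2):
        d_ani = ani[i] - ani[j]
        d_dist = dist[i] - dist[j]
        # Same sign means higher ANI came with higher distance: a violation.
        if d_ani * d_dist > 0:
            bad.append((i, j))
    return bad

# Made-up values for four genome pairs, for illustration only.
ani = [95.2, 96.8, 97.5, 98.9]
dist = [0.40, 0.31, 0.35, 0.10]

violations = discordant_pairs(ani, dist)
print(violations)
```

With real data, the indices returned would show which genome pairs are responsible for the drop in rank correlation.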
Thanks,
Jianshu
S_Baltica_new.zip
ANI_TSS_S_Baltica.pdf.zip