Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Joining nearby NAMs for combined score #415

Open
ksahlin opened this issue Mar 31, 2024 · 1 comment
Open

Joining nearby NAMs for combined score #415

ksahlin opened this issue Mar 31, 2024 · 1 comment

Comments

@ksahlin
Copy link
Owner

ksahlin commented Mar 31, 2024

The idea of joining nearby NAMs for increased score/power and to reduce redundant alignments seem logical. However, it hasn’t worked as effectively as expected. I found one reason, that I describe in this comment: 3d96321#commitcomment-140411641

Even after fixing as above, the idea is still not giving a great benefit on a smaller benchmark, but there could be remaining issues with the logic, or the testing I did is not representative of what happens on the main branch implementation (described in comment).

@ksahlin
Copy link
Owner Author

ksahlin commented Mar 31, 2024

In the above comment, I had benchmarked with asymmetric seeds and concluded no significant advantage. Asymmetric seeds currently have other issues, which doesn't give a clean benchmark.

Therefore, I decided to implement the above described 'slack' based off the main branch (which uses symmetric seeds).

Below I provide a benchmark on a small chrX+Y dataset using strobealign-symmetric (current main) vs strobealign-symmetric-no-rescue-slack which:

  1. enters find_NAMs_rescue() only rescue if there is no NAM. That is if (nams.empty()) instead of if (nams.empty() || nonrepetitive_fraction < 0.7)
  2. sets slack = read_length/3.

In single end mode I also include strobealign-symmetric-no-rescuewith only includes the no-rescue 'feature'.

I observe quite drastic accuracy improvements on all but the 50nt reads with the strobealign-symmetric-no-rescue-slack . In addition, the no-rescue version is always faster.

It should be carefully noted that this is a small dataset. The rescue might do a better job on a full human genome(?).

PAIRED END READS

tool,ref,%-aligned,accuracy,time(sec),Mem(MB)
chm13_chrX_Y,50,strobealign-symmetric,align,99.929,**78.192**,0,14.09,1192.62
chm13_chrX_Y,50,strobealign-symmetric-no-rescue-slack,align,99.9245,78.1155,0,7.62,1192.88

chm13_chrX_Y,75,strobealign-symmetric,align,100.0,79.8775,0,12.2,1193.8
chm13_chrX_Y,75,strobealign-symmetric-no-rescue-slack,align,99.9995,**79.979**,0,6.21,1194.99

chm13_chrX_Y,125,strobealign-symmetric,align,100.0,82.0875,0,11.99,1196.45
chm13_chrX_Y,125,strobealign-symmetric-no-rescue-slack,align,100.0,82.085,0,7.99,1201.92

chm13_chrX_Y,150,strobealign-symmetric,align,100.0,82.7435,0,13.02,1198.98
chm13_chrX_Y,150,strobealign-symmetric-no-rescue-slack,align,100.0,**82.891**,0,10.02,1205.21

chm13_chrX_Y,250,strobealign-symmetric,align,99.999,85.882,0,20.06,1209.32
chm13_chrX_Y,250,strobealign-symmetric-no-rescue-slack,align,99.999,**86.1335**,0,19.2,1220.66

chm13_chrX_Y,500,strobealign-symmetric,align,99.998,87.9965,0,37.28,969.72
chm13_chrX_Y,500,strobealign-symmetric-no-rescue-slack,align,99.998,**88.4225**,0,32.64,987.56

SINGLE END READS

tool,ref,%-aligned,accuracy,time(sec),Mem(MB)
chm13_chrX_Y,50,strobealign-symmetric_SE,align,48.9205,**36.306**,0,3.31,1187.66
chm13_chrX_Y,50,strobealign-symmetric-no-rescue_SE,align,48.9205,36.2685,0,1.75,1187.69
chm13_chrX_Y,50,strobealign-symmetric-no-rescue-slack_SE,align,48.9205,36.2665,0,1.72,1187.57

chm13_chrX_Y,75,strobealign-symmetric_SE,align,49.8825,38.312,0,3.31,1187.1
chm13_chrX_Y,75,strobealign-symmetric-no-rescue_SE,align,49.8825,**38.3335**,0,1.74,1188.0
chm13_chrX_Y,75,strobealign-symmetric-no-rescue-slack_SE,align,49.8825,38.323,0,1.74,1188.52

chm13_chrX_Y,125,strobealign-symmetric_SE,align,49.9745,39.5835,0,3.72,1190.74
chm13_chrX_Y,125,strobealign-symmetric-no-rescue_SE,align,49.9745,39.5495,0,1.99,1190.32
chm13_chrX_Y,125,strobealign-symmetric-no-rescue-slack_SE,align,49.9745,**39.632**,0,2.02,1190.86

chm13_chrX_Y,150,strobealign-symmetric_SE,align,49.995,39.985,0,4.57,1192.77
chm13_chrX_Y,150,strobealign-symmetric-no-rescue_SE,align,49.995,39.9595,0,2.28,1195.56
chm13_chrX_Y,150,strobealign-symmetric-no-rescue-slack_SE,align,49.995,**40.004**,0,2.39,1195.71

chm13_chrX_Y,250,strobealign-symmetric_SE,align,49.999,41.1665,0,6.74,1197.88
chm13_chrX_Y,250,strobealign-symmetric-no-rescue_SE,align,49.999,41.2015,0,4.49,1210.02
chm13_chrX_Y,250,strobealign-symmetric-no-rescue-slack_SE,align,49.999,**41.2145**,0,4.81,1204.21

chm13_chrX_Y,500,strobealign-symmetric_SE,align,49.999,42.716,0,17.1,943.99
chm13_chrX_Y,500,strobealign-symmetric-no-rescue_SE,align,49.999,**42.8755**,0,12.23,945.01
chm13_chrX_Y,500,strobealign-symmetric-no-rescue-slack_SE,align,49.999,42.8645,0,12.53,958.21

(CC @Itolstoganov)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant