Closed syncmers with lower density #429

Open
Daniel-Liu-c0deb0t opened this issue Jun 15, 2024 · 1 comment
Comments

@Daniel-Liu-c0deb0t

Hey it's me again :) I briefly corresponded with @ksahlin on a refined version of closed syncmers with 25-30% lower density for realistic values of k. This might be useful for strobealign.

[Figure: optimal_closed_syncmers]

My code for generating these asymptotically optimal density closed syncmers is here: https://github.com/Daniel-Liu-c0deb0t/dlb-kmer-sampling/blob/main/src/lib.rs#L30 and I can explain the algorithm in more detail if needed.

@ksahlin
Owner

ksahlin commented Jun 15, 2024

Thanks, Daniel, very interesting work! This will be interesting to try.

Context to Daniel's post:

A closed syncmer is a k-mer that is sampled when its first or last s-mer is the smallest among the s-mers in the k-mer. We currently sample a k-mer when its middle s-mer is the smallest (open syncmer).
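To make the two definitions concrete, here is a minimal Python sketch. It is not strobealign's code and not Daniel's refined scheme. The function names are made up for illustration, and s-mers are compared lexicographically as a simplifying assumption (real tools typically compare hash values instead).

```python
def smallest_smer_offset(kmer: str, s: int) -> int:
    """Offset of the smallest s-mer inside kmer (leftmost one on ties).

    Assumption: s-mers are ordered lexicographically; real tools usually hash.
    """
    smers = [kmer[i:i + s] for i in range(len(kmer) - s + 1)]
    return min(range(len(smers)), key=lambda i: smers[i])


def closed_syncmers(seq: str, k: int, s: int) -> list[int]:
    """Start positions of k-mers whose smallest s-mer is the first or the last one."""
    last = k - s
    return [p for p in range(len(seq) - k + 1)
            if smallest_smer_offset(seq[p:p + k], s) in (0, last)]


def open_syncmers(seq: str, k: int, s: int, t: int) -> list[int]:
    """Start positions of k-mers whose smallest s-mer sits at 0-based offset t."""
    return [p for p in range(len(seq) - k + 1)
            if smallest_smer_offset(seq[p:p + k], s) == t]


if __name__ == "__main__":
    # Toy example: k=5, s=3 gives three s-mers per k-mer, so the middle offset is 1.
    print(closed_syncmers("GATTACA", k=5, s=3))
    print(open_syncmers("GATTACA", k=5, s=3, t=1))
```

In the toy sequence, the k-mer `GATTA` has its smallest s-mer (`ATT`) in the middle, so it is an open syncmer, while `ATTAC` and `TTACA` have their smallest s-mer at the first and last position and are closed syncmers.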

I have tested closed syncmers several times in strobealign, and they never perform quite as well as open syncmers that sample on the middle s-mer (we use the third s-mer when the density is 1/5). This is expected because open syncmers have a better spread (a guaranteed lower bound of 3 on the distance between sampled k-mers when the density is 1/5), as shown by Shaw & Yu, 2021. Many traditional closed syncmers (upper panel in Daniel's plot) are sampled at distance 1 from each other (i.e., not a good spread).

However, open syncmers come at the cost of not having a window guarantee, so some regions might be sparsely sampled. Daniel's plot shows that we can possibly get both a good spread and the window guarantee, ensuring that all regions have enough seeds.
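The trade-off between spread and window guarantee can be checked empirically with a small Python sketch that measures the gaps between consecutive sampled positions on a random sequence. Everything here is illustrative: the helper names are hypothetical, `k = 20` and `s = 16` are example values chosen so that each k-mer holds five s-mers, and s-mers are ordered lexicographically rather than by hash. It covers only traditional closed and open syncmers, not the refined scheme from the plot.

```python
import random


def min_offset(seq: str, p: int, k: int, s: int) -> int:
    """Offset of the smallest s-mer in the k-mer starting at p (leftmost on ties)."""
    return min(range(k - s + 1), key=lambda i: seq[p + i:p + i + s])


def sample(seq: str, k: int, s: int, accepted: set[int]) -> list[int]:
    """Start positions of k-mers whose smallest s-mer offset is in `accepted`."""
    return [p for p in range(len(seq) - k + 1)
            if min_offset(seq, p, k, s) in accepted]


def gaps(positions: list[int]) -> list[int]:
    """Distances between consecutive sampled positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


random.seed(0)  # fixed seed so the run is repeatable
seq = "".join(random.choice("ACGT") for _ in range(5000))
k, s = 20, 16  # illustrative values: k - s + 1 = 5 s-mers per k-mer

closed = sample(seq, k, s, {0, k - s})       # first or last s-mer is smallest
opened = sample(seq, k, s, {(k - s) // 2})   # middle (third) s-mer is smallest
closed_gaps, open_gaps = gaps(closed), gaps(opened)

n_kmers = len(seq) - k + 1
print("closed: density", len(closed) / n_kmers,
      "min gap", min(closed_gaps), "max gap", max(closed_gaps))
print("open:   density", len(opened) / n_kmers,
      "min gap", min(open_gaps), "max gap", max(open_gaps))
```

With these parameters, consecutive closed syncmers can never be more than k - s = 4 positions apart (the window guarantee), and consecutive open syncmers can never be closer than 3 positions (the spread). The printed maximum gap for open syncmers has no such upper bound, and the closed syncmers are sampled at roughly twice the density.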
