Closed syncmers with lower density #429

Daniel-Liu-c0deb0t · 2024-06-15T03:50:48Z

Hey it's me again :) I briefly corresponded with @ksahlin on a refined version of closed syncmers with 25-30% lower density for realistic values of k. This might be useful for strobealign.

My code for generating these asymptotically optimal density closed syncmers is here: https://github.com/Daniel-Liu-c0deb0t/dlb-kmer-sampling/blob/main/src/lib.rs#L30 and I can explain the algorithm in more detail if needed.

ksahlin · 2024-06-15T09:46:52Z

Thanks, Daniel, very interesting work! This will be interesting to try.

Context to Daniel's post:

A closed syncmer is a k-mer sampled when the first or last s-mer is the smallest in the window. We currently sample a syncmer when the middle s-mer is the smallest (open syncmer).

I have been testing closed syncmers at several times in strobealign - they never perform quite as well as open syncmers when sampling middle s-mer (we use the third s-smer when density is 1/5). This is expected because open syncmers have better spread (garanteed lower distance bound of 3 when the density is 1/5), showed by Shaw & Yu, 2021. Many traditional closed syncmers (upper panel in Daniels plot) are sampled at distance 1 from each other (i.e., not a good spread).

However, open syncmers come at the cost of not having a window guarantee, so some regions might be sparsely sampled. Daniels' plot shows that we can possibly get both a good spread and the window guarantee to ensure that all regions have enough seeds.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Closed syncmers with lower density #429

Closed syncmers with lower density #429

Daniel-Liu-c0deb0t commented Jun 15, 2024

ksahlin commented Jun 15, 2024

Closed syncmers with lower density #429

Closed syncmers with lower density #429

Comments

Daniel-Liu-c0deb0t commented Jun 15, 2024

ksahlin commented Jun 15, 2024