Optimized parameters for multi-context seeds #423

marcelm · 2024-04-30T14:32:37Z

Branch: mcs-optimized-parameters (commit 7fe07b4).

The same command as in #407 was used:

./search.py -c ${commit} -x --accuracy-slack 0.1 --mapping-rate-slack 1 -r ${read_length}

Suggested changes

Using parameters from commit 4c10938 as baseline.

Readl.	Before	Optimized	maponly SE	maponly PE	extalign SE	extalign PE
50	(16, 12, -2, 0)	(16, 12, -2, -1)	+0.1156	+0.1477	+0.0559	+0.0586
75	(20, 16, -3, -1)	(22, 18, -2, -1)	+0.3205	+0.2597	+0.0631	+0.1046
100	(16, 12, 1, 3)	(20, 16, 0, 3)	+0.1307	+0.1723	+0.0171	+0.0694
150	(20, 16, 2, 5)	(23, 19, 3, 6)	+0.1889	+0.1499	+0.0525	+0.0486
200	(24, 20, 4, 12)	(26, 22, 4, 13)	+0.0621	+0.0274	+0.0100	+0.0108
300	(24, 20, 5, 13)	(26, 22, 6, 14)	+0.1496	+0.0557	+0.0438	+0.0169
500	(25, 19, 7, 13)	(27, 21, 6, 14)	+0.0861	+0.0231	+0.0219	+0.0012

Old table using parameters from v0.12 as baseline

Readl.	Before	Optimized	maponly SE	maponly PE	extalign SE	extalign PE
50	(18, 14, -2, 1)	(16, 12, -2, -1)	+0.1477	+0.3609	+0.2171	+0.1236
75	(20, 16, -3, 2)	(22, 18, -2, -1)	+0.3006	+0.2672	+0.0232	+0.1401
100	(20, 16, -2, 2)	(20, 16, 0, 3)	+0.6108	+0.3681	+0.1630	+0.1153
100		(23, 19, 0, 1)	+0.4422	+0.3745	+0.0729	+0.1442
150	(20, 16, 1, 7)	(23, 19, 4, 7)	+0.1915	+0.2323	+0.0257	+0.0748
150		(22, 18, 3, 7)	+0.2030	+0.1779	+0.0536	+0.0679
200	(22, 18, 2, 12)	(24, 20, 4, 12)	+0.1054	+0.1024	+0.0550	+0.0386
300	(22, 18, 2, 12)	(24, 20, 6, 13)	+0.2251	+0.1039	+0.0769	+0.0368
500	(23, 17, 2, 12)	(25, 19, 7, 13)	+0.2818	+0.1646	+0.0596	+0.0119

marcelm · 2024-05-17T09:10:47Z

I’ve now started to run the same optimization as above on the new "Sim5" dataset (the above was done on the "Sim3" dataset, which has less variation).

Here are the results for the read lengths that have finished.

Readl.	Before	Optimized	maponly SE	maponly PE	extalign SE	extalign PE
50	(16, 12, -2, 0)	(16, 12, -2, -1)	+0.2479	+0.2580	+0.1281	+0.1493
75	(20, 16, -3, -1)	(19, 15, -1, -1)	+0.2116	+0.0701	+0.0817	+0.0682
100	(16, 12, 1, 3)	(17, 13, 1, 4)	+0.0891	+0.0328	-0.0007	+0.0070
150	(20, 16, 2, 5)	(20, 16, 2, 6)	+0.0781	+0.0005	+0.0301	+0.0040
200	(24, 20, 4, 12)	(23, 19, 3, 13)	+0.0085	+0.0042	+0.0351	+0.0049
300	(24, 20, 5, 13)	(25, 21, 3, 14)	+0.1113	+0.0232	+0.0224	+0.0204
500	(25, 19, 7, 13)	(27, 21, 5, 14)	+0.1121	+0.0441	+0.0500	+0.0065

It’s good to have this data now, but the picture becomes less clear. Except for read length 50, the best settings for Sim5 are quite different from the ones for Sim3.

Edit: Table completed for read lengths 200-500.
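To make the Sim3 vs. Sim5 comparison concrete, here is a minimal sketch that lines up the "Optimized" tuples from the two tables above (Sim3 with 4c10938 as baseline, and Sim5) and prints the component-wise difference per read length. The tuples are copied from the tables; the names `sim3`, `sim5` and `param_diff` are made up for this illustration and are not part of search.py. The thread does not name the four tuple components, so the sketch treats them as plain positions.

```python
# "Optimized" parameter tuples copied from the tables in this thread,
# keyed by read length. The meaning of the four components is not
# spelled out in the thread, so they are treated as positions only.
sim3 = {
    50: (16, 12, -2, -1),
    75: (22, 18, -2, -1),
    100: (20, 16, 0, 3),
    150: (23, 19, 3, 6),
    200: (26, 22, 4, 13),
    300: (26, 22, 6, 14),
    500: (27, 21, 6, 14),
}
sim5 = {
    50: (16, 12, -2, -1),
    75: (19, 15, -1, -1),
    100: (17, 13, 1, 4),
    150: (20, 16, 2, 6),
    200: (23, 19, 3, 13),
    300: (25, 21, 3, 14),
    500: (27, 21, 5, 14),
}


def param_diff(a, b):
    """Component-wise difference b - a between two parameter tuples."""
    return tuple(y - x for x, y in zip(a, b))


# Hypothetical helper output: Sim5 optimum minus Sim3 optimum.
diffs = {rl: param_diff(sim3[rl], sim5[rl]) for rl in sim3}
identical = [rl for rl, d in diffs.items() if not any(d)]

for rl, d in diffs.items():
    print(rl, sim3[rl], sim5[rl], d)
print("identical optima at read lengths:", identical)
```

With the values above, read length 50 is the only one where both datasets agree on every component.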

ksahlin · 2024-05-17T13:48:49Z

Yes, we haven't defined a precise objective yet - only that we want 'as good as possible in both scenarios'.

Before that, I am surprised by the relatively small average accuracy gain for maponly SE and maponly PE in both of the tables above. I have attached accuracy plots for the genomes, SE (hg38) and PE (hg38) data from a benchmark I ran recently, after fixing the mcs implementation.

My benchmark was done with these parameters for mcs, and strobealign_v012_opt in the plots uses the same parameters you use as the baseline for our current seeds (4c10938). Do the parameters I used for mcs show up in your optimisation script? And if so, what are the average gains?

Example 1: On the genomes data (SIM3 error rate, I think), both the drosophila and CHM13 datasets improve by more than 1pp in maponly PE for 50nt read length, and maize and rye by around 0.2-0.5pp. Based on this, I would estimate an average of about 0.7pp in my experiments.
Example 2: On the PE (hg38) data, there is a 2.5% increase in maponly PE for 50nt read length on SIM4, which is close to SIM5. Sure, it's a different genome from CHM13, but still noteworthy.

accuracy_plot_PE.pdf

accuracy_plot_SE.pdf

accuracy_plot_genomes.pdf

ksahlin · 2024-05-17T13:59:58Z

Oh, is the Before column referring to strobealign with mcs but with the parameters from 4c10938? Then I misunderstood.

Regardless, it would still be interesting to know whether these parameters are ever visited.

marcelm · 2024-05-17T14:36:44Z

> Oh, is the Before column referring to strobealign with mcs but with parameters from (4c10938) ? Then I misunderstood.

Yes, I essentially re-did the optimization for mcs using SIM5 as if we had never done an optimization using SIM3. But I guess both ways are valid? Anyway, I’ll do it the way you thought so you can compare.

> Do the parameters I used for mcs show up in your optimisation script? And if so, what are the average gains?

I’ll report the numbers relative to d9d5aaf as soon as I have them.

ksahlin · 2024-05-17T14:47:37Z

> Anyway, I’ll do it the way you thought so you can compare.

For some reason I thought the Before column was strobealign-v0.12.0-opt. I don't know why I misunderstood that, though, since comparing the relative improvement to mcs (not to v0.12.0), which is what you are doing, is more relevant in this issue. I will get the comparison to v0.12.0 in my benchmarks anyway.

> I’ll report the numbers relative to d9d5aaf as soon as I have them.

Okay, nice. I guess this is more out of curiosity on my part, as it will indicate how much better or worse the black lines in the plots I attached above will get.

marcelm · 2024-05-20T10:51:58Z

I have filled in the table above. Here is the same table but with numbers relative to d9d5aaf.

Readl.	Before	Optimized	maponly SE	maponly PE	extalign SE	extalign PE
50	(17, 13, -2, 0)	(16, 12, -2, -1)	+0.537	+0.527	+0.2882	+0.2225
75	(20, 16, -3, -1)	(19, 15, -1, -1)	+0.2116	+0.0701	+0.0817	+0.0682
100	(18, 14, 1, 3)	(17, 13, 1, 4)	+0.1218	+0.0639	+0.0868	+0.0332
150	(22, 18, 3, 5)	(20, 16, 2, 6)	+0.2318	+0.1071	+0.1695	+0.0003
200	(24, 20, 4, 12)	(23, 19, 3, 13)	+0.0085	+0.0042	+0.0351	+0.0049
300	(24, 20, 5, 13)	(25, 21, 3, 14)	+0.1113	+0.0232	+0.0224	+0.0204
500	(25, 19, 7, 13)	(27, 21, 5, 14)	+0.1121	+0.0441	+0.0500	+0.0065
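For the "what are the average gains?" question, here is a small sketch that averages each gain column of the table above (numbers relative to d9d5aaf). It assumes an unweighted mean over the seven read lengths, which is only one possible reading of "average"; the thread does not say how the averages in the plots are computed. `GAINS`, `COLUMNS` and `column_means` are names invented for this example.

```python
from statistics import mean

# Column order as in the table header.
COLUMNS = ("maponly SE", "maponly PE", "extalign SE", "extalign PE")

# Gains copied from the table relative to d9d5aaf, keyed by read length.
GAINS = {
    50: (0.537, 0.527, 0.2882, 0.2225),
    75: (0.2116, 0.0701, 0.0817, 0.0682),
    100: (0.1218, 0.0639, 0.0868, 0.0332),
    150: (0.2318, 0.1071, 0.1695, 0.0003),
    200: (0.0085, 0.0042, 0.0351, 0.0049),
    300: (0.1113, 0.0232, 0.0224, 0.0204),
    500: (0.1121, 0.0441, 0.0500, 0.0065),
}


def column_means(gains):
    """Unweighted mean of each gain column across all read lengths."""
    per_column = zip(*gains.values())  # transpose rows into columns
    return {name: mean(values) for name, values in zip(COLUMNS, per_column)}


means = column_means(GAINS)
for name, value in means.items():
    print(f"{name}: {value:+.4f}")
```

Weighting by read length, or averaging over genomes instead of read lengths, would give different numbers, so the result is not directly comparable to the per-genome plots.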

ksahlin added a commit that referenced this issue May 20, 2024

Updated parameters according to optimization in issue #423

bb2c603

marcelm pushed a commit that referenced this issue May 22, 2024

Updated parameters according to optimization in issue #423

c8b52fa

marcelm pushed a commit that referenced this issue May 22, 2024

Updated parameters according to optimization in issue #423

ef13d8b

marcelm pushed a commit that referenced this issue May 23, 2024

Updated parameters according to optimization in issue #423

8c069ee

marcelm pushed a commit that referenced this issue May 23, 2024

Updated parameters according to optimization in issue #423

caea5c2

marcelm mentioned this issue Sep 27, 2024

Multi-context seeds plus fixes and optimized parameters #426

Open

8 tasks

marcelm added a commit that referenced this issue Oct 1, 2024

Use optimized parameters according to optimization in issue #423

250171f

marcelm added a commit that referenced this issue Oct 2, 2024

Use optimized parameters according to optimization in issue #423

991540a

marcelm added a commit that referenced this issue Oct 3, 2024

Use optimized parameters according to optimization in issue #423

39fa990

marcelm added a commit that referenced this issue Oct 3, 2024

Use optimized parameters according to optimization in issue #423

5be2710

marcelm added a commit that referenced this issue Oct 7, 2024

Use optimized parameters according to optimization in issue #423

3f7f1a3

marcelm added a commit that referenced this issue Oct 9, 2024

Use optimized parameters according to optimization in issue #423

03921cd
