Error distribution adjusted with -E lower than expected #29

JustinChu · 2017-12-01T21:21:05Z

I simulated about 1 million 2x150 bp reads with rescaled error rates using the -E parameter. The golden BAM file doesn't contain MD and NM (edit distance to the reference) tags by default, so I ran samtools calmd on it to generate them.

Full set of options:

--pe 500 100 -E 0.1 -c 40 --bam -M 0 -p 1 --rng 1 -R 150

Here is a plot of the NM values for -E 0.1 (expecting a median of 150*0.1 = 15):

The median is 6.

And for -E 0.3 (the maximum):

The median is 10, but I would have expected 150*0.3 = 45.

This is using the default error model. Because -E does at least seem to rescale the error rate at higher values, this isn't a bug per se, but it could be critically misleading. That is, if you are benchmarking a tool, it could mislead you into thinking the tool can tolerate much higher error rates than it can in reality.
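For anyone who wants to reproduce the medians above, here is a minimal sketch of the comparison, assuming the calmd-annotated alignments are piped in as SAM text (for example from `samtools view`). The function names and the script name are hypothetical and not part of the simulator, and treating read length times error rate as the expected NM is only an approximation.

```python
import statistics
import sys


def nm_values(sam_lines):
    """Yield the NM (edit distance) tag of each alignment record in SAM text."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional tags start at the 12th column (index 11) of a SAM record.
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                yield int(tag[5:])
                break


def compare_to_expected(sam_lines, read_length, error_rate):
    """Return (observed median NM, approximate expected errors per read)."""
    observed = statistics.median(nm_values(sam_lines))
    expected = read_length * error_rate  # e.g. 150 * 0.1 = 15
    return observed, expected


if __name__ == "__main__":
    # Values below mirror the -R 150 and -E 0.1 options from this issue.
    observed, expected = compare_to_expected(sys.stdin, read_length=150, error_rate=0.1)
    print(f"median NM = {observed}, expected about {expected:.0f}")
```

Hypothetical usage, with placeholder file names: `samtools view golden.calmd.bam | python median_nm.py`.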


zstephens · 2017-12-03T17:59:57Z

This is very interesting. Thank you for performing this detailed analysis!

I will look into this, and see if there might be an oversight in the simulator that causes the error rates to appear lower than expected. I'll follow up on this soon.

Thanks!
