Error distribution adjusted with -E lower than expected #29

Open
JustinChu opened this issue Dec 1, 2017 · 1 comment


JustinChu commented Dec 1, 2017

I simulated ~1 million 2x150 bp reads with rescaled error rates using the -E parameter. The golden BAM file doesn't contain the MD and NM (edit distance to the reference) tags by default, so I ran samtools calmd on it to generate them.

Full set of options:

--pe 500 100 -E 0.1 -c 40 --bam -M 0 -p 1 --rng 1 -R 150

Here is a plot of the NM values for -E 0.1 (expecting a median of 150*0.1 = 15):
[Plot: NM (edit distance) distribution for -E 0.1]
The median is 6.

And for -E 0.3 (the maximum):
[Plot: NM (edit distance) distribution for -E 0.3]
The median is 10, but I would have expected 150*0.3 = 45.
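For anyone who wants to repeat the comparison, here is a minimal sketch of how the observed median NM can be checked against the naive expectation of read length times error rate. It assumes the alignments are available as SAM text (for example, the output of samtools view on the calmd-processed BAM) and that the error rate applies uniformly per base. The function names `nm_values`, `expected_edits`, and `summarize` are made up for illustration and are not part of the simulator. Note that NM also counts inserted and deleted bases, so it is only an approximation of the substitution error count.

```python
import statistics


def nm_values(sam_lines):
    """Yield the NM tag (edit distance) of each SAM alignment line that has one."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        # Optional tags start at the 12th column (index 11) of a SAM record.
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                yield int(tag[len("NM:i:"):])
                break


def expected_edits(read_length, error_rate):
    """Naive expectation: every base is wrong with the same probability."""
    return read_length * error_rate


def summarize(sam_lines, read_length, error_rate):
    """Return (observed median NM, expected edits per read)."""
    values = list(nm_values(sam_lines))
    return statistics.median(values), expected_edits(read_length, error_rate)
```

Piping the SAM text in on standard input and calling `summarize(sys.stdin, 150, 0.1)` would give the observed median next to the expected value of 15 for the -E 0.1 run.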

This is using the default error model. Since -E does at least seem to rescale the error rate at higher values, this isn't a bug per se, but it could be critically misleading: if you are benchmarking a tool, it could lead you to think the tool can tolerate much higher error rates than it can in reality.

@zstephens
Owner

This is very interesting. Thank you for performing this detailed analysis!

I will look into this, and see if there might be an oversight in the simulator that causes the error rates to appear lower than expected. I'll follow up on this soon.

Thanks!
