parameter '-to' not working as expected #64

Open
ghost opened this issue Oct 23, 2019 · 2 comments

@ghost

ghost commented Oct 23, 2019

Hello,

I am trying to use the '-to' parameter to limit the number of variants in off-target regions, as described in your documentation. However, several trials with different data sets indicate that using '-to' yields more off-target variants than running without it. For example, see the following results from a single-chromosome simulation.

Reference: chr17.fa
Read length (-R): 75
Paired end fragment size and deviation (-Pe): 200 20
Coverage (-C): 100
Number of chromosomes in target bed file: 1
Number of intervals in target bed file: 24

[image: results table]

As per the description on the GitHub page, -to = 0 should result in 0 variants outside the targeted region, or very few if any are generated at all.
From the 3rd row, it is clear that with -to = 0 the number of variants generated has actually increased. If we subtract the in-target variants from the total number of variants (last column in the table), we see that with -to = 0 the number of off-target variants is larger than the number obtained without the '-to' parameter, which contradicts what the GitHub page says.
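For anyone who wants to reproduce the in-target vs. off-target count described above, here is a minimal sketch. It assumes a standard BED file (0-based, half-open intervals) and a VCF with 1-based positions; the function names are hypothetical and are not part of NEAT.

```python
def parse_bed(lines):
    """Return {chrom: [(start, end), ...]} from BED lines (0-based, half-open)."""
    targets = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        targets.setdefault(chrom, []).append((int(start), int(end)))
    return targets


def parse_vcf_positions(lines):
    """Yield (chrom, pos) for each record in VCF lines; pos is 1-based."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        yield chrom, int(pos)


def in_target(targets, chrom, pos):
    """True if the 1-based VCF position falls inside any BED interval."""
    pos0 = pos - 1  # convert to the 0-based coordinate used by BED
    return any(start <= pos0 < end for start, end in targets.get(chrom, []))


def count_variants(bed_lines, vcf_lines):
    """Return (in_target_count, off_target_count) for the given VCF records."""
    targets = parse_bed(bed_lines)
    on = off = 0
    for chrom, pos in parse_vcf_positions(vcf_lines):
        if in_target(targets, chrom, pos):
            on += 1
        else:
            off += 1
    return on, off
```

With real files, this could be called as `count_variants(open("targets.bed"), open("golden.vcf"))`, where both file names are placeholders. The linear scan over intervals is fine for a small BED file such as the 24 intervals here.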

I saw a similar issue reported here in September 2017. Though it says the issue is fixed, my results indicate otherwise. Am I missing something, or is the issue still to be fixed? I am using the latest version of the repository.

Regards

@zstephens
Owner

Greetings!

To confirm, the variants you're observing are those in the golden.vcf produced by the simulation, correct? The reads (and golden.bam, if you're outputting one) are still properly restricted to the targeted regions as expected?

NEAT proceeds along the entire reference sequence window-by-window, and if it finds a window that includes a targeted region as specified in the input BED, it will introduce variants into that window and then sample reads. The read sampling is heavily biased to occur primarily in coordinates from the BED file, but the randomly generated variants are not restricted in this fashion. Would you be able to confirm that the variants outside the targeted regions are occurring near the targeted regions?

If this is in fact the case, I'm open to adjusting the behavior such that the output VCF only contains variants within the target regions (perhaps configurable via another option).

Thanks!

@ghost
Author

ghost commented Oct 25, 2019

Yes, I am checking variants from the golden.vcf. Out of 100 off-target variants, 5 are near target regions (around 10 to 60 bases away from either the start or the stop). For those variants, a few reads (3-7) are seen. For the remaining 95 variants, no reads are observed in the pileup. So we can say that the reads are properly restricted to the targeted regions.
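The near/far split above can be checked with a small script along these lines. This is only a sketch: the function names are hypothetical, the 100-base cutoff is an arbitrary assumption chosen to cover the 10 to 60 base range mentioned, and intervals are assumed to be BED-style (0-based, half-open) on a single chromosome, with 1-based VCF positions.

```python
def distance_to_nearest_target(intervals, pos):
    """Distance in bases from a 1-based VCF position to the nearest interval.

    Returns 0 if the position lies inside an interval, and None if there
    are no intervals at all.
    """
    pos0 = pos - 1  # convert to the 0-based coordinate used by BED
    best = None
    for start, end in intervals:
        if start <= pos0 < end:
            return 0
        # end is exclusive, so the last base of the interval is end - 1
        d = start - pos0 if pos0 < start else pos0 - (end - 1)
        if best is None or d < best:
            best = d
    return best


def split_near_far(intervals, positions, max_dist=100):
    """Split off-target positions into (near, far) lists by distance to a target."""
    near, far = [], []
    for pos in positions:
        d = distance_to_nearest_target(intervals, pos)
        if d == 0:
            continue  # in-target, not an off-target variant
        if d is not None and d <= max_dist:
            near.append(pos)
        else:
            far.append(pos)
    return near, far
```

Positions in the `near` list are the ones worth inspecting in the pileup, since they are the off-target variants most likely to be covered by reads spilling over from a targeted interval.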
