Parsing MC counts vs. MS intervals in VCF rows with multiple motifs #26
Yes, agreed. We had many discussions about the best way to define these fields. We eventually decided that |
I think that would help. What I'm trying to get to is a table where each row is a repeat locus, which I could merge across samples to then look for outliers. Even though MS provides the most detailed representation of each haplotype, it's difficult to merge across samples if one sample has, let's say, a (haploid) genotype |
Yes, agreed, it's difficult to merge on the
Note that since |
Yes. I use two kinds of tables in my analyses, a locus table:
and an allele table:
For single-sample VCFs, the tables would have other relevant columns like SD and AP.
Running TRGT v0.8 on a custom catalog that includes loci with several adjacent motifs, I found that this row in the output VCF throws off my parsing script:
In that row,
IIUC, the MS field represents the 1st haplotype as
AA|AAAAT.AAAAT.AAAAT.AAAAT|A
but the reuse of the 0th node in the path (i.e., (A)n) at the end of the sequence makes parsing more complicated.
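To make the node-reuse problem concrete, here is a minimal parsing sketch. It assumes each haplotype's MS value is a `_`-joined list of `motif_index(start-end)` entries; that layout, the MS string used below, and the helper names `parse_ms` / `spans_by_motif` are assumptions for illustration, not taken from the row above. Grouping spans by motif index shows that motif 0 (A) owns two disjoint spans.

```python
import re
from collections import defaultdict

# Assumed MS layout for ONE haplotype: "motif_index(start-end)" entries joined by "_".
SPAN_RE = re.compile(r"(\d+)\((\d+)-(\d+)\)")

def parse_ms(ms_haplotype: str) -> list[tuple[int, int, int]]:
    """Split one haplotype's MS string into (motif_index, start, end) tuples."""
    spans = []
    for token in ms_haplotype.split("_"):
        match = SPAN_RE.fullmatch(token)
        if match is None:
            raise ValueError(f"unexpected MS token: {token!r}")
        motif, start, end = (int(x) for x in match.groups())
        spans.append((motif, start, end))
    return spans

def spans_by_motif(spans):
    """Group spans by motif index; a motif with several spans was reused in the path."""
    grouped = defaultdict(list)
    for motif, start, end in spans:
        grouped[motif].append((start, end))
    return dict(grouped)

# Hypothetical MS value for AA|AAAAT.AAAAT.AAAAT.AAAAT|A (motif 0 = A, motif 1 = AAAAT)
spans = parse_ms("0(0-2)_1(2-22)_0(22-23)")
grouped = spans_by_motif(spans)
reused = [m for m, s in grouped.items() if len(s) > 1]
print(grouped)  # {0: [(0, 2), (22, 23)], 1: [(2, 22)]}
print(reused)   # [0]
```

A parser that assumes one span per motif would silently overwrite or drop the trailing `(A)n` span, which is why grouping by motif rather than indexing by position is the safer default here.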
Also, this means I can't rely on the MC value to understand the haplotype sequence, since MC would be the same if TRGT had instead partitioned the sequence as
AAA|AAAAT.AAAAT.AAAAT.AAAAT
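The ambiguity can be checked with a small sketch. It assumes an MC-like count is the number of whole motif copies summed over every span assigned to that motif; whether TRGT computes MC exactly this way is an assumption, and `copies_per_motif` / `spell_out` are hypothetical helpers. Both partitions yield the same per-motif counts, yet they spell out different sequences.

```python
from collections import Counter

# Motif 0 = A, motif 1 = AAAAT (the two motifs in the example row).
MOTIFS = ["A", "AAAAT"]

def copies_per_motif(spans, motifs):
    """Sum whole motif copies over every span that uses each motif (an MC-like summary)."""
    counts = Counter()
    for motif, start, end in spans:
        # Floor division: a partial trailing motif copy is not counted.
        counts[motif] += (end - start) // len(motifs[motif])
    return [counts[m] for m in range(len(motifs))]

def spell_out(spans, motifs):
    """Rebuild the haplotype sequence implied by an ordered list of spans."""
    return "".join(motifs[m] * ((end - start) // len(motifs[m])) for m, start, end in spans)

# AA|AAAAT.AAAAT.AAAAT.AAAAT|A  (motif 0 used at both ends)
partition_a = [(0, 0, 2), (1, 2, 22), (0, 22, 23)]
# AAA|AAAAT.AAAAT.AAAAT.AAAAT  (motif 0 used once)
partition_b = [(0, 0, 3), (1, 3, 23)]

print(copies_per_motif(partition_a, MOTIFS))  # [3, 4]
print(copies_per_motif(partition_b, MOTIFS))  # [3, 4]
print(spell_out(partition_a, MOTIFS) == spell_out(partition_b, MOTIFS))  # False
```

Because per-motif totals discard span order, any summary built only from MC cannot distinguish these two haplotypes; recovering the sequence layout requires the spans.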
I think it would be more intuitive if the output were either
or, if the STRUC is kept the same, then dropping the last node from the path
Not sure how common or significant this issue is, but it seems to make parsing more challenging compared to ExpansionHunter outputs. Ideally, there would be a simple way to get the copy numbers and confidence intervals for each motif in the STRUC.
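For the copy-number half of that wish, a sketch of pairing each motif with its count is below. It assumes the motifs are available as a comma-separated list (for example an INFO value like `A,AAAAT`) and that MC separates alleles with `,` and per-motif counts with `_`; both layouts and the helper name `per_motif_counts` are assumptions. It does not attempt per-motif confidence intervals, since the thread doesn't identify a field that carries them.

```python
def per_motif_counts(motifs_info: str, mc_value: str) -> list[dict[str, int]]:
    """Pair each motif with its MC count, returning one dict per allele.

    Assumed formats: motifs_info like "A,AAAAT"; mc_value like "3_4,3_2"
    (alleles separated by ",", per-motif counts separated by "_").
    """
    motifs = motifs_info.split(",")
    alleles = []
    for allele in mc_value.split(","):
        counts = [int(c) for c in allele.split("_")]
        if len(counts) != len(motifs):
            raise ValueError(f"MC entry {allele!r} does not match {len(motifs)} motifs")
        alleles.append(dict(zip(motifs, counts)))
    return alleles

print(per_motif_counts("A,AAAAT", "3_4,3_2"))
# [{'A': 3, 'AAAAT': 4}, {'A': 3, 'AAAAT': 2}]
```

Keying counts by motif sequence rather than by path position gives rows that merge cleanly across samples, with the caveat shown above that identical counts can still hide different haplotype layouts.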