Hi, I was testing your tool on a sample and wanted to check the copy number in a specific region.
If I understood correctly, the MC tag in the info field corresponds to the copy number of the motif in the region:
In this region, as in many other regions, the MC column reports that both alleles have the same copy number, 75. But as is clear from the sequence, the copy number should be around 29 if we count all the occurrences of A in this sequence.
However, 75 is the length of the region, so I was wondering if there is an explanation for this output.
Regards
Lionward changed the title from "Copy number ouput in the MC column" to "Copy number ouput in the MC Tag in Info column" on Dec 1, 2023
Thank you for the question! We actually had many discussions about how to properly handle cases like this.
The current version of TRGT takes a very simplistic approach and assumes that the entire region must be matched to the specified motif. Because of this, it reported a motif count of 75 with a low repeat purity score of 0.386667 (AP field). Note that 75 * 0.386667 ≈ 29. This approach seems reasonable for relatively pure repeats, such as the known pathogenic repeats, but it can definitely be misleading for repeats that include flanking sequence that does not match the specified motif at all, as in your example.
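To make the arithmetic above concrete, here is a minimal Python sketch. It assumes that AP can be read as the fraction of the reported motif copies that actually match the motif, which is what the 75 * 0.386667 relation suggests; the function name `matched_motif_units` is hypothetical and not part of TRGT.

```python
def matched_motif_units(motif_count: int, purity: float) -> int:
    """Estimate how many of the reported motif copies actually match.

    Assumption: `purity` (the AP value) is the fraction of the
    `motif_count` (the MC value) copies that match the motif.
    """
    return round(motif_count * purity)


# Values quoted in this thread: MC = 75, AP = 0.386667
print(matched_motif_units(75, 0.386667))  # 29
```

Under that assumption, multiplying MC by AP gives a rough estimate of the matching copies while the current counting behavior is in place.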
We are re-working the motif-counting algorithm in TRGT to report more sensible counts. For example, this new algorithm should recognize that only the stretch of As in the middle of your sequence is the A homopolymer and report its size in the MC field. Does this sound reasonable? If yes, would you be interested in testing out the pre-release version of TRGT where this algorithm is implemented? If yes, the binaries are attached. It would be very useful to hear your feedback.
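As a rough illustration of the idea described above (counting only the stretch that really is the repeat), the following Python sketch finds the longest uninterrupted tandem run of a motif in a sequence. This is not TRGT's algorithm, which is not shown in this thread; the function name `longest_motif_run` and the example sequence are made up for illustration.

```python
def longest_motif_run(seq: str, motif: str) -> tuple[int, int]:
    """Return (start, copies) for the longest uninterrupted run of `motif`.

    Illustrative only: a brute-force scan that tolerates no mismatches.
    Returns (-1, 0) if the motif does not occur at all.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    best_start, best_copies = -1, 0
    for start in range(len(seq)):
        copies = 0
        pos = start
        # Count back-to-back copies of the motif beginning at `start`.
        while seq.startswith(motif, pos):
            copies += 1
            pos += len(motif)
        if copies > best_copies:
            best_start, best_copies = start, copies
    return best_start, best_copies


# Hypothetical region: flanking bases around a stretch of 10 As.
region = "GCT" + "A" * 10 + "TTGC"
print(longest_motif_run(region, "A"))  # (3, 10)
```

A real implementation would also need to tolerate interruptions and sequencing errors inside the repeat, which this exact-match scan does not attempt.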