Question about DSC_estimated_error_rates calculation #92

gevro · 2024-03-26T18:53:57Z

Hi,
I had a question about the DSC_estimated_error_rates.pdf calculation. For one example sample, for one specific trinucleotide context, I tried to reconstruct the calculation:

ACG>T = 150 -> count from results.SSC-mismatches-Pyrimidine.triprofiles.tsv
ACG>T = 15 -> count from results.SSC-mismatches-Purine.triprofiles.tsv
ACG trinucleotide count = 7674995 -> count from results.trint_counts_and_ratio2genome.tsv tri_bg column, which is the number of times ACG/CGT base pairs are observed, which is also equivalent to the number of times ACG is observed in single-strand consensus sequences.

ACG error rate I see in DSC_estimated_error_rates.pdf is ~1.5e-10.

I infer this was calculated as: (150/[7674995/2])*(15/[7674995/2]),
which matches this code line in nanoseq_results_plotter.R:
y = (tri_obs_pyr / (tri_bg[tris] / 2)) * (tri_obs_pur / (tri_bg[tris] / 2));

However, why was the ACG trinucleotide count of 7674995 divided by 2 for both the pyrimidine and purine parts of the error rate calculation?

I would have thought that the calculation would be:
[probability of having an ACG>T change in one strand = 150 / 7674995] X [probability of having a CGT>A change in the other strand = 15 / 7674995]

= (150/7674995)*(15/7674995) = ~3.8e-11

This is lower by a factor of 4 from the DSC_estimated_error_rates.pdf estimate.

To visually illustrate:

The difference in my suggestion is I am considering the context count from results.trint_counts_and_ratio2genome.tsv tri_bg column as the denominator, rather than half of that, since the number in the tri_bg column is equivalent to the number of times the ACG trinucleotide context is observed in both strands. And I'm assuming the counts in SSC-mismatches-Pyrimidine.triprofiles.tsv contains the number of times ACG>T is observed in both strands (i.e. sum of ACG>T ssDNA calls in both strands), and likewise for CGT>A in SSC-mismatches-Purine.triprofiles.tsv.

It's a bit tricky to think through, so perhaps you can catch an error in my logic.

Thanks!

gevro · 2024-03-26T21:55:10Z

Note: I updated my original post.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about DSC_estimated_error_rates calculation #92

Question about DSC_estimated_error_rates calculation #92

gevro commented Mar 26, 2024 •

edited

Loading

gevro commented Mar 26, 2024

Question about DSC_estimated_error_rates calculation #92

Question about DSC_estimated_error_rates calculation #92

Comments

gevro commented Mar 26, 2024 • edited Loading

gevro commented Mar 26, 2024

gevro commented Mar 26, 2024 •

edited

Loading