Long read quality control standards #135

Open
justinjj24 opened this issue Nov 21, 2024 · 1 comment

justinjj24 commented Nov 21, 2024

Quality control metrics for long-read WGS:

N50 of aligned reads
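
As a rough illustration of the N50 metric, here is a minimal sketch. It assumes the aligned read lengths have already been extracted (for example from a BAM file) into a plain list of integers; the function name `n50` is hypothetical and not taken from any of the tools mentioned here.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    The N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Example with made-up read lengths (in bases)
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because the N50 is weighted by bases rather than by read count, a few very long reads can raise it noticeably, so it is worth reporting alongside the median read length.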

The most common QC approach for long reads is a dotplot or contour plot of average base accuracy (or mapped accuracy, if available) vs read length.

NanoPlot can be used for this:

https://github.com/wdecoster/NanoPlot
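
For reference, the "average base accuracy" axis of such a plot can be derived from per-base Phred scores. The sketch below assumes Phred+33 encoded FASTQ quality strings and averages error probabilities rather than the Phred scores themselves, which is one common convention; the function names are hypothetical and this is not presented as how any particular tool computes it.

```python
import math


def phred_from_fastq(qual_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in qual_string]


def mean_error_probability(quals):
    """Average the per-base error probabilities, where P = 10^(-Q/10)."""
    if not quals:
        raise ValueError("empty quality list")
    return sum(10 ** (-q / 10) for q in quals) / len(quals)


def mean_accuracy(quals):
    """Estimated average base accuracy of a read, between 0 and 1."""
    return 1 - mean_error_probability(quals)


def mean_phred(quals):
    """Mean quality expressed back on the Phred scale."""
    return -10 * math.log10(mean_error_probability(quals))


# Example with a made-up quality string
quals = phred_from_fastq("IIII++5I")
print(len(quals), round(mean_accuracy(quals), 4), round(mean_phred(quals), 2))
```

Averaging Phred scores directly would overstate quality, because a handful of low-quality bases dominate the true error rate of a read.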

Other options include PycoQC and NanoGalaxy.

For PacBio data, the IsoSeq pipeline is used, although it targets transcript (isoform) sequencing rather than WGS.

@justinjj24

Long-read sequencing provides longer reads than short-read sequencing but often comes with higher error rates and lower throughput. QC metrics for long-read technologies include:

  • Read length distribution: Confirms that reads are long enough to span structural variants or complex genomic regions.
  • Accuracy metrics: Evaluate base-calling errors, especially for technologies with higher raw error rates.
  • Alignment quality: Map long reads against a reference genome and evaluate alignment discrepancies or errors such as base substitutions.
  • Coverage uniformity: Ensures even distribution of reads across the genome.

Tools like NanoPlot and PycoQC are specifically designed for long-read QC.
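
The coverage uniformity metric from the list above can be summarized with a few simple statistics. This is a minimal sketch, assuming per-position depths are already available as a list of integers (for example parsed from the output of a tool such as `samtools depth`). The function name `coverage_uniformity` and the 0.2 × mean threshold are illustrative assumptions, not an established standard.

```python
from statistics import mean, pstdev


def coverage_uniformity(depths, low_fraction=0.2):
    """Summarize how evenly reads cover a region.

    depths: per-position read depths.
    low_fraction: positions below low_fraction * mean depth count as
                  poorly covered (illustrative threshold).
    """
    if not depths:
        raise ValueError("no depth values given")
    avg = mean(depths)
    if avg == 0:
        raise ValueError("mean depth is zero; nothing is covered")
    # Coefficient of variation: 0 means perfectly even coverage
    cv = pstdev(depths) / avg
    low_cutoff = low_fraction * avg
    fraction_low = sum(d < low_cutoff for d in depths) / len(depths)
    return {
        "mean_depth": avg,
        "cv": cv,
        "fraction_low": fraction_low,
    }


# Example with made-up depths: one position has no coverage
print(coverage_uniformity([0, 40, 40, 40]))
```

A lower coefficient of variation and a smaller fraction of poorly covered positions both indicate more uniform coverage; which thresholds are acceptable depends on the project.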

Both short-read and long-read sequencing are used in various genomic research applications, and QC is crucial to ensure high-quality, reproducible, and comparable data, particularly in large-scale or clinical projects.

