Update 4_assembly_qc.md
ldutoit authored Nov 19, 2024
1 parent 171a013 commit 7451742
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/pages/4_assembly_qc.md
@@ -315,7 +315,7 @@ If you tried to run that command with the output straight to standard out (_i.e.

This is more manageable, and you can even kind of see the histogram forming from the count values. There are a lot of _k_-mers that are present at only one copy (or otherwise very low copy) in the read set: these usually come from sequencing errors. Because the erroneous sequence isn't actually real (_i.e._, it isn't actually in the genome and isn't actually serving as sequencing template), these _k_-mers stay at low copy. After these error _k_-mers, there's a dip in the histogram, followed by a peak at about the 24–28 copy range. This peak is made up of the actual _k_-mers coming from the genome that you sequenced, so it corresponds to coverage of ~26X in this read set. We only have one peak here because this is a haploid dataset, but if your dataset is diploid then expect two peaks, with the height of the first peak depending on the heterozygosity of your sample.
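To make the "error tail, dip, then coverage peak" reading concrete, here is a minimal Python sketch of how one might locate the dip and the peak automatically. It assumes the histogram is plain text with two whitespace-separated columns (_k_-mer copy number, then the number of distinct _k_-mers at that copy number). The function names `parse_hist` and `find_coverage_peak` and the toy numbers are hypothetical and are not the workshop data.

```python
def parse_hist(text):
    """Parse two-column histogram text into sorted (copy_number, n_kmers) pairs.

    Assumption: column 1 is the k-mer copy number, column 2 is how many
    distinct k-mers occur that many times. Other lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            rows.append((int(fields[0]), int(fields[1])))
    return sorted(rows)


def find_coverage_peak(hist):
    """Return (dip_copy_number, peak_copy_number) for a haploid-style histogram."""
    # Walk down the error tail while the counts keep decreasing;
    # the point where they stop decreasing is the dip.
    dip = 0
    while dip + 1 < len(hist) and hist[dip + 1][1] < hist[dip][1]:
        dip += 1
    # The coverage peak is the tallest bin at or after the dip.
    peak = max(hist[dip:], key=lambda row: row[1])
    return hist[dip][0], peak[0]


# Toy histogram, invented for illustration only.
TOY_HIST = "1\t9000\n2\t1200\n3\t300\n4\t90\n5\t120\n6\t400\n7\t900\n8\t1500\n9\t1100\n10\t500\n"

dip, peak = find_coverage_peak(parse_hist(TOY_HIST))
print(f"dip at {dip} copies, coverage peak at {peak} copies")
```

On real data the counts are noisier than in this toy example, so the simple "stop at the first increase" rule may need smoothing, and a diploid dataset with two peaks would need more than a single maximum.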

"What if I want a pretty graph instead of imagining it?" Good news&mdash;there's <del>an app</del> a program for that. GenomeScope is a straightforward program with an online web page where you can just drop in your meryl histogram file and it will draw the histogram for you as well as use the GenomeScope model to predict some genome characteristics of your data, given the expected ploidy. Let's try it out! Download the `read-db.hist` file and throw it into the GenomeScope website: http://qb.cshl.edu/genomescope/genomescope2.0/ and adjust the parameters accordingly.
"What if I want a pretty graph instead of imagining it?" Good news&mdash;there's <del>an app</del> a program for that. GenomeScope is a straightforward program with an online web page where you can just drop in your meryl histogram file and it will draw the histogram for you as well as use the GenomeScope model to predict some genome characteristics of your data, given the expected ploidy. Let's try it out! Download the `read-db.hist` file and throw it into the GenomeScope website: http://genomescope.org/genomescope2.0/ and adjust the parameters accordingly.

??? tip "Can I use GenomeScope to QC my raw data before assembly?"
