-
-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tool to consensus call overlapping bases in paired end reads. Adds direct support in the consensus calling tools too. #805
Conversation
Codecov Report
@@ Coverage Diff @@
## main #805 +/- ##
==========================================
+ Coverage 95.58% 95.67% +0.08%
==========================================
Files 122 125 +3
Lines 7068 7255 +187
Branches 511 521 +10
==========================================
+ Hits 6756 6941 +185
- Misses 312 314 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
8588fcd
to
f6e9f09
Compare
5ebd61f
to
31d7fe5
Compare
src/main/scala/com/fulcrumgenomics/personal/nhomer/ReadAndRefPosIterator.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/personal/nhomer/ReadAndRefPosIterator.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/personal/nhomer/OverlappingBasesConsensusCaller.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/personal/nhomer/OverlappingBasesConsensusCaller.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/personal/nhomer/OverlappingBasesConsensusCaller.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/personal/nhomer/CorrectOverlappingBases.scala
Outdated
Show resolved
Hide resolved
dd02a62
to
5415287
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
src/main/scala/com/fulcrumgenomics/bam/CallOverlappingConsensusBases.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/bam/CallOverlappingConsensusBases.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/bam/CallOverlappingConsensusBases.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/bam/CallOverlappingConsensusBases.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/bam/CallOverlappingConsensusBases.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/bam/ReadAndRefPosIterator.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/bam/ReadAndRefPosIterator.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/umi/CallDuplexConsensusReads.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/umi/CallMolecularConsensusReads.scala
Outdated
Show resolved
Hide resolved
Co-authored-by: Tim Fennell <[email protected]>
src/main/scala/com/fulcrumgenomics/bam/CallOverlappingConsensusBases.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/bam/CallOverlappingConsensusBases.scala
Show resolved
Hide resolved
Co-authored-by: Tim Fennell <[email protected]>
Co-authored-by: Tim Fennell <[email protected]>
Co-authored-by: Tim Fennell <[email protected]>
@tfenne back to you.
We don't have an explicit cap in the main consensus callers, except to make sure it doesn't go beyond 93 or whatever the htsjdk maximum value is. The other callers have an effective cap due to the pre and post labeling error rates, which we aren't using in this simplified model. I could switch over to ConsensusCaller and have the pre-error rate be effectively zero, and the post-error rate by 40 or whatever the default is, but is it worth it?
See this comment in SamRecords that you wrote on 5/19/2017:
I took that to heart and started an PR for replacing those methods in #809, but didn't end up needing them here, since it was rather than parsing and iterating through the cigar for each overlapped read/reference position, it was more efficient to create iterators (one for a read, one for a mate) that kept track of where it was in the cigar for each read and returned the desired overlapping read/references positions. Yes it's complex, but this booking keeping is hard! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments. Please merge when done.
This PR adds a tool
CallOverlappingConsensusBases
that will examine read pairs who overlap and consensus call those overlapping bases in-place. This functionality is extracted into theOverlappingBasesConsensusCaller
class so that both theCallMolecularConsensusReads
andCallDuplexConsensusReads
tools can also support this (opt-in).For a given read pair, it will iterate through the the mapped bases that overlap between the read and mate in a read pair. If the read and mate agree at a given reference position, then read and mate base will not change and the base quality returned is controlled by
maxQualOnAgreement
. If they disagree at a given reference position, then the base and quality returned is controlled byonlyMaskDisagreements
. SeeOverlappingBasesConsensusCaller
for more details on the two arguments.To facilitate all this, a new iterator implementation
ReadAndRefPosIterator
has been built. For a given mapped read, this iterates over each mapped read base (i.e. has a corresponding reference base), with each value being the 1-based position in the read and reference respectively. Insertions and deletions will not be returned. For example, an insertion consumes a read base, but has no corresponding reference base. A deletion consumes a reference base, but has no corresponding read base. This class is extended in the newMateOverlappingReadAndRefPosIterator
, which can iterate by reference position through just the overlapping bases in a read and its mate. This iteration is how the consensus caller for overlapping bases lines up bases to examine between a read and its mate.