Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scala implementations for some SamRecord methods #809

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

nh13
Copy link
Member

@nh13 nh13 commented Mar 10, 2022

  • refPosAtReadPos
  • readPosAtRefPos

@nh13 nh13 requested a review from tfenne March 10, 2022 00:09
@@ -236,9 +236,83 @@ trait SamRecord {
// transient attributes
@inline final val transientAttrs: TransientAttrs = new TransientAttrs(this)

// TODO long-term: replace these two methods with methods on [[Cigar]] to save creating alignment blocks in memory
@inline final def refPosAtReadPos(pos: Int) = getReferencePositionAtReadPosition(pos)
@inline final def readPosAtRefPos(pos: Int, returnLastBaseIfDeleted: Boolean) = getReadPositionAtReferencePosition(pos, returnLastBaseIfDeleted)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are breaking changes to the API. I like returning Option[Int] better, so I can return None instead of 0

* @param returnLastBaseIfInserted if the reference is an insertion, true to returned the previous reference base,
* false to return None
* */
@inline final def refPosAtReadPos(pos: Int, returnLastBaseIfInserted: Boolean = false): Option[Int] = if (pos <= 0) None else {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returnLastBaseIfInserted is a new parameter, which htsjdk doesn't even have!

@@ -289,4 +289,48 @@ class SamRecordTest extends UnitSpec with OptionValues {
rec1.matesOverlap shouldBe None // Mate's start is not enclosed by rec, and mate's end cannot be determined
rec2.matesOverlap.value shouldBe true // Mate's start is enclosed by rec, regardless of where mate end is
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests copied from htsjdk, with a few added

src/main/scala/com/fulcrumgenomics/bam/api/SamRecord.scala Outdated Show resolved Hide resolved
src/main/scala/com/fulcrumgenomics/bam/api/SamRecord.scala Outdated Show resolved Hide resolved
* @param pos the 1-based read position to query
* @param returnLastBaseIfInserted if the reference is an insertion, true to returned the previous reference base,
* false to return None
* */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to think carefully about, and document here, what the behaviour should be if:
a) The read position given is > read.length (probably a require() to throw an exception?)
b) The read position given is within soft-clipping
c) Hard-clipping is present

For (b) I would generally say to return None, and I think that's probably still the best route, but it does mean that if returnLastBaseIfInserted is false, it'll be hard to distinguish between an insertion and clipping.

For (c) I suspect we just want to ignore hard clipping? But make sure we do the right thing if it's there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly you return None if pos < 0 and I think that should probably throw an exception.

src/main/scala/com/fulcrumgenomics/bam/api/SamRecord.scala Outdated Show resolved Hide resolved
* @param returnLastBaseIfDeleted if the reference is a deletion, true to returned the previous read base, false to
* return None
* */
@inline final def readPosAtRefPos(pos: Int, returnLastBaseIfDeleted: Boolean = false): Option[Int] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here and in the other method we should rename the parameter to returnPriorBaseIf[Deleted] as I think prior is clearer that we're talking about the base prior to the one requested, vs. last could imply the last base in the read.

@inline final def readPosAtRefPos(pos: Int, returnLastBaseIfDeleted: Boolean = false): Option[Int] = {
if (this.unmapped || pos < this.start || this.end < pos) None
else { // overlaps
var readPos = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not 1 since we're returning 1-based?

}
false
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this implementation has the same problem with clipping. If I have a read that has rec.start = 100 and cigar 10S90M and I request readPosAtRefPos(100) I think this will erroneously return 1 and not 11.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wont since we check pos against CoordMath.getEnd(refPos, elem.lengthOnTarget). Since a soft-clip has 0 for elem.lengthOnTarget, then CoordMath.getEnd will return refPos-1, which is one before pos

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests

* refPosAtReadPos
* readPosAtRefPos
@nh13 nh13 force-pushed the sam-record-native-pos-at-methods branch from 1c7826f to 6908161 Compare March 26, 2022 06:53
@nh13 nh13 force-pushed the sam-record-native-pos-at-methods branch from 6908161 to 5232ac9 Compare March 26, 2022 06:55
@nh13
Copy link
Member Author

nh13 commented Mar 26, 2022

@tfenne back to you

@codecov-commenter
Copy link

codecov-commenter commented Mar 26, 2022

Codecov Report

Merging #809 (5232ac9) into main (bb36b51) will decrease coverage by 0.01%.
The diff coverage is 95.31%.

@@            Coverage Diff             @@
##             main     #809      +/-   ##
==========================================
- Coverage   95.58%   95.57%   -0.02%     
==========================================
  Files         122      122              
  Lines        7068     7114      +46     
  Branches      511      503       -8     
==========================================
+ Hits         6756     6799      +43     
- Misses        312      315       +3     
Flag Coverage Δ
unittests 95.57% <95.31%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ala/com/fulcrumgenomics/bam/SamRecordClipper.scala 96.36% <71.42%> (-1.19%) ⬇️
...com/fulcrumgenomics/bam/pileup/PileupBuilder.scala 98.03% <83.33%> (-1.97%) ⬇️
.../scala/com/fulcrumgenomics/bam/api/SamRecord.scala 95.18% <100.00%> (+1.66%) ⬆️
.../fulcrumgenomics/umi/ReviewConsensusVariants.scala 97.54% <100.00%> (-0.04%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb36b51...5232ac9. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants