fix: create tnscope mnvs #1524

Draft
wants to merge 106 commits into
base: develop
Conversation

mathiasbio
Collaborator

@mathiasbio mathiasbio commented Jan 20, 2025

Description

This PR adds a post-processing step to TNscope (for the standard TGA workflow) that merges SNVs and InDels sharing a PID into MNVs, so that the TNscope calls can be merged more accurately with the VarDict results.
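As a rough illustration of what merging by PID means, here is a minimal sketch. It is not the Sentieon script: the `Variant` class and `merge_by_pid` function are hypothetical names, and the sketch assumes that only directly adjacent SNVs with an identical PID are joined, which is a simplification of what the real script handles.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    """Hypothetical, simplified stand-in for a VCF record."""
    chrom: str
    pos: int             # 1-based position of the first REF base
    ref: str
    alt: str
    pid: Optional[str]   # phasing ID; None if the variant is unphased


def merge_by_pid(variants: List[Variant]) -> List[Variant]:
    """Join runs of directly adjacent variants that share a PID into one MNV."""
    merged: List[Variant] = []
    for v in sorted(variants, key=lambda x: (x.chrom, x.pos)):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and v.pid is not None
            and prev.pid == v.pid
            and prev.chrom == v.chrom
            and prev.pos + len(prev.ref) == v.pos  # no gap between the records
        ):
            # Extend the previous record by concatenating REF and ALT.
            merged[-1] = Variant(
                prev.chrom, prev.pos, prev.ref + v.ref, prev.alt + v.alt, prev.pid
            )
        else:
            merged.append(v)
    return merged


calls = [
    Variant("1", 100, "A", "T", "p1"),
    Variant("1", 101, "C", "G", "p1"),  # adjacent, same PID: merged with pos 100
    Variant("1", 105, "G", "A", "p1"),  # same PID but not adjacent: kept as is
    Variant("1", 106, "T", "C", None),  # unphased: kept as is
]
result = merge_by_pid(calls)
```

In this sketch the first two records collapse into a single `AC>TG` record at position 100, while the other two stay separate.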

See issue: #1525

The script was taken from Sentieon here: https://github.com/Sentieon/sentieon-scripts/blob/master/merge_mnp/merge_mnp.py

The script has been slightly modified to merge variants even when they carry filters other than PASS. This allows merging of variants flagged triallelic_site, and of variants carrying any soft filters we might add in the future. Some refactoring of the original script was also done to increase readability (though it's still very messy).

The TNscope VCF is quality filtered before merging SNVs

To avoid merging low-quality variants into MNVs, the VCF is quality filtered before merging.

Merging SNVs with different filters set

An open question was how to consolidate the filters when merging SNVs that have different filters set, such as germline_risk, in_normal, triallelic_site, and PASS. This was solved with the following logic:

  1. If all SNVs have the same filter, the merged MNV gets that common filter from its constituent SNVs (for example PASS or in_normal).
  2. If the SNVs have different filters (for example PASS and in_normal), the merged MNV gets the filter MNV_CONFLICTING_FILTERS.

In addition, a new INFO field called TNSCOPE_MNV_FILTERS is added. It holds the set of all filters from the constituent SNVs and InDels that were merged, and could potentially be displayed in Scout.
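The filter logic described above can be sketched roughly as follows. This is an illustration, not the code in the PR: the function name `consolidate_filters` is hypothetical, and the sketch assumes each constituent variant's FILTER arrives as a semicolon-separated string and that "the same filter" means an identical set of filter names.

```python
from typing import Iterable, List, Tuple

CONFLICT_FILTER = "MNV_CONFLICTING_FILTERS"


def consolidate_filters(constituent_filters: Iterable[str]) -> Tuple[str, List[str]]:
    """Return (FILTER for the merged MNV, values for TNSCOPE_MNV_FILTERS).

    Hypothetical helper: each input item is one variant's FILTER column,
    e.g. "PASS" or "germline_risk;in_normal".
    """
    per_variant = [frozenset(f.split(";")) for f in constituent_filters]
    if not per_variant:
        raise ValueError("need at least one constituent variant")

    # Union of every filter seen on any constituent, for the INFO field.
    all_filters = sorted(set().union(*per_variant))

    if all(fs == per_variant[0] for fs in per_variant):
        # Rule 1: every constituent agrees, so keep the common filter.
        merged_filter = ";".join(sorted(per_variant[0]))
    else:
        # Rule 2: constituents disagree, so flag the conflict.
        merged_filter = CONFLICT_FILTER
    return merged_filter, all_filters


same = consolidate_filters(["in_normal", "in_normal"])
mixed = consolidate_filters(["PASS", "in_normal"])
```

Keeping the union in a separate INFO value means the original filters stay recoverable even when the FILTER column only says that they conflicted.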

Added

  • TNscope MNV merge script

Changed

  • merge SNVs into MNVs in TNscope TGA
  • change the raw delivery SNV file for TGA to the file produced before any post-processing

Documentation

  • N/A
  • Updated Balsamic documentation to reflect the changes as needed for this PR.
    • balsamic_filters.rst (updated with TNscope MNV merging)

Tests

Feature Tests

Pipeline Integrity Tests

  • Report deliver (generation of the .hk file)
    • N/A
    • Verified
  • TGA T/O Workflow
    • N/A
    • Verified
  • TGA T/N Workflow
    • N/A
    • Verified
  • UMI T/O Workflow
    • N/A
    • Verified
  • UMI T/N Workflow
    • N/A
    • Verified
  • WGS T/O Workflow
    • N/A
    • Verified
  • WGS T/N Workflow
    • N/A
    • Verified
  • QC Workflow
    • N/A
    • Verified
  • PON Workflow
    • N/A
    • Verified

Clinical Genomics Stockholm

Documentation

  • Atlas documentation
    • N/A
    • Updated: [Link]
  • Web portal for Clinical Genomics
    • N/A
    • Updated: [Link]

Panel of Normal specific criteria

User Changes

  • N/A
  • This PR affects the output files or results.
    • User feedback is considered unnecessary because [Justification].
    • Affected users have been included in the development process and given a chance to provide feedback.

Infrastructure Changes

  • Stored files in Housekeeper
    • N/A
    • Updated: [Link]
  • CG (CLI and delivered/uploaded files)
    • N/A
    • Updated: [Link]
  • Servers (configuration files on Hasta)
    • N/A
    • Updated: [Link]
  • Scout interface
    • N/A
    • Updated: [Link]

Validation criteria

Validation criteria to be added to validation report PR: [LINK-TO-VALIDATION-REPORT-PR from the validations repository]

Version specific criteria

  • Text here or N/A

Important

One of the validation checkboxes below needs to be checked

  • Added version specific validation criteria to validation report
  • Changes validated in standard sections: [validation-section]
  • Validation criteria not necessary

Checklist

Important

Ensure that all checkboxes below are ticked before merging.

For Developers

  • PR Description
    • Provided a comprehensive description of the PR.
    • Linked relevant user stories or issues to the PR.
  • Documentation
    • Verified and updated documentation if necessary.
  • Validation criteria
    • Completed the validation criteria section of the template.
  • Tests
    • Described and tested the functionality addressed in the PR.
    • Ensured integration of the new code with existing workflows.
    • Confirmed that meaningful unit tests were added for the changes introduced.
    • Checked that the PR has successfully passed all relevant code smells and coverage checks.
  • Review
    • Addressed and resolved all the feedback provided during the code review process.
    • Obtained final approval from designated reviewers.

For Reviewers

  • Code
    • Code implements the intended features or fixes the reported issue.
    • Code follows the project's coding standards and style guide.
  • Documentation
    • Pipeline changes are well-documented in the CHANGELOG and relevant documentation.
  • Validation criteria
    • The author has completed the validation criteria section of the template
  • Tests
    • The author provided a description of their manual testing, including consideration of edge cases and boundary
      conditions where applicable, with satisfactory results.
  • Review
    • Confirmed that the developer has addressed all the comments during the code review.

@mathiasbio mathiasbio linked an issue Jan 20, 2025 that may be closed by this pull request
@mathiasbio mathiasbio linked an issue Jan 20, 2025 that may be closed by this pull request
3 tasks

codecov bot commented Jan 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.50%. Comparing base (7d529e6) to head (ed2ea13).
Report is 38 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1524      +/-   ##
===========================================
+ Coverage    99.48%   99.50%   +0.02%     
===========================================
  Files           40       40              
  Lines         1932     2036     +104     
===========================================
+ Hits          1922     2026     +104     
  Misses          10       10              
Flag Coverage Δ
unittests 99.50% <ø> (+0.02%) ⬆️


@mathiasbio mathiasbio mentioned this pull request Jan 30, 2025
54 tasks
@mathiasbio mathiasbio changed the base branch from develop to disable_normal_hardfilter January 30, 2025 16:01
Base automatically changed from disable_normal_hardfilter to develop January 31, 2025 14:46
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

  • [User Story] Preprocess TNscope to create MNVs
  • [Bug] Merging of different variants VarDict and TNscope
1 participant