Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make MRCA dict into a class #97

Merged
merged 1 commit into from
Mar 7, 2024
Merged

Make MRCA dict into a class #97

merged 1 commit into from
Mar 7, 2024

Conversation

hyanwong
Copy link
Owner

@hyanwong hyanwong commented Mar 7, 2024

Addresses #79

Summary by CodeRabbit

  • New Features
    • Enhanced the genetic inheritance graph library with a new MRCAdict class for improved handling and processing of MRCA intervals, including functionality for random position selection within MRCA regions.
  • Refactor
    • Integrated previously separate functionalities into the MRCAdict class for streamlined processing and accessibility.
  • Tests
    • Updated test suites to align with the new class structure and method calls, ensuring accuracy in genetic inheritance graph calculations.

Copy link
Contributor

coderabbitai bot commented Mar 7, 2024

Walkthrough

The GeneticInheritanceGraphLibrary underwent significant updates to enhance its functionality and usability. Notably, the handling of Most Recent Common Ancestor (MRCA) intervals has been revamped with the introduction of the MRCAdict class, which consolidates previous functionalities and adds the capability for random position selection within MRCA regions. This refinement aims to streamline genetic inheritance graph analyses and simulations, making them more intuitive and efficient.

Changes

File(s) Change Summary
GeneticInheritanceGraphLibrary/tables.py Introduced MRCAdict class for advanced MRCA interval handling, integrating previous functionalities and enhancing random position selection.
tests/gigutil.py
tests/test_tables.py
Adjusted to utilize the new MRCAdict class for random position selection, reflecting a shift in how MRCA-related functionalities are accessed and organized.
tests/test_gigutil.py Updated to import the library as gigl and use MRCAdict for calculating comparable points, changing the approach to determining comparable points in tests.

Related issues

  • github/Ideas for visualisation tools #78: The enhancements made to MRCA interval handling and the potential for improved data extraction and visualization tools could directly support the objectives outlined in this issue, particularly in developing visualization tools for GIGs and their algorithms.

🐰✨
In the world of genes and graphs, we hop,
With MRCAdict now, our work does not stop.
From intervals to points, so precise,
Our genetic maps, now thrice as nice.
Oh, how the data flows and entwines,
In the library's heart, where code combines.
🌟🐇

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@hyanwong hyanwong merged commit ab5b4b1 into main Mar 7, 2024
2 checks passed
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 4f7c4cf and e0e1ac2.
Files selected for processing (4)
  • GeneticInheritanceGraphLibrary/tables.py (3 hunks)
  • tests/gigutil.py (1 hunks)
  • tests/test_gigutil.py (2 hunks)
  • tests/test_tables.py (1 hunks)
Additional comments: 2
tests/test_tables.py (1)
  • 693-693: The test method test_random_match_pos appears to be directly related to the functionality mentioned in the PR objectives and the AI-generated summary. However, the provided code does not show any changes to this method or the specific update from gig.tables.random_match_pos to mrcas.random_match_pos. This discrepancy might be due to the fact that the actual change is in another file not provided for review, or it could be an oversight in the AI-generated summary. Without direct evidence of the change in the provided code, it's challenging to assess the correctness and impact of the mentioned update.

Given the context, it's essential to ensure that the transition from using gig.tables.random_match_pos to mrcas.random_match_pos is correctly implemented across the test suite and that all relevant test cases are updated to reflect this change. Additionally, it would be beneficial to verify that the new MRCAdict class and its random_match_pos method are thoroughly tested to ensure they function as expected within the library's ecosystem.

GeneticInheritanceGraphLibrary/tables.py (1)
  • 1066-1072: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1019-1072]

The find_mrca_regions method in the Tables class has been updated to utilize the new MRCAdict class for storing and processing MRCA intervals. This method is complex, involving the traversal of genetic inheritance graphs to identify MRCA regions between two nodes. A few observations and suggestions for improvement:

  • The method is well-documented, with detailed comments explaining the algorithm's logic and purpose. This is helpful for understanding the complex operations involved.
  • The use of sortedcontainers.SortedDict and portion library for interval operations demonstrates a thoughtful approach to handling the dynamic and interval-based nature of the problem.
  • The method returns a MRCAdict instance (line 1072), which aligns with the PR's objective of integrating the new class into the library's functionality.

While the method appears to be correctly implemented, its complexity and the critical role it plays in identifying MRCA regions suggest that thorough testing is essential to ensure its reliability and correctness. It would be beneficial to verify that comprehensive tests cover various scenarios, including edge cases and complex graph structures.

Comment on lines +125 to +126
mrcas = gigl.tables.MRCAdict(
{k: {v[0]: v[1]} for k, v in enumerate(sorted(tmp, key=lambda x: x[0][0]))}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The creation and initialization of MRCAdict using a dictionary comprehension from the sorted tmp list is a significant change. This approach encapsulates the MRCA interval handling within a class, aligning with the PR's objectives. However, ensure that the MRCAdict class is thoroughly tested, especially regarding its interaction with the rest of the system, to prevent any unintended side effects or regressions.

@@ -387,7 +387,7 @@ def find_comparable_points(self, tables, parent_nodes):
# Pick a single comparable location but ban recombination if one is
# inverted and the other is not inverted
tries = 0
while (pts := tables.random_match_pos(mrcas, self.rng)).opposite_orientations:
while (pts := mrcas.random_match_pos(self.rng)).opposite_orientations:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modification to use mrcas.random_match_pos instead of directly calling a method on tables represents a shift towards encapsulating MRCA handling within the MRCAdict class. This change should enhance code maintainability and clarity. However, it's essential to verify that the MRCAdict class's random_match_pos method is optimized for performance and correctly handles all edge cases, especially since it now plays a critical role in the logic for selecting comparable points.

Comment on lines +604 to +631
class MRCAdict(dict):
"""
A dictionary to store the results of the MRCA finder
"""

# Convenience tuples
MRCAintervals = collections.namedtuple("MRCAintervals", "u, v")
MRCApos = collections.namedtuple("MRCApos", "u, v, opposite_orientations")

def random_match_pos(self, rng):
"""
Choose a position uniformly at random from the mrca regions and return
an equivalent position in u and v.

# Return value for random_matching_positions
ComparablePositions = collections.namedtuple(
"ComparablePositions", "u, v, opposite_orientations"
)
.. note::
It is hard to know from the MRCA structure whether intervals are
adjacent, so if this is used to locate a breakpoint, the choice of
whether a breakpoint is positioned to the left or right of the returned
position is left to the user.

:param obj rng: A numpy random number generator with a method ``integers()``
that behaves like ``np.random.default_rng().integers``.
:returns: a named tuple of ``(u_position, v_position, opposite_orientations)``
Positions are chosen at random if there are multiple equivalent positions
for u, or multiple equivalent positions for v. If one of the sequences is
inverted relative to the other then ``.opposite_orientations`` is ``True``.
:rtype: ComparablePositions
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MRCAdict class is introduced to manage MRCA intervals, replacing the previous dictionary-based approach. This class encapsulates the functionality for random position selection within MRCA regions, which is a significant enhancement. The method random_match_pos is well-documented, including a detailed explanation and a note on the handling of adjacent intervals. However, there are a few areas that could be improved for clarity and efficiency:

  • The calculation of tot_len (line 632) iterates over all intervals to sum their lengths. Consider caching this value if the dictionary is not frequently modified, to avoid recalculating it on each method call.
  • The method relies on assertions (lines 639-640) to ensure that the u and v lists are not empty. While assertions are useful for debugging, they can be disabled in production environments. It might be more robust to raise an explicit exception if this condition is not met.
  • The method's return type is documented as ComparablePositions (line 630), but the actual return type is MRCApos (lines 667, 669, 672, 674). This discrepancy should be corrected in the documentation to accurately reflect the return type.

Overall, the introduction of the MRCAdict class and the random_match_pos method is a positive change that enhances the library's functionality. However, minor improvements and corrections in the documentation and error handling could further refine this implementation.

  • Consider caching the total length of intervals to improve efficiency.
  • Replace assertions with explicit exception handling for robustness.
  • Correct the documentation to accurately reflect the return type of random_match_pos.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant