-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make MRCA dict into a class #97
Conversation
WalkthroughThe GeneticInheritanceGraphLibrary underwent significant updates to enhance its functionality and usability. Notably, the handling of Most Recent Common Ancestor (MRCA) intervals has been revamped with the introduction of the Changes
Related issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 3
Configuration used: CodeRabbit UI
Files selected for processing (4)
- GeneticInheritanceGraphLibrary/tables.py (3 hunks)
- tests/gigutil.py (1 hunks)
- tests/test_gigutil.py (2 hunks)
- tests/test_tables.py (1 hunks)
Additional comments: 2
tests/test_tables.py (1)
- 693-693: The test method
test_random_match_pos
appears to be directly related to the functionality mentioned in the PR objectives and the AI-generated summary. However, the provided code does not show any changes to this method or the specific update fromgig.tables.random_match_pos
tomrcas.random_match_pos
. This discrepancy might be due to the fact that the actual change is in another file not provided for review, or it could be an oversight in the AI-generated summary. Without direct evidence of the change in the provided code, it's challenging to assess the correctness and impact of the mentioned update.Given the context, it's essential to ensure that the transition from using
gig.tables.random_match_pos
tomrcas.random_match_pos
is correctly implemented across the test suite and that all relevant test cases are updated to reflect this change. Additionally, it would be beneficial to verify that the newMRCAdict
class and itsrandom_match_pos
method are thoroughly tested to ensure they function as expected within the library's ecosystem.GeneticInheritanceGraphLibrary/tables.py (1)
- 1066-1072: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1019-1072]
The
find_mrca_regions
method in theTables
class has been updated to utilize the newMRCAdict
class for storing and processing MRCA intervals. This method is complex, involving the traversal of genetic inheritance graphs to identify MRCA regions between two nodes. A few observations and suggestions for improvement:
- The method is well-documented, with detailed comments explaining the algorithm's logic and purpose. This is helpful for understanding the complex operations involved.
- The use of
sortedcontainers.SortedDict
andportion
library for interval operations demonstrates a thoughtful approach to handling the dynamic and interval-based nature of the problem.- The method returns a
MRCAdict
instance (line 1072), which aligns with the PR's objective of integrating the new class into the library's functionality.While the method appears to be correctly implemented, its complexity and the critical role it plays in identifying MRCA regions suggest that thorough testing is essential to ensure its reliability and correctness. It would be beneficial to verify that comprehensive tests cover various scenarios, including edge cases and complex graph structures.
mrcas = gigl.tables.MRCAdict( | ||
{k: {v[0]: v[1]} for k, v in enumerate(sorted(tmp, key=lambda x: x[0][0]))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The creation and initialization of MRCAdict
using a dictionary comprehension from the sorted tmp
list is a significant change. This approach encapsulates the MRCA interval handling within a class, aligning with the PR's objectives. However, ensure that the MRCAdict
class is thoroughly tested, especially regarding its interaction with the rest of the system, to prevent any unintended side effects or regressions.
@@ -387,7 +387,7 @@ def find_comparable_points(self, tables, parent_nodes): | |||
# Pick a single comparable location but ban recombination if one is | |||
# inverted and the other is not inverted | |||
tries = 0 | |||
while (pts := tables.random_match_pos(mrcas, self.rng)).opposite_orientations: | |||
while (pts := mrcas.random_match_pos(self.rng)).opposite_orientations: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The modification to use mrcas.random_match_pos
instead of directly calling a method on tables
represents a shift towards encapsulating MRCA handling within the MRCAdict
class. This change should enhance code maintainability and clarity. However, it's essential to verify that the MRCAdict
class's random_match_pos
method is optimized for performance and correctly handles all edge cases, especially since it now plays a critical role in the logic for selecting comparable points.
class MRCAdict(dict): | ||
""" | ||
A dictionary to store the results of the MRCA finder | ||
""" | ||
|
||
# Convenience tuples | ||
MRCAintervals = collections.namedtuple("MRCAintervals", "u, v") | ||
MRCApos = collections.namedtuple("MRCApos", "u, v, opposite_orientations") | ||
|
||
def random_match_pos(self, rng): | ||
""" | ||
Choose a position uniformly at random from the mrca regions and return | ||
an equivalent position in u and v. | ||
|
||
# Return value for random_matching_positions | ||
ComparablePositions = collections.namedtuple( | ||
"ComparablePositions", "u, v, opposite_orientations" | ||
) | ||
.. note:: | ||
It is hard to know from the MRCA structure whether intervals are | ||
adjacent, so if this is used to locate a breakpoint, the choice of | ||
whether a breakpoint is positioned to the left or right of the returned | ||
position is left to the user. | ||
|
||
:param obj rng: A numpy random number generator with a method ``integers()`` | ||
that behaves like ``np.random.default_rng().integers``. | ||
:returns: a named tuple of ``(u_position, v_position, opposite_orientations)`` | ||
Positions are chosen at random if there are multiple equivalent positions | ||
for u, or multiple equivalent positions for v. If one of the sequences is | ||
inverted relative to the other then ``.opposite_orientations`` is ``True``. | ||
:rtype: ComparablePositions | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MRCAdict
class is introduced to manage MRCA intervals, replacing the previous dictionary-based approach. This class encapsulates the functionality for random position selection within MRCA regions, which is a significant enhancement. The method random_match_pos
is well-documented, including a detailed explanation and a note on the handling of adjacent intervals. However, there are a few areas that could be improved for clarity and efficiency:
- The calculation of
tot_len
(line 632) iterates over all intervals to sum their lengths. Consider caching this value if the dictionary is not frequently modified, to avoid recalculating it on each method call. - The method relies on assertions (lines 639-640) to ensure that the
u
andv
lists are not empty. While assertions are useful for debugging, they can be disabled in production environments. It might be more robust to raise an explicit exception if this condition is not met. - The method's return type is documented as
ComparablePositions
(line 630), but the actual return type isMRCApos
(lines 667, 669, 672, 674). This discrepancy should be corrected in the documentation to accurately reflect the return type.
Overall, the introduction of the MRCAdict
class and the random_match_pos
method is a positive change that enhances the library's functionality. However, minor improvements and corrections in the documentation and error handling could further refine this implementation.
- Consider caching the total length of intervals to improve efficiency.
- Replace assertions with explicit exception handling for robustness.
- Correct the documentation to accurately reflect the return type of
random_match_pos
.
Addresses #79
Summary by CodeRabbit
MRCAdict
class for improved handling and processing of MRCA intervals, including functionality for random position selection within MRCA regions.MRCAdict
class for streamlined processing and accessibility.