-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tidy up the simulator classes #75
Conversation
Warning Rate Limit Exceeded@hyanwong has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 24 minutes and 26 seconds before requesting another review. How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. WalkthroughThese updates involve renaming a class for clarity and refining its functionality, particularly in handling genetic simulation processes. A method has been added to improve the identification of comparable points, and the logic for selecting breakpoints has been refined. Additionally, the testing suite has been expanded with a new test class and updated tests to ensure compatibility and accuracy against known standards. Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 11
Configuration used: CodeRabbit UI
Files selected for processing (2)
- tests/gigutil.py (4 hunks)
- tests/test_gigutil.py (4 hunks)
Additional comments: 3
tests/test_gigutil.py (3)
- 2-2: Ensure
tskit
is used in the file since it's imported. If not, consider removing the import to clean up the code.- 98-102: The class
DTWF_one_break_no_rec_inversions_test
inherits fromDTWF_one_break_no_rec_inversions_slow_sim
but does not override or extend any of its methods directly. Verify that inheritance is the intended design pattern here, as it might be more appropriate to use composition or a different relationship if the test class is not meant to extend the simulator's functionality.- 155-155: The use of
assert_equals
for comparing tables is appropriate, but ensure that theignore_provenance
flag's usage aligns with the test's objectives, as it might overlook differences in metadata that could be relevant.
def find_comparable_points(self, gig, parent_nodes): | ||
""" """ | ||
mrcas = gig.find_mrca_regions(*parent_nodes) | ||
# Create a new mrca dict with arbitrary keys but where each value is a single | ||
# interval with the appropriate matching coords in u and v. Items in the dict | ||
# are sorted by the left coordinate of the mrca. Keys can be arbitrary because | ||
# we don't use the identity of the MRCA node to determine breakpoint dynamics. | ||
tmp = [] | ||
for mrca_regions in mrcas.values(): | ||
for region, equivalents in mrca_regions.items(): | ||
tmp.append((region, equivalents)) | ||
comparable_pts = gig.random_matching_positions( | ||
{k: {v[0]: v[1]} for k, v in enumerate(sorted(tmp, key=lambda x: x[0][0]))}, | ||
self.random, | ||
) | ||
return comparable_pts # Don't bother with inversions: testing doesn't use them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method find_comparable_points
lacks a detailed docstring explaining its purpose, parameters, and return value. Adding a comprehensive docstring would improve code readability and maintainability.
{k: {v[0]: v[1]} for k, v in enumerate(sorted(tmp, key=lambda x: x[0][0]))}, | ||
self.random, | ||
) | ||
return comparable_pts # Don't bother with inversions: testing doesn't use them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment "Don't bother with inversions: testing doesn't use them" might need clarification or removal if it's not relevant to the method's functionality or if it could confuse future maintainers.
@@ -106,9 +134,9 @@ | |||
|
|||
|
|||
class TestDTWF_recombination_no_SV_sims: | |||
def test_one_recombination_slow_sim(self): | |||
def test_one_break_slow_sim(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method test_one_break_slow_sim
lacks a docstring. Adding a brief explanation of what this test verifies would enhance code readability and maintainability.
@@ -117,14 +145,14 @@ | |||
assert ts.num_trees > 1 | |||
assert ts.at_index(0).num_edges > 0 | |||
|
|||
def test_one_recombination_slow_sim_vs_tskit(self): | |||
@pytest.mark.parametrize("seed", [123, 321]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameterized test test_one_break_slow_sim_vs_tskit
is well-implemented, but consider adding a docstring to explain the purpose of the test, the significance of the seed values, and what it aims to verify between the GIG simulator and the tskit simulator.
ts_simulator = tskit_DTWF_simulator(sequence_length=L) | ||
gig = gig_simulator.run(7, L, gens=gens, left_sort_mrcas=True, random_seed=seed) | ||
gig = gig_simulator.run(7, L, gens=gens, random_seed=seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable L
is used to set the sequence length for the tskit simulator but is hardcoded. Consider making this a parameter or defining it more clearly to improve flexibility and readability.
def run( | ||
self, num_diploids, seq_len, gens, *, left_sort_mrcas=False, random_seed=None | ||
): | ||
def run(self, num_diploids, seq_len, gens, *, random_seed=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The run
method's docstring mentions a parameter left_pos_sort_mrcas
that does not appear in the method signature. Either remove this mention from the docstring or add the parameter to the method signature if it was mistakenly omitted.
""" | ||
if left_pos_sort_mrcas is True, then before calling gig.random_matching_positions | ||
the mrcas list will be sorted by the left position of the mrca region (rather | ||
than using the ID of the MRCA node). This is useful for testing against a simpler | ||
model that picks a breakpoint from left to right along the matching regions. | ||
""" | ||
self.random = np.random.default_rng(random_seed) | ||
self.left_sort_mrcas = left_sort_mrcas | ||
self.num_tries_for_breakpoint = 20 # number of tries to find a breakpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded value 20
for num_tries_for_breakpoint
might benefit from being a parameter to the method or class to allow for more flexibility in testing and usage.
def find_comparable_points(self, gig, parent_nodes): | ||
""" | ||
Find comparable points in the parent nodes, and return the | ||
coordinates of the matching regions in the parent nodes. | ||
""" | ||
mrcas = gig.find_mrca_regions(*parent_nodes) | ||
comparable_pts = gig.random_matching_positions(mrcas, self.random) | ||
# Pick a single breakpoint: if both breaks are inverted relative to the mrca | ||
# (i.e. negative) it's OK: both have the same orientation relative to each other | ||
tries = 0 | ||
while comparable_pts[0] * comparable_pts[1] < 0: | ||
comparable_pts = self.gig.random_matching_positions(mrcas, self.random) | ||
tries += 1 | ||
if tries > self.num_tries_for_breakpoint: | ||
raise ValueError( | ||
"Could not find a pair of matching regions in the same orientation" | ||
) | ||
return comparable_pts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method find_comparable_points
lacks a detailed docstring explaining its purpose, parameters, and return value. Adding a comprehensive docstring would improve code readability and maintainability. Additionally, consider explaining the logic behind the while loop and the condition for inverting breakpoints more clearly.
# if both breaks are inverted relative to the mrca (i.e. negative) that's fine, | ||
# as they are both in the same orientation relative to each other. We hack it | ||
# here because if negative, the positions are out-by-one | ||
comparable_pts = self.find_comparable_points(self.gig, parent_nodes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method add_inheritance_paths
calls find_comparable_points
but does not handle the potential ValueError
that could be raised if a suitable pair of matching regions is not found. Consider adding error handling to improve robustness.
tests/gigutil.py
Outdated
# here because if negative, the positions are out-by-one | ||
comparable_pts = self.find_comparable_points(self.gig, parent_nodes) | ||
|
||
# Minor hack when both comparable_pts are negative, in which case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment "Minor hack when both comparable_pts are negative" could benefit from further explanation or a more robust solution if this is a known issue that could impact the simulation's accuracy or reliability.
And allow inversions!
And allow inversions!
Summary by CodeRabbit