Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-43484: Fix BPS auto-retries and add support for testing that in ci_middleware #286

Merged
merged 2 commits into from
Mar 28, 2024

Conversation

TallJimbo
Copy link
Member

@TallJimbo TallJimbo commented Mar 27, 2024

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

This should get clustered retries in BPS working again after they were
broken by DM-43060, and I think make them more efficient than they
were before the breakage because quanta that succeeded will be skipped
rather than rerun.
Copy link

codecov bot commented Mar 27, 2024

Codecov Report

Attention: Patch coverage is 44.44444% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 87.29%. Comparing base (d2bfc69) to head (8230571).

Files Patch % Lines
python/lsst/ctrl/mpexec/cli/opt/options.py 37.50% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #286      +/-   ##
==========================================
- Coverage   87.35%   87.29%   -0.06%     
==========================================
  Files          49       49              
  Lines        4467     4472       +5     
  Branches      770      771       +1     
==========================================
+ Hits         3902     3904       +2     
- Misses        412      415       +3     
  Partials      153      153              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@MichelleGower MichelleGower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple questions (probably just to help me understand). Merge approved.

@@ -478,19 +485,25 @@ def parse_mock_failure(
value : `~collections.abc.Iterable` [`str`] or `None`
Value from option.
"""
result: dict[str, tuple[str, type[Exception] | None]] = {}
from lsst.pipe.base.tests.mocks import ForcedFailure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this import go here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to avoid importing test-only code (especially this test-only code, which mucks the the storage class singleton) in production code paths.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to that effect?

task_label, error_type_name, where = entry.split(":", 2)
task_label, error_type_name, where, *rest = entry.split(":")
if rest:
(memory_required,) = rest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this line fail if rest is more than 1 item (future-proofing code)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; the preexisting except block below should catch that and raise a user-appropriate message in that case. I'm only using *rest as a way to make the final entry optional, not to support more than one value there.

fail. The final optional term is a memory threshold (with units
recognized by astropy), which will cause the error to only occur if
the available memory (according to ExecutionResources.max_mem) is less
than this value."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to reread the final term description multiple times to understand why "less than" was correct. I suspect that's because I'm not as familiar with ExecutionResources. Am I right in the ExecutionResources is what is "available" to use and the memory threshold (final term) is what the mocked execution is to fake using?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in other packages refer to the "memory threshold" as "requested memory". Doing the same here might help some.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I'll see if I can reword.

@TallJimbo TallJimbo force-pushed the tickets/DM-43484 branch 2 times, most recently from 52acf02 to 8230571 Compare March 28, 2024 01:09
@TallJimbo TallJimbo merged commit 716b37d into main Mar 28, 2024
13 of 15 checks passed
@TallJimbo TallJimbo deleted the tickets/DM-43484 branch March 28, 2024 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants