-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-43484: Fix BPS auto-retries and add support for testing that in ci_middleware #286
Conversation
f2a0a49
to
ed8df98
Compare
This should get clustered retries in BPS working again after they were broken by DM-43060, and I think make them more efficient than they were before the breakage because quanta that succeeded will be skipped rather than rerun.
ed8df98
to
48081ef
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #286 +/- ##
==========================================
- Coverage 87.35% 87.29% -0.06%
==========================================
Files 49 49
Lines 4467 4472 +5
Branches 770 771 +1
==========================================
+ Hits 3902 3904 +2
- Misses 412 415 +3
Partials 153 153 ☔ View full report in Codecov by Sentry. |
48081ef
to
08a2aa3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple questions (probably just to help me understand). Merge approved.
@@ -478,19 +485,25 @@ def parse_mock_failure( | |||
value : `~collections.abc.Iterable` [`str`] or `None` | |||
Value from option. | |||
""" | |||
result: dict[str, tuple[str, type[Exception] | None]] = {} | |||
from lsst.pipe.base.tests.mocks import ForcedFailure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this import go here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to avoid importing test-only code (especially this test-only code, which mucks the the storage class singleton) in production code paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment to that effect?
task_label, error_type_name, where = entry.split(":", 2) | ||
task_label, error_type_name, where, *rest = entry.split(":") | ||
if rest: | ||
(memory_required,) = rest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this line fail if rest is more than 1 item (future-proofing code)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; the preexisting except
block below should catch that and raise a user-appropriate message in that case. I'm only using *rest
as a way to make the final entry optional, not to support more than one value there.
fail. The final optional term is a memory threshold (with units | ||
recognized by astropy), which will cause the error to only occur if | ||
the available memory (according to ExecutionResources.max_mem) is less | ||
than this value.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to reread the final term description multiple times to understand why "less than" was correct. I suspect that's because I'm not as familiar with ExecutionResources. Am I right in the ExecutionResources is what is "available" to use and the memory threshold (final term) is what the mocked execution is to fake using?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in other packages refer to the "memory threshold" as "requested memory". Doing the same here might help some.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I'll see if I can reword.
52acf02
to
8230571
Compare
Checklist
doc/changes