QMalcolm: add event for debugging artifact writing (#10936)
Changes from all commits: c018e9d, 35db39a, b5d6564, 4cd3e52, 5e9d3d6, 6f64052, 0223f3a, a30a3a7
New changelog entry (Features):

```yaml
kind: Features
body: Emit debug logging event whenever artifacts are written
time: 2024-10-29T18:17:28.321188-05:00
custom:
  Author: QMalcolm
  Issue: N/A
```
New changelog entry (Fixes):

```yaml
kind: Fixes
body: Handle exceptions in `get_execution_status` more broadly to better ensure `run_results.json` gets written
time: 2024-10-29T18:20:34.782845-05:00
custom:
  Author: QMalcolm
  Issue: "10934"
```
```diff
@@ -93,11 +93,16 @@ def get_execution_status(sql: str, adapter: BaseAdapter) -> Tuple[RunStatus, str
         response, _ = adapter.execute(sql, auto_begin=False, fetch=False)
         status = RunStatus.Success
         message = response._message
+    except (KeyboardInterrupt, SystemExit):
+        raise
     except DbtRuntimeError as exc:
         status = RunStatus.Error
         message = exc.msg
-    finally:
-        return status, message
+    except Exception as exc:
+        status = RunStatus.Error
+        message = str(exc)
+
+    return (status, message)


 def track_model_run(index, num_nodes, run_model_result):
```

Reviewer (on the `except (KeyboardInterrupt, SystemExit)` line): Does this mean that if we cancel during `get_execution_status`, no result file will be written?

QMalcolm: Correct. We could re-catch the …
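The core bug the diff above fixes is a Python pitfall: a `return` inside a `finally` block silently discards any in-flight exception, including `KeyboardInterrupt`. Here is a minimal standalone sketch (not dbt's actual code) contrasting the old and new patterns:

```python
# Sketch of the bug: `return` in `finally` swallows any in-flight
# exception, so callers never see the error (or the cancellation).
def swallowing():
    try:
        raise RuntimeError("boom")
    finally:
        return "ok"  # the RuntimeError is silently discarded here


# Sketch of the fix: re-raise cancellation signals, convert everything
# else to an error status, and return only on the normal path.
def reraising():
    status = "ok"
    try:
        raise RuntimeError("boom")
    except (KeyboardInterrupt, SystemExit):
        raise  # let cancellation propagate to the caller
    except Exception as exc:
        status = f"error: {exc}"
    return status


print(swallowing())  # -> "ok", even though an exception was raised
print(reraising())   # -> "error: boom"
```

Because the broad `except Exception` now produces an error status instead of an unhandled exception, callers further up the stack get a chance to write `run_results.json` before exiting.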
```diff
@@ -645,7 +650,6 @@ def safe_run_hooks(
             return RunStatus.Success

         status = RunStatus.Success
-        failed = False
         num_hooks = len(ordered_hooks)

         for idx, hook in enumerate(ordered_hooks, 1):
```

Reviewer: Thanks for making this logic more concise. If I understand it correctly, we want the behavior to be: if a hook fails, we skip the rest? I think the direction of …

QMalcolm: That is correct: if one hook fails, the rest are expected to be skipped. I can clean up the code a bit more; I was trying to avoid too much refactoring in this PR, as it's generally best practice to separate feature/bug work from refactors. I actually walked back some refactoring work in this regard. To the keen-eyed observer, 5e9d3d6 foreshadowed such work, which I ultimately dropped before pushing my branch to reduce the number of changes.

```diff
@@ -654,9 +658,8 @@ def safe_run_hooks(
             hook_name = f"{hook.package_name}.{hook_type}.{hook.index - 1}"
             execution_time = 0.0
             timing: List[TimingInfo] = []
-            failures = 1

-            if not failed:
+            if status == RunStatus.Success:
                 with collect_timing_info("compile", timing.append):
                     sql = self.get_hook_sql(
                         adapter, hook, hook.index, num_hooks, extra_context
```

```diff
@@ -682,13 +685,11 @@ def safe_run_hooks(
                 finished_at = timing[1].completed_at or datetime.utcnow()
                 hook.update_event_status(finished_at=finished_at.isoformat())
                 execution_time = (finished_at - started_at).total_seconds()
-                failures = 0 if status == RunStatus.Success else 1

                 if status == RunStatus.Success:
                     message = f"{hook_name} passed"
                 else:
                     message = f"{hook_name} failed, error:\n {message}"
-                    failed = True
             else:
                 status = RunStatus.Skipped
                 message = f"{hook_name} skipped"
```

```diff
@@ -703,7 +704,7 @@ def safe_run_hooks(
                         message=message,
                         adapter_response={},
                         execution_time=execution_time,
-                        failures=failures,
+                        failures=0 if status == RunStatus.Success else 1,
                         node=hook,
                     )
                 )
```
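The refactor above drops the separate `failed` flag and drives the skip behavior off the loop-carried `status` value: once one hook errors, `status` is no longer `Success`, so every later iteration falls into the skipped branch. A minimal standalone sketch of that control flow (illustrative names, not dbt's actual implementation):

```python
from enum import Enum


class RunStatus(str, Enum):
    Success = "success"
    Error = "error"
    Skipped = "skipped"


def run_hooks(hooks):
    """Run (name, callable) pairs in order; after the first failure,
    record every remaining hook as skipped instead of running it."""
    results = []
    status = RunStatus.Success
    for name, fn in hooks:
        if status == RunStatus.Success:
            try:
                fn()
                message = f"{name} passed"
            except Exception as exc:
                status = RunStatus.Error
                message = f"{name} failed, error:\n {exc}"
        else:
            # status is Error (or already Skipped), so mark this hook skipped;
            # status stays non-Success for all following iterations too
            status = RunStatus.Skipped
            message = f"{name} skipped"
        failures = 0 if status == RunStatus.Success else 1
        results.append((name, status, failures, message))
    return results
```

Note how `failures` is now derived from `status` at result-construction time rather than tracked in a separate mutable, which is the same simplification the last hunk makes.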
|
Reviewer: Do we still need this?

QMalcolm: I think "need" is a strong word. We wanted to add this event because, while debugging #10934 in cloud, we couldn't tell whether the writing of the artifact was actually happening. Having this log could reduce debug time if something like this happens again. Additionally, it's possible that over the course of execution we end up writing an artifact multiple times, which is likely wasteful; a log like this would help identify whether that is happening. Finally, the fix in 0223f3a doesn't guarantee a fix for #10934: it is fairly narrowly scoped, and if an exception gets raised elsewhere we might again run into a situation where an artifact is not written (which is exactly where the log is useful).

Tangentially, I just admitted that the fix in 0223f3a is pretty limited. This is unfortunately necessary: I tried moving the exception handling up some levels, but you quickly run into a labyrinth of catching too much in instances where failure is expected 🫠
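The debug event being discussed is straightforward to picture: one debug-level log line per artifact write, so operators can confirm a write happened (or spot redundant rewrites) from the logs alone. A sketch of the idea, using Python's standard `logging` module; the function name and message text here are illustrative assumptions, not dbt's real event API:

```python
import json
import logging

logger = logging.getLogger("artifacts")


def write_artifact(path: str, payload: dict) -> None:
    """Write a JSON artifact, then emit a debug event recording the write."""
    with open(path, "w") as fh:
        json.dump(payload, fh)
    # The debug event: one line per artifact write. Its absence in a
    # debug-level log means the write never happened; repeated lines for
    # the same path suggest wasteful duplicate writes.
    logger.debug("Artifact written to %s", path)
```

Grepping a debug-level log for these lines answers both questions raised in the thread: whether `run_results.json` was written at all, and whether any artifact was written more often than necessary.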