CT-1685: Restore certain aspects of legacy logging behavior important… #6443

Merged: 2 commits, Dec 14, 2022
7 changes: 7 additions & 0 deletions .changes/unreleased/Under the Hood-20221213-214106.yaml
@@ -0,0 +1,7 @@
kind: Under the Hood
body: Restore important legacy logging behaviors, following refactor which removed
  them
time: 2022-12-13T21:41:06.815133-05:00
custom:
  Author: peterallenwebb
  Issue: "6437"
32 changes: 29 additions & 3 deletions core/dbt/events/eventmgr.py
@@ -47,6 +47,33 @@ class LineFormat(Enum):
}


# We should consider fixing the underlying problem, but log_level() can return a
# string for DynamicLevel events, even though it is supposed to return an
# EventLevel. This function gets a string for the level, no matter what.
def _get_level_str(e: BaseEvent) -> str:
    return e.log_level().value if isinstance(e.log_level(), EventLevel) else str(e.log_level())


# We need this function for now because the numeric log severity levels in
# Python do not match those for logbook, so we have to explicitly call the
# correct function by name.
def send_to_logger(l, level: str, log_line: str):
    if level == "test":
        l.debug(log_line)
    elif level == "debug":
        l.debug(log_line)
    elif level == "info":
        l.info(log_line)
    elif level == "warn":
        l.warning(log_line)
    elif level == "error":
        l.error(log_line)
    else:
        raise AssertionError(
            f"While attempting to log {log_line}, encountered the unhandled level: {level}"
        )


@dataclass
class LoggerConfig:
    name: str
@@ -93,7 +120,7 @@ def write_line(self, e: BaseEvent):
        line = self.create_line(e)
        python_level = _log_level_map[e.log_level()]
        if self._python_logger is not None:
-            self._python_logger.log(python_level, line)
+            send_to_logger(self._python_logger, _get_level_str(e), line)
        elif self._stream is not None and _log_level_map[self.level] <= python_level:
            self._stream.write(line + "\n")

@@ -128,8 +155,7 @@ def create_debug_line(self, e: BaseEvent) -> str:
        log_line = f"\n\n{separator} {datetime.utcnow()} | {self.event_manager.invocation_id} {separator}\n"
        ts: str = datetime.utcnow().strftime("%H:%M:%S.%f")
        scrubbed_msg: str = self.scrubber(e.message())  # type: ignore
-        # log_level() for DynamicLevel events returns str instead of EventLevel
-        level = e.log_level().value if isinstance(e.log_level(), EventLevel) else e.log_level()
+        level = _get_level_str(e)
        log_line += (
            f"{self._get_color_tag()}{ts} [{level:<5}]{self._get_thread_name()} {scrubbed_msg}"
        )
14 changes: 10 additions & 4 deletions core/dbt/events/functions.py
@@ -18,10 +18,6 @@
LOG_VERSION = 3
metadata_vars: Optional[Dict[str, str]] = None

-# The default event manager will not log anything, but some tests run code that
-# generates events, without configuring the event manager.
-EVENT_MANAGER: EventManager = EventManager()


def setup_event_logger(log_path: str, level_override: Optional[EventLevel] = None):
    cleanup_event_logger()
@@ -114,6 +110,16 @@ def cleanup_event_logger():
    EVENT_MANAGER.callbacks.clear()


+# The default event manager will not log anything, but some tests run code that
+# generates events, without configuring the event manager, so we create an empty
+# manager here until there is a better testing strategy in place.
+EVENT_MANAGER: EventManager = EventManager()
Comment on lines +113 to +116
@jtcohen6 (Contributor) commented on Dec 14, 2022:
I noticed a functional regression related to the fact that EVENT_MANAGER does not have any logger configured, prior to setup_event_logger being called.

If dbt runs into configuration errors during initialization, they are not surfaced to the user.

See #6434

@peterallenwebb (Contributor, Author) commented on Dec 14, 2022:
Yeah. I was worried about that. I'll address that today with a separate PR. There is a strange issue where file logging is not set up until further into the initialization process. Do you think it would be acceptable to set up only stdout logging until the log file path can be determined?
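
As a rough illustration of that proposal (not dbt's code, just a standard-library sketch with hypothetical names throughout), a stdout/stderr-only fallback could be configured at import time so that anything logged before the real log file path is known still reaches the user:

```python
import logging
import sys

def _configure_early_console_logging() -> logging.Logger:
    # Hypothetical helper: attach a bare stderr handler so that anything logged
    # before the real logging setup runs is still shown to the user. A real
    # version would route through dbt's EventManager rather than stdlib logging.
    logger = logging.getLogger("dbt.early")
    if not logger.handlers:  # avoid stacking handlers on repeated imports
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

# Anything fired before setup_event_logger() could fall back to this logger.
_early_logger = _configure_early_console_logging()
_early_logger.error("Encountered an error while reading the project configuration.")
```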

Contributor:
How would that work in dbt-rpc?

Contributor:
IMO only stdout logging is acceptable for now. My sense is, we really only expect these logs to fire if something goes wrong during initialization. It's okay to have some hardcoded, generic, unconfigurable behavior that just ensures we're showing the error to the user.

We'll need to figure out the story for dbt-server, since I think the initial plan is to tail the logs/dbt.log file. If we think the longer-term answer is registering a custom callback, that might solve our problem.

As far as dbt-rpc, all we need is ongoing compatibility with the legacy logger for v1.4. The way I understand the code in this PR, if the ENABLE_LEGACY_LOGGER env var is enabled, we'll immediately flip on the legacy logger, without waiting for the call to setup_event_logger.
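
A hedged sketch of the custom-callback idea: this diff only shows `EVENT_MANAGER.callbacks.clear()`, so the list-style `append` and the single-event callback argument below are assumptions for illustration, not a documented API, and `_print_errors_to_stderr` is a hypothetical name.

```python
import sys

from dbt.events.eventmgr import _get_level_str  # module path per this PR's diff
from dbt.events.functions import EVENT_MANAGER

def _print_errors_to_stderr(event) -> None:
    # Hypothetical callback: surface error-level events on stderr even before
    # setup_event_logger() has configured any loggers. Assumes each registered
    # callback is invoked with the fired event object.
    if _get_level_str(event) == "error":
        sys.stderr.write(event.message() + "\n")

# Registration sketch; callbacks is assumed to behave like a plain list here.
EVENT_MANAGER.callbacks.append(_print_errors_to_stderr)
```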

@peterallenwebb (Contributor, Author) commented on Dec 14, 2022:
@jtcohen6 Your understanding is correct w.r.t. dbt-rpc

From speaking to @racheldaniel, I have a fairly good idea how dbt-server is using logging now. They do their own configuration of the EVENT_MANAGER before running tasks, so I feel confident that we have a good story for them.

I'm going to prepare a PR to address #6434 now. I believe that will restore all of the important logging behaviors we had in 1.3, and I have to apologize for not considering how important it was to get something set up before the call to setup_event_logger() is made.

With all that said, there are some interesting things to note about the file logging behavior of the legacy logger. I found earlier this week that it starts recording messages after it is set up, which happens as soon as dbt starts. It then writes the recorded messages to the log file (if any) once the correct file path has been determined, and stops recording. My hope is that in 1.5 we can find a way to start all logging, both console and file, immediately when dbt starts. Does that seem feasible?

Contributor:
> I have to apologize for not considering how important it was to get something set up before the call to setup_event_logger() is made.

I'm glad we caught it early! I think the failure here was lack of sufficient inline code comment around the event system initialization, to scare you away from what was a perfectly sensible refactor. "We need to configure these defaults here, even though it looks like we shouldn't, because some log events might arrive before we have the real inputs for proper configuration — namely, if we encounter an early exception."

> I found earlier this week that it starts recording messages after it is set up, which happens as soon as dbt starts. It then writes the recorded messages to the log file (if any) once the correct file path has been determined, and stops recording.

IIRC, it does this by storing logs in a "buffer," and then "replaying" the buffer once the proper log path has been determined. A little too magical for my taste buds, but this might be an acceptable pattern to emulate.

dbt-core/core/dbt/logger.py, lines 417 to 421 (at 5e4e917):

    def _replay_buffered(self):
        assert self._msg_buffer is not None, "_msg_buffer should never be None in _replay_buffered"
        for record in self._msg_buffer:
            super().emit(record)
        self._msg_buffer = None

> My hope is that in 1.5 we can find a way to start all logging, both console and file, immediately when dbt starts. Does that seem feasible?

Maybe! We could roll our own similar version of a stateful "buffer"? Or split log-path out of dbt_project.yml, so that we can know where we're logging without having to parse any config/files? (Feels related: #6207)

FWIW in the happy path, when config initialization succeeds (99% of the time), I don't think we're firing any log events before setup_event_logger is called.
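
For reference, a minimal standard-library sketch of that stateful-buffer pattern (illustrative only; the legacy implementation in core/dbt/logger.py uses logbook, not stdlib logging): records are buffered in memory, then replayed to a file handler once the log path is known, after which buffering stops.

```python
import logging
from typing import List, Optional

class BufferedFileHandler(logging.Handler):
    """Buffer records until the log path is known, then replay and stop buffering."""

    def __init__(self) -> None:
        super().__init__()
        self._buffer: Optional[List[logging.LogRecord]] = []
        self._file_handler: Optional[logging.FileHandler] = None

    def emit(self, record: logging.LogRecord) -> None:
        if self._file_handler is not None:
            self._file_handler.emit(record)
        elif self._buffer is not None:
            self._buffer.append(record)

    def set_path(self, path: str) -> None:
        # Mirrors the replay-then-clear behavior of _replay_buffered above.
        self._file_handler = logging.FileHandler(path)
        assert self._buffer is not None, "set_path should only be called once"
        for record in self._buffer:
            self._file_handler.emit(record)
        self._buffer = None

# Usage: attach at startup, call set_path() once dbt_project.yml has been parsed.
logger = logging.getLogger("dbt.buffered")
handler = BufferedFileHandler()
logger.addHandler(handler)
logger.warning("fired before the log path is known")
handler.set_path("dbt.log")  # path is hypothetical; replays the buffered record
```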


+# Since dbt-rpc does not do its own log setup, we set up logbook if legacy
+# logging is enabled.
+if flags.ENABLE_LEGACY_LOGGER:
+    EVENT_MANAGER.add_logger(_get_logbook_log_config(None))

# This global, and the following two functions for capturing stdout logs are
# an unpleasant hack we intend to remove as part of API-ification. The GitHub
# issue #6350 was opened for that work.