Skip to content

Commit

Permalink
Add overview of runtime logging to the Book of the Runtime (dotnet#98881
Browse files Browse the repository at this point in the history
)

Add draft overview of runtime logging to the Book of the Runtime
  • Loading branch information
kg authored Oct 28, 2024
1 parent 6fc6ca5 commit f0a8fa4
Showing 1 changed file with 215 additions and 0 deletions.
215 changes: 215 additions & 0 deletions docs/design/coreclr/botr/logging.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,215 @@
Runtime logging for developers
==============================

Date: Feb. 2024

The .NET runtime codebase is massive, spread across thousands of files in multiple languages. There are thousands of log messages spread throughout that codebase, and most of them are disabled by default. So how do you, a runtime developer beset by all manner of mysterious test failures, get those messages out of the runtime and into your console or log files? And how should you go about adding log messages appropriately to new code you're writing?

*NOTE: This document will repeatedly reference many environment variables. For a detailed list of environment variables supported by the runtime and explanations of what they do, consult [`clrconfigvalues.h`](https://github.com/dotnet/runtime/blob/main/src/coreclr/inc/clrconfigvalues.h), [`gcconfig.h`](https://github.com/dotnet/runtime/blob/main/src/coreclr/gc/gcconfig.h), and [`jitconfigvalues.h`](https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/jitconfigvalues.h).*

Types of logging
================

EventPipe
-----------------------
EventPipe is a cross-platform tracing system similar to ETW that is used widely in the .NET runtime.
For a detailed overview of EventPipe, see [the EventPipe documentation pages on .NET Learn](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/eventpipe). It offers excellent performance along with lots of flexibility, so it is one of the main ways to get information from an active runtime that we advertise. Most events that we support in debug builds also are exposed in release builds.

For basic scenarios, you can use the [`dotnet-counters`](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-counters) tool to monitor events and performance counters the runtime reports through EventPipe. For example, to collect the default counters from a project while running it, you could use `dotnet-counters collect -- dotnet exec myapp.dll`, where the part after `--` specifies a command to trace.

You can also use the [`dotnet-trace`](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace) tool to capture EventPipe events from the runtime on the fly. It provides command line options you can use to filter events by severity level or based on keywords. For example, to capture informational GC events from a project while running it, you could use `dotnet-trace collect --clreventlevel informational --clrevents gc -- dotnet exec myapp.dll`, where the part after `--` specifies a command to trace. The `collect` command also supports a set of convenient default profiles, accessible via the `--profile` switch, i.e. `--profile gc-collect`:
```
cpu-sampling - Useful for tracking CPU usage and general .NET runtime information. This is the default option if no profile or providers are specified.
gc-verbose - Tracks GC collections and samples object allocations.
gc-collect - Tracks GC collections only at very low overhead.
database - Captures ADO.NET and Entity Framework database commands
```

The `dotnet-trace collect` command will produce a `.nettrace` file, which you can analyze via `dotnet-trace report topN` (for CPU sampling information) or `dotnet-trace convert`. You can also open this `.nettrace` file directly in Visual Studio to inspect it, for example by double-clicking it in solution explorer or dragging it into the Visual Studio window.

If your application fails to run under `dotnet-counters` or `dotnet-trace`'s monitoring, try passing the `--show-child-io` tool argument to make the child process's output visible so you can spot any error messages. Make sure you're running it from the right working directory with any necessary environment variables.

Please see the linked documentation pages for more information on how to use each of the above tools. They have an extensive set of options and some helpful features like time limits, attaching to existing processes, and configurable output formatting.

To emit your very own EventPipe events from within the C++ part of the runtime, you can use the `FireEtwXXX` APIs, or their convenience wrappers within the `ETW::` C++ namespace. You likely will want to create new type(s) of events instead of using existing ones. The events are generated by the [`genEventing.py`](https://github.com/dotnet/runtime/blob/main/src/coreclr/scripts/genEventing.py) script based on the definitions in [`ClrEtwAll.man`](https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/ClrEtwAll.man).

To emit EventPipe events from C# you can use the [`System.Diagnostics.Tracing.EventSource`](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/eventsource-getting-started) class, by defining your own event types derived from it. See its documentation for examples and more details.

Alternative event consumers
-----------------------

For advanced scenarios, EventSource and the C++ event wrappers both support standard event tracing systems on their respective platforms (ETW on Windows, LTTNG on Linux). EventPipe itself does not integrate with these systems, but the runtime's layers on top of EventPipe will also use them if they are enabled.

On Windows you can use standard [ETW](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/event-tracing-for-windows) tools like [WPR](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-recorder) with its '.NET Activity' scenario. A convenient all-purpose tool for collecting and analyzing ETW traces on Windows is [PerfView](https://github.com/microsoft/perfview). To use PerfView you'll want to consult its [extensive documentation](http://htmlpreview.github.io/?https://github.com/Microsoft/perfview/blob/main/src/PerfView/SupportFiles/UsersGuide.htm) or go through the tutorial, but a basic starting point is to launch your application via the *Collect*->*Run* menu option, and then double-click the resulting recording in the explorer pane to the left in order to open it.

On Linux you can use [LTTNG](https://lttng.org/), and the [perfcollect script](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/trace-perfcollect-lttng) is a useful tool you can use for a more ETW-like experience.

StressLog
------------
The StressLog is a circular buffer inside the runtime process that usually does not escape, and most StressLog messages are available in retail builds, which makes it useful for troubleshooting issues with GC or other subsystems in production scenarios. To enable it, you can set the `DOTNET_StressLog` environment variable to `1`, and you can configure it using environment variables, as demonstrated below:

```bash
echo Enable StressLog
set DOTNET_StressLog=1

echo Show JIT log messages
set DOTNET_LogFacility=0x00000008

echo Only show warnings or errors
set DOTNET_LogLevel=3

echo If the buffer is filling too quickly, it can be helpful to set large buffer sizes.
echo But if you have many threads, you may want to set a low limit to avoid exhausting memory.
echo Set a per-thread size limit for StressLog of 20MB. Config settings default to hex.
set DOTNET_StressLogSize=1400000

echo Set a process-wide size limit for all StressLogs (combined) of 400MB. Config settings default to hex.
set DOTNET_TotalStressLogSize=18000000

echo Write the StressLog's to a file instead of memory for cases where you can't attach a debugger or get a dump file.
set DOTNET_StressLogFilename=mystresslog.log
```
To write your own messages to the StressLog from C++, you can use the `STRESS_LOGN(facility, level, msg, ...)` macros, i.e. `STRESS_LOG1(LF_GC, LL_ERROR, "A significant but non-fatal error occurred in the garbage collector! Here's my favorite number: %d\n", 42)` where the first argument is one or more logging facilities (you can combine them using `|` i.e. `LF_GC | LF_GCROOTS`), the second argument is a severity level, and the third argument is the log message format string. See [log facilities and levels](#log-facilities-and-levels), below, for more information.
You can't write your own messages to the StressLog from C#. If for some reason you need to while testing, perhaps you could expose it via a custom qcall.
Traditional .NET Runtime Logging
--------------------------------
"Traditional" log messages are only available in debug or checked builds of the runtime. To enable them, you can set the `DOTNET_LogEnable` environment variable to `1`, and configure them using environment variables. There are various configuration variables for traditional logging, demonstrated below:
```bash
echo Enable traditional logging
set DOTNET_LogEnable=1
echo Show JIT log messages
set DOTNET_LogFacility=0x00000008
echo Only show warnings or errors
set DOTNET_LogLevel=3
echo Log messages to the debugger
set DOTNET_LogToDebugger=0
echo Log messages to the console
set DOTNET_LogToConsole=1
echo Log messages to a specific file
set DOTNET_LogToFile=0
set DOTNET_LogFile=mylog.log
echo Append log messages to the file instead of erasing it at start
set DOTNET_LogFileAppend=1
echo Flush the log file after every write to ensure messages are not lost after a crash
set DOTNET_LogFlushFile=1
echo Attach the process ID to every log message for multi-process scenarios
set DOTNET_LogWithPid=1
```
This classical logging system is not frequently used, so individual log messages you encounter may be partially or completely nonfunctional - for example, 64-bit-only issues in outdated log statements - but the basics should always work.
To send your own traditional log messages from C++, you can use the `LOG((facility, level, msg, ...))` macro, i.e. `LOG((LF_GC, LL_INFO10000, "An insignificant thing happened in the garbage collector.\n"))`. The arguments are roughly equivalent to those described above for StressLog. See [log facilities and levels](#log-facilities-and-levels), below, for more information on facilities and levels.
Note that the `LOG` and `STRESS_LOG` macros are very similar, so for debugging purposes you can temporarily convert a `LOG((...))` statement into a `STRESS_LOG(...)` statement in order to take advantage of StressLog functionality to diagnose an issue.
If you need to send messages to the traditional log from C#, similar to StressLog you could expose it via a custom qcall temporarily.
Log Facilities and Levels
-------------------------
StressLog and traditional logging both rely on the `DOTNET_LogFacility`, `DOTNET_LogFacility2`, and `DOTNET_LogLevel` environment variables to control their level of verbosity and filter the information logged.
`DOTNET_LogLevel` allows filtering out log messages of lower importance, regardless of category. This variable is specified as an integer, not a named constant. The levels as of this writing are listed below, and come from [`log.h`](https://github.com/dotnet/runtime/blob/main/src/coreclr/inc/log.h):
```c
LL_EVERYTHING 10
LL_INFO1000000 9 // can be expected to generate 1,000,000 logs per small but not trivial run
LL_INFO100000 8 // can be expected to generate 100,000 logs per small but not trivial run
LL_INFO10000 7 // can be expected to generate 10,000 logs per small but not trivial run
LL_INFO1000 6 // can be expected to generate 1,000 logs per small but not trivial run
LL_INFO100 5 // can be expected to generate 100 logs per small but not trivial run
LL_INFO10 4 // can be expected to generate 10 logs per small but not trivial run
LL_WARNING 3
LL_ERROR 2
LL_FATALERROR 1
LL_ALWAYS 0 // impossible to turn off (log level never negative)
```
`DOTNET_LogFacility` and `DOTNET_LogFacility2` allow filtering messages to specific categories. These variables are specified as integers, not named constants. For example, the `LF_GC` facility's value is `0x00000001` or just `1`. The list of available options for `DOTNET_LogFacility1` as of this writing are listed below, and come from [`loglf.h`](https://github.com/dotnet/runtime/blob/main/src/coreclr/inc/loglf.h):
```c
LF_GC 0x00000001
LF_GCINFO 0x00000002
LF_STUBS 0x00000004
LF_JIT 0x00000008
LF_LOADER 0x00000010
LF_METADATA 0x00000020
LF_SYNC 0x00000040
LF_EEMEM 0x00000080
LF_GCALLOC 0x00000100
LF_CORDB 0x00000200
LF_CLASSLOADER 0x00000400
LF_CORPROF 0x00000800
LF_DIAGNOSTICS_PORT 0x00001000
LF_DBGALLOC 0x00002000
LF_EH 0x00004000
LF_ENC 0x00008000
LF_ASSERT 0x00010000
LF_VERIFIER 0x00020000
LF_THREADPOOL 0x00040000
LF_GCROOTS 0x00080000
LF_INTEROP 0x00100000
LF_MARSHALER 0x00200000
LF_TIEREDCOMPILATION 0x00400000 // This used to be IJW, but now repurposed for tiered compilation
LF_ZAP 0x00800000
LF_STARTUP 0x01000000 // Log startup and shutdown failures
LF_APPDOMAIN 0x02000000
LF_CODESHARING 0x04000000
LF_STORE 0x08000000
LF_SECURITY 0x10000000
LF_LOCKS 0x20000000
LF_BCL 0x40000000
```
`DOTNET_LogFacility2` is newer, and at present only has one facility available: `LF2_MULTICOREJIT`, with a value of `0x00000001`.
If you're trying to capture a specific message in the log and it's not showing up, make sure you've checked its log level/facility and set your environment variables accordingly!
Mono logging
------------
Builds of .NET using the Mono runtime have their own Mono-specific log configuration environment variables that control what diagnostic information is logged by the runtime.
You can use `MONO_LOG_LEVEL` to configure the overall verbosity by setting it to one of `"error"`, `"critical"`, `"warning"`, `"message"`, `"info"`, or `"debug"`. You can use `MONO_LOG_MASK` to filter the log messages down to specific categories like `gc` or `aot`. For a full list of mask options, see [`mono-logger.c`](https://github.com/dotnet/runtime/blob/main/src/mono/mono/utils/mono-logger.c)'s `mono_trace_set_mask_string` function.
When dealing with Mono's interpreter, the `MONO_VERBOSE_METHOD` environment variable turns on verbose logging for methods with a specific name. This can be very helpful if you are investigating situations where a method is being compiled or optimized incorrectly, or you're not certain which version of a method is running.
WebAssembly logging
-------------------
To configure Mono logging in WebAssembly builds of the runtime, you will need to specify your environment variable(s) in the form of a JSON blob inside of a config item inside your csproj, like so:
```xml
<ItemGroup>
<WasmExtraConfig Include="environmentVariables" Value='
{
"MONO_LOG_LEVEL": "warning",
"MONO_LOG_MASK": "all"
}' />
</ItemGroup>
```
Environment variables set externally at build time or at web browser launch time will not automatically flow through into the WASM runtime.
For WebAssembly builds of the runtime there is an additional interop layer written in TypeScript that has its own logging facilities, defined in [`logging.ts`](https://github.com/dotnet/runtime/blob/main/src/mono/browser/runtime/logging.ts).
For debug-severity log messages from the TypeScript layer, they will be suppressed by default unless the `diagnosticTracing` flag is set. To set it, call `.withDiagnosticTracing(true)` on the `dotnet` object during startup, or add an extra config item to your csproj like so:
```xml
<ItemGroup>
<WasmExtraConfig Include="diagnosticTracing" Value="true" />
</ItemGroup>
```
All other log severities are enabled by default, and they will be written to the developer tools console when running in a web browser. When running automated tests from the command line or running an application in an environment like node.js or the v8 shell, they will be written to standard output and/or standard error.
To send messages from within TypeScript, use the appropriate API - `mono_log_error` for severe errors, `mono_log_info` for important information, and `mono_log_debug` for things that ordinary users will never need to see.
If for some reason you need to access this logging facility directly from C/C++, you can use the `mono_wasm_trace_logger` function. Note that fatal errors sent through this function will trigger an immediate process exit after they are written.
By default the jiterpreter will only log error messages to the console, but if `MONO_VERBOSE_METHOD` is used, it will also log verbose information on traces within verbose methods. More advanced logging in the jiterpreter requires editing the configuration variables in [`jiterpreter.ts`](https://github.com/dotnet/runtime/blob/main/src/mono/browser/runtime/jiterpreter.ts) and compiling the runtime from source.

0 comments on commit f0a8fa4

Please sign in to comment.