Add overview of runtime logging to the Book of the Runtime (dotnet#98881

) Add draft overview of runtime logging to the Book of the Runtime
BruceForstall · Oct 28, 2024 · f0a8fa4 · f0a8fa4
1 parent 6fc6ca5
commit f0a8fa4
Showing 1 changed file with 215 additions and 0 deletions.
diff --git a/docs/design/coreclr/botr/logging.md b/docs/design/coreclr/botr/logging.md
@@ -0,0 +1,215 @@
+Runtime logging for developers
+==============================
+
+Date: Feb. 2024
+
+The .NET runtime codebase is massive, spread across thousands of files in multiple languages. There are thousands of log messages spread throughout that codebase, and most of them are disabled by default. So how do you, a runtime developer beset by all manner of mysterious test failures, get those messages out of the runtime and into your console or log files? And how should you go about adding log messages appropriately to new code you're writing?
+
+*NOTE: This document will repeatedly reference many environment variables. For a detailed list of environment variables supported by the runtime and explanations of what they do, consult [`clrconfigvalues.h`](https://github.com/dotnet/runtime/blob/main/src/coreclr/inc/clrconfigvalues.h), [`gcconfig.h`](https://github.com/dotnet/runtime/blob/main/src/coreclr/gc/gcconfig.h), and [`jitconfigvalues.h`](https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/jitconfigvalues.h).*
+
+Types of logging
+================
+
+EventPipe
+-----------------------
+EventPipe is a cross-platform tracing system similar to ETW that is used widely in the .NET runtime.
+For a detailed overview of EventPipe, see [the EventPipe documentation pages on .NET Learn](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/eventpipe). It offers excellent performance along with lots of flexibility, so it is one of the main ways to get information from an active runtime that we advertise. Most events that we support in debug builds also are exposed in release builds.
+
+For basic scenarios, you can use the [`dotnet-counters`](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-counters) tool to monitor events and performance counters the runtime reports through EventPipe. For example, to collect the default counters from a project while running it, you could use `dotnet-counters collect -- dotnet exec myapp.dll`, where the part after `--` specifies a command to trace.
+
+You can also use the [`dotnet-trace`](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace) tool to capture EventPipe events from the runtime on the fly. It provides command line options you can use to filter events by severity level or based on keywords. For example, to capture informational GC events from a project while running it, you could use `dotnet-trace collect --clreventlevel informational --clrevents gc -- dotnet exec myapp.dll`, where the part after `--` specifies a command to trace. The `collect` command also supports a set of convenient default profiles, accessible via the `--profile` switch, i.e. `--profile gc-collect`:
+```
+cpu-sampling     - Useful for tracking CPU usage and general .NET runtime information. This is the default option if no profile or providers are specified.
+gc-verbose       - Tracks GC collections and samples object allocations.
+gc-collect       - Tracks GC collections only at very low overhead.
+database         - Captures ADO.NET and Entity Framework database commands
+```
+
+The `dotnet-trace collect` command will produce a `.nettrace` file, which you can analyze via `dotnet-trace report topN` (for CPU sampling information) or `dotnet-trace convert`. You can also open this `.nettrace` file directly in Visual Studio to inspect it, for example by double-clicking it in solution explorer or dragging it into the Visual Studio window.
+
+If your application fails to run under `dotnet-counters` or `dotnet-trace`'s monitoring, try passing the `--show-child-io` tool argument to make the child process's output visible so you can spot any error messages. Make sure you're running it from the right working directory with any necessary environment variables.
+
+Please see the linked documentation pages for more information on how to use each of the above tools. They have an extensive set of options and some helpful features like time limits, attaching to existing processes, and configurable output formatting.
+
+To emit your very own EventPipe events from within the C++ part of the runtime, you can use the `FireEtwXXX` APIs, or their convenience wrappers within the `ETW::` C++ namespace. You likely will want to create new type(s) of events instead of using existing ones. The events are generated by the [`genEventing.py`](https://github.com/dotnet/runtime/blob/main/src/coreclr/scripts/genEventing.py) script based on the definitions in [`ClrEtwAll.man`](https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/ClrEtwAll.man).
+
+To emit EventPipe events from C# you can use the [`System.Diagnostics.Tracing.EventSource`](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/eventsource-getting-started) class, by defining your own event types derived from it. See its documentation for examples and more details.
+
+Alternative event consumers
+-----------------------
+
+For advanced scenarios, EventSource and the C++ event wrappers both support standard event tracing systems on their respective platforms (ETW on Windows, LTTNG on Linux). EventPipe itself does not integrate with these systems, but the runtime's layers on top of EventPipe will also use them if they are enabled.
+
+On Windows you can use standard [ETW](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/event-tracing-for-windows) tools like [WPR](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-recorder) with its '.NET Activity' scenario. A convenient all-purpose tool for collecting and analyzing ETW traces on Windows is [PerfView](https://github.com/microsoft/perfview). To use PerfView you'll want to consult its [extensive documentation](http://htmlpreview.github.io/?https://github.com/Microsoft/perfview/blob/main/src/PerfView/SupportFiles/UsersGuide.htm) or go through the tutorial, but a basic starting point is to launch your application via the *Collect*->*Run* menu option, and then double-click the resulting recording in the explorer pane to the left in order to open it.
+
+On Linux you can use [LTTNG](https://lttng.org/), and the [perfcollect script](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/trace-perfcollect-lttng) is a useful tool you can use for a more ETW-like experience.
+
+StressLog
+------------
+The StressLog is a circular buffer inside the runtime process that usually does not escape, and most StressLog messages are available in retail builds, which makes it useful for troubleshooting issues with GC or other subsystems in production scenarios. To enable it, you can set the `DOTNET_StressLog` environment variable to `1`, and you can configure it using environment variables, as demonstrated below:
+
+```bash
+echo Enable StressLog
+set DOTNET_StressLog=1
+
+echo Show JIT log messages
+set DOTNET_LogFacility=0x00000008
+
+echo Only show warnings or errors
+set DOTNET_LogLevel=3
+
+echo If the buffer is filling too quickly, it can be helpful to set large buffer sizes.
+echo But if you have many threads, you may want to set a low limit to avoid exhausting memory.
+echo Set a per-thread size limit for StressLog of 20MB. Config settings default to hex.
+set DOTNET_StressLogSize=1400000
+
+echo Set a process-wide size limit for all StressLogs (combined) of 400MB. Config settings default to hex.
+set DOTNET_TotalStressLogSize=18000000
+
+echo Write the StressLog's to a file instead of memory for cases where you can't attach a debugger or get a dump file.
+set DOTNET_StressLogFilename=mystresslog.log
+```
+
+To write your own messages to the StressLog from C++, you can use the `STRESS_LOGN(facility, level, msg, ...)` macros, i.e. `STRESS_LOG1(LF_GC, LL_ERROR, "A significant but non-fatal error occurred in the garbage collector! Here's my favorite number: %d\n", 42)` where the first argument is one or more logging facilities (you can combine them using `|` i.e. `LF_GC | LF_GCROOTS`), the second argument is a severity level, and the third argument is the log message format string. See [log facilities and levels](#log-facilities-and-levels), below, for more information.
+
+You can't write your own messages to the StressLog from C#. If for some reason you need to while testing, perhaps you could expose it via a custom qcall.
+
+Traditional .NET Runtime Logging
+--------------------------------
+"Traditional" log messages are only available in debug or checked builds of the runtime. To enable them, you can set the `DOTNET_LogEnable` environment variable to `1`, and configure them using environment variables. There are various configuration variables for traditional logging, demonstrated below:
+
+```bash
+echo Enable traditional logging
+set DOTNET_LogEnable=1
+
+echo Show JIT log messages
+set DOTNET_LogFacility=0x00000008
+
+echo Only show warnings or errors
+set DOTNET_LogLevel=3
+
+echo Log messages to the debugger
+set DOTNET_LogToDebugger=0
+
+echo Log messages to the console
+set DOTNET_LogToConsole=1
+
+echo Log messages to a specific file
+set DOTNET_LogToFile=0
+set DOTNET_LogFile=mylog.log
+
+echo Append log messages to the file instead of erasing it at start
+set DOTNET_LogFileAppend=1
+
+echo Flush the log file after every write to ensure messages are not lost after a crash
+set DOTNET_LogFlushFile=1
+
+echo Attach the process ID to every log message for multi-process scenarios
+set DOTNET_LogWithPid=1
+```
+
+This classical logging system is not frequently used, so individual log messages you encounter may be partially or completely nonfunctional - for example, 64-bit-only issues in outdated log statements - but the basics should always work.
+
+To send your own traditional log messages from C++, you can use the `LOG((facility, level, msg, ...))` macro, i.e. `LOG((LF_GC, LL_INFO10000, "An insignificant thing happened in the garbage collector.\n"))`. The arguments are roughly equivalent to those described above for StressLog. See [log facilities and levels](#log-facilities-and-levels), below, for more information on facilities and levels.
+
+Note that the `LOG` and `STRESS_LOG` macros are very similar, so for debugging purposes you can temporarily convert a `LOG((...))` statement into a `STRESS_LOG(...)` statement in order to take advantage of StressLog functionality to diagnose an issue.
+
+If you need to send messages to the traditional log from C#, similar to StressLog you could expose it via a custom qcall temporarily.
+
+Log Facilities and Levels
+-------------------------
+StressLog and traditional logging both rely on the `DOTNET_LogFacility`, `DOTNET_LogFacility2`, and `DOTNET_LogLevel` environment variables to control their level of verbosity and filter the information logged.
+
+`DOTNET_LogLevel` allows filtering out log messages of lower importance, regardless of category. This variable is specified as an integer, not a named constant. The levels as of this writing are listed below, and come from [`log.h`](https://github.com/dotnet/runtime/blob/main/src/coreclr/inc/log.h):
+
+```c
+LL_EVERYTHING  10
+LL_INFO1000000  9 // can be expected to generate 1,000,000 logs per small but not trivial run
+LL_INFO100000   8 // can be expected to generate 100,000 logs per small but not trivial run
+LL_INFO10000    7 // can be expected to generate 10,000 logs per small but not trivial run
+LL_INFO1000     6 // can be expected to generate 1,000 logs per small but not trivial run
+LL_INFO100      5 // can be expected to generate 100 logs per small but not trivial run
+LL_INFO10       4 // can be expected to generate 10 logs per small but not trivial run
+LL_WARNING      3
+LL_ERROR        2
+LL_FATALERROR   1
+LL_ALWAYS       0 // impossible to turn off (log level never negative)
+```
+
+`DOTNET_LogFacility` and `DOTNET_LogFacility2` allow filtering messages to specific categories. These variables are specified as integers, not named constants. For example, the `LF_GC` facility's value is `0x00000001` or just `1`. The list of available options for `DOTNET_LogFacility1` as of this writing are listed below, and come from [`loglf.h`](https://github.com/dotnet/runtime/blob/main/src/coreclr/inc/loglf.h):
+```c
+LF_GC                0x00000001
+LF_GCINFO            0x00000002
+LF_STUBS             0x00000004
+LF_JIT               0x00000008
+LF_LOADER            0x00000010
+LF_METADATA          0x00000020
+LF_SYNC              0x00000040
+LF_EEMEM             0x00000080
+LF_GCALLOC           0x00000100
+LF_CORDB             0x00000200
+LF_CLASSLOADER       0x00000400
+LF_CORPROF           0x00000800
+LF_DIAGNOSTICS_PORT  0x00001000
+LF_DBGALLOC          0x00002000
+LF_EH                0x00004000
+LF_ENC               0x00008000
+LF_ASSERT            0x00010000
+LF_VERIFIER          0x00020000
+LF_THREADPOOL        0x00040000
+LF_GCROOTS           0x00080000
+LF_INTEROP           0x00100000
+LF_MARSHALER         0x00200000
+LF_TIEREDCOMPILATION 0x00400000 // This used to be IJW, but now repurposed for tiered compilation
+LF_ZAP               0x00800000
+LF_STARTUP           0x01000000 // Log startup and shutdown failures
+LF_APPDOMAIN         0x02000000
+LF_CODESHARING       0x04000000
+LF_STORE             0x08000000
+LF_SECURITY          0x10000000
+LF_LOCKS             0x20000000
+LF_BCL               0x40000000
+```
+
+`DOTNET_LogFacility2` is newer, and at present only has one facility available: `LF2_MULTICOREJIT`, with a value of `0x00000001`.
+
+If you're trying to capture a specific message in the log and it's not showing up, make sure you've checked its log level/facility and set your environment variables accordingly!
+
+Mono logging
+------------
+Builds of .NET using the Mono runtime have their own Mono-specific log configuration environment variables that control what diagnostic information is logged by the runtime.
+
+You can use `MONO_LOG_LEVEL` to configure the overall verbosity by setting it to one of `"error"`, `"critical"`, `"warning"`, `"message"`, `"info"`, or `"debug"`. You can use `MONO_LOG_MASK` to filter the log messages down to specific categories like `gc` or `aot`. For a full list of mask options, see [`mono-logger.c`](https://github.com/dotnet/runtime/blob/main/src/mono/mono/utils/mono-logger.c)'s `mono_trace_set_mask_string` function.
+
+When dealing with Mono's interpreter, the `MONO_VERBOSE_METHOD` environment variable turns on verbose logging for methods with a specific name. This can be very helpful if you are investigating situations where a method is being compiled or optimized incorrectly, or you're not certain which version of a method is running.
+
+WebAssembly logging
+-------------------
+To configure Mono logging in WebAssembly builds of the runtime, you will need to specify your environment variable(s) in the form of a JSON blob inside of a config item inside your csproj, like so:
+```xml
+<ItemGroup>
+  <WasmExtraConfig Include="environmentVariables" Value='
+{
+  "MONO_LOG_LEVEL": "warning",
+  "MONO_LOG_MASK": "all"
+}' />
+</ItemGroup>
+```
+Environment variables set externally at build time or at web browser launch time will not automatically flow through into the WASM runtime.
+
+For WebAssembly builds of the runtime there is an additional interop layer written in TypeScript that has its own logging facilities, defined in [`logging.ts`](https://github.com/dotnet/runtime/blob/main/src/mono/browser/runtime/logging.ts).
+
+For debug-severity log messages from the TypeScript layer, they will be suppressed by default unless the `diagnosticTracing` flag is set. To set it, call `.withDiagnosticTracing(true)` on the `dotnet` object during startup, or add an extra config item to your csproj like so:
+```xml
+<ItemGroup>
+  <WasmExtraConfig Include="diagnosticTracing" Value="true" />
+</ItemGroup>
+```
+
+All other log severities are enabled by default, and they will be written to the developer tools console when running in a web browser. When running automated tests from the command line or running an application in an environment like node.js or the v8 shell, they will be written to standard output and/or standard error.
+
+To send messages from within TypeScript, use the appropriate API - `mono_log_error` for severe errors, `mono_log_info` for important information, and `mono_log_debug` for things that ordinary users will never need to see.
+
+If for some reason you need to access this logging facility directly from C/C++, you can use the `mono_wasm_trace_logger` function. Note that fatal errors sent through this function will trigger an immediate process exit after they are written.
+
+By default the jiterpreter will only log error messages to the console, but if `MONO_VERBOSE_METHOD` is used, it will also log verbose information on traces within verbose methods. More advanced logging in the jiterpreter requires editing the configuration variables in [`jiterpreter.ts`](https://github.com/dotnet/runtime/blob/main/src/mono/browser/runtime/jiterpreter.ts) and compiling the runtime from source.