About this patch release
2024-12-10 lttng-tools 2.12.17 (National Lager Day)
Full changelog: v2.12.16...v2.12.17
About LTTng-tools 2.12
This release is named after Ta Meilleure, a Northeast IPA beer brewed by Lagabière. Translating to "Your best one", this beer gives out strong aromas of passion fruit, lemon, and peaches. Tastewise, expect a lot of fruit, a creamy texture, and a smooth lingering hop bitterness.
The most notable features of this new release are:
- session clearing,
- uid and gid tracking,
- file descriptor pooling (relay daemon),
- per-session grouping (relay daemon),
- working directory override (relay daemon),
- new network reception entry/exit tracepoints (LTTng-modules),
- statedump of interrupt threads (LTTng-modules),
- statedump of x86 CPU topology (LTTng-modules),
- new product UUID environment field (LTTng-modules).
Read on for a short description of each of these features and the links to this release!
Session clearing
You can use the new lttng-clear command to clear the contents of one or more tracing sessions.
In essence, this new feature allows you to prune the content of long-running sessions without destroying and reconfiguring them. This is especially useful to clear a session's tracing data between attempts to reproduce a problem.
Clearing a tracing session deletes the contents of the tracing buffers and all local or streamed trace data on a remote peer. Note that an lttng-relayd daemon can be configured to disallow clear operations using the LTTNG_RELAYD_DISALLOW_CLEAR environment variable.
If a session is configured in snapshot mode, only the tracing buffers are cleared.
If a session is configured in live mode, any attached client that is lagging behind will finish the consumption of its current trace data packets and jump forward in time to events generated after the beginning of the clear command.
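For example, a minimal command sketch of the clear workflow (the session name is illustrative):
$ lttng create repro-session
$ lttng enable-event --kernel --all
$ lttng start
... attempt to reproduce the problem, inspect the resulting trace ...
$ lttng clear repro-session
... the next attempt starts from a clean slate, with the session still configured ...
On the relay daemon side, an administrator can refuse such requests by launching the daemon with the environment variable set, e.g. LTTNG_RELAYD_DISALLOW_CLEAR=1 lttng-relayd.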
uid and gid tracking
The existing lttng-track command has been expanded to support uid and gid tracking.
By default, a tracing session tracks all applications and users, following LTTng's permission model.
However, these new options allow you to restrict which users and groups are tracked by both the user space and kernel tracers.
In previous versions of LTTng, it was effectively possible to filter on the basis of uids and gids using the --filter mechanism. However, this dedicated tracking mechanism is not only more efficient in terms of tracing overhead, but also prevents the creation of tracing buffers for users and groups which are not tracked.
Overall, this results in far less memory consumption by the user space tracer on systems which have multiple active users.
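For example, a sketch that restricts kernel tracing to two users, assuming the 2.12 track/untrack syntax (the uid values are illustrative):
$ lttng create
$ lttng untrack --kernel --uid --all
$ lttng track --kernel --uid=1000,1001
$ lttng enable-event --kernel --all
$ lttng start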
File descriptor pooling (relay daemon)
A number of users have reported encountering file descriptor exhaustion issues when using the relay daemon to serve a large number of consumers or live clients.
The current on-disk CTF representation used by LTTng (and expected by a number of viewers) uses one file per CPU, per channel, to organize traces. This causes the default RLIMIT_NOFILE value (1024 on many systems) to be reached easily, especially when tracing systems with a large number of cores.
In order to alleviate this problem, the new --fd-pool-size option allows you to specify a maximum number of simultaneously opened file descriptors (using the soft RLIMIT_NOFILE resource limit of the process by default). This is meant as a workaround for users who can't bump the system limit because of permission restrictions.
As its name indicates, this option causes the relay daemon to maintain a pool (or cache) of open file descriptors which are re-purposed as needed. The file descriptors of the most recently used files are kept open and only closed once the --fd-pool-size limit is reached, keeping the number of simultaneously opened file descriptors under the user-specified limit.
Note that setting this value too low can degrade the performance of the relay daemon.
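For instance, to cap the relay daemon well below a restrictive soft limit (the value is illustrative):
$ lttng-relayd --fd-pool-size=512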
Per-session grouping (relay daemon)
By default, the relay daemon writes the traces under a predefined directory hierarchy:
$LTTNG_HOME/lttng-traces/HOSTNAME/SESSION/DOMAIN
where:
- HOSTNAME is the remote hostname,
- SESSION is the full session name,
- DOMAIN is the tracing domain (ust or kernel).
Using the new relay daemon --group-output-by-session option, you can now change this hierarchy to group traces by session, rather than by hostname:
$LTTNG_HOME/lttng-traces/SESSION/HOSTNAME/DOMAIN
This proves especially useful if you are tracing a number of hosts (with different hostnames) which share the same session name as part of their configuration. Hence, a descriptive session name (e.g. connection-hang) can be used across a fleet of machines streaming to a given relay daemon.
Note that the default behaviour can be explicitly specified using the --group-output-by-host option.
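For example, launching the relay daemon as follows (the session and host names are illustrative):
$ lttng-relayd --group-output-by-session
would store the traces of a session named connection-hang streamed by hosts web01 and web02 under paths of the form:
$LTTNG_HOME/lttng-traces/connection-hang/web01/...
$LTTNG_HOME/lttng-traces/connection-hang/web02/...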
Working directory override (relay daemon)
This small quality-of-life feature allows you to override the working directory of the relay daemon using the daemon's launch options (-w PATH or --working-directory=PATH).
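For example, the two following invocations are equivalent (the path is illustrative):
$ lttng-relayd -w /var/lib/lttng-relayd
$ lttng-relayd --working-directory=/var/lib/lttng-relayd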
New network reception entry/exit tracepoints (LTTng-modules)
New instrumentation hooks were added to the kernel tracer in order to trace the entry and exit of the Linux kernel's network reception code paths.
You can use those tracepoints to identify the bounds of a network reception and link the events that happen in the interim (e.g. wakeups) to a specific network reception instance. Those tracepoints can also be used to analyse the network stack's latency.
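As a sketch, such an analysis session could record these events together with scheduler wakeups. The wildcard below is an assumption: the exact tracepoint names depend on your LTTng-modules and kernel versions (see lttng list --kernel for the authoritative list):
$ lttng create net-latency
$ lttng enable-event --kernel 'netif_receive_skb*'
$ lttng enable-event --kernel sched_wakeup
$ lttng start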
Statedump of interrupt threads (LTTng-modules)
Threaded IRQs have an associated thread field in the irqaction structure which specifies the process to wake up when the IRQ happens. This field is now extracted as part of the lttng_statedump_interrupt statedump tracepoint.
You can use this information to know which processes handle the various IRQs. It is also possible to associate the events occurring in the context of those processes to their respective IRQ.
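A minimal sketch to capture this event, using lttng regenerate to trigger a fresh statedump on an already-running session:
$ lttng enable-event --kernel lttng_statedump_interrupt
$ lttng start
$ lttng regenerate statedump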
Statedump of x86 CPU topology (LTTng-modules)
A new lttng_statedump_cpu_topology tracepoint has been added to extract the active CPU/NUMA topology. You can use this information to know which CPUs are SMT siblings or part of the same socket. For the time being, only x86 is supported since all architectures describe their topologies differently.
The architecture field is statically defined and should be present for all architecture implementations. Hence, it is possible for analysis tools to anticipate the event's layout.
Example output:
lttng_statedump_cpu_topology: { cpu_id = 3 }, { architecture = "x86", cpu_id = 0, vendor = "GenuineIntel", family = 6, model = 142, model_name = "Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz", physical_id = 0, core_id = 0, cores = 2 }
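As with the other statedump events, a minimal recording sketch:
$ lttng enable-event --kernel lttng_statedump_cpu_topology
$ lttng start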
New product UUID environment field (LTTng-modules)
The product UUID, taken from the DMI system information, is now saved as part of the kernel traces' environment fields as product_uuid. You can use this field to uniquely identify a machine (virtual or physical) in order to correlate traces gathered on multiple virtual machines.
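For example, one way to verify the field's presence is to dump the trace's metadata text, here assuming Babeltrace 1 and an illustrative trace path:
$ babeltrace --output-format=ctf-metadata /path/to/kernel-trace | grep product_uuid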