The PSMQTT architecture can be described as:
flowchart TD
%% Nodes
OS([Linux/Windows/Mac OS HW interfaces])
SMART([Hard drive SMART data])
CLK((Clock))
MQTT([MQTT Broker])
psmqttTASK(PSMQTT task handler)
psmqttSCHED(PSMQTT scheduler)
psmqttFMT(PSMQTT formatter)
%% Edge connections between nodes
OS -->|psutil| psmqttTASK
SMART -->|pySMART| psmqttTASK
CLK --> psmqttSCHED
psmqttSCHED-->psmqttTASK
psmqttTASK-->psmqttFMT
psmqttFMT-->MQTT
%% Individual node styling.
style OS color:#FFFFFF, fill:#AA00FF, stroke:#AA00FF
style SMART color:#FFFFFF, stroke:#00C853, fill:#00C853
style CLK color:#FFFFFF, stroke:#2962FF, fill:#2962FF
style MQTT color:#FFFFFF, stroke:#2962FF, fill:#2962FF
The PSMQTT configuration file defines:
- periodicity of each PSMQTT action;
- which "sensor" has to be queried; PSMQTT uses the psutil and pySMART libraries to read data from the hardware of the device where PSMQTT runs (CPU, memory, temperature and fan sensors, SMART hard drive data, process information, etc.);
- how the data of each sensor is formatted into text;
- to which MQTT broker all the outputs will be published.
The following section provides more details about the config file syntax.
The PSMQTT configuration file is a YAML file.
The PSMQTT configuration file should be located in the same directory as psmqtt.py; alternatively you can specify the location of the config file using the PSMQTTCONFIG environment variable (e.g. setting PSMQTTCONFIG=~/my-config-psmqtt.yaml).
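The lookup order described above can be sketched in a few lines of Python. This is only an illustration, not the code PSMQTT actually runs: the helper name `resolve_config_path` is hypothetical, and expanding `~` in the environment variable is an assumption.

```python
import os
from pathlib import Path


def resolve_config_path(script_dir, env=None):
    """Return the config file path: PSMQTTCONFIG if set, else <script_dir>/psmqtt.yaml.

    Hypothetical helper, written only to illustrate the lookup order.
    """
    env = os.environ if env is None else env
    override = env.get("PSMQTTCONFIG")
    if override:
        # Assumption: "~" is expanded so that ~/my-config-psmqtt.yaml works
        return Path(override).expanduser()
    # Default: the psmqtt.yaml file sitting next to psmqtt.py
    return Path(script_dir) / "psmqtt.yaml"


print(resolve_config_path("/opt/psmqtt", env={}))
```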
Please check the comments in the default psmqtt.yaml as documentation for most of the entries. Typically the entries you will need to edit are those associated with the MQTT broker:
mqtt:
  broker:
    host: <put here the IP address of your MQTT broker>
    port: <port where your MQTT broker listens, typically 1883>
The rest of this document will focus on the format of each "scheduling expression", whose general format is:
schedule:
  - cron: <human-friendly CRON expression>
    tasks:
      - task: <task name>
        params: [ <param1>, <param2>, <param3>, ... ]
        formatter: <formatting rule>
        topic: <MQTT topic>
Each of the following sections describes in detail the parameters:
- <human-friendly CRON expression>: CRON expression
- <task name> and <param1>, <param2>, <param3>, ...: Tasks
- <formatting rule>: Formatting
- <MQTT topic>: MQTT Topic
The <human-friendly CRON expression> that appears in the scheduling expression is a string encoding a recurrent rule, e.g. "every 5 minutes", "every monday" or "every hour except 9pm, 10pm and 11pm".
You can check examples of recurring period definitions here.
Note that cron expressions should be unique; if several schedules have the same period, only the last one will be used.
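Because a repeated cron expression silently hides the earlier schedule, it can be useful to check a configuration for duplicates before deploying it. The sketch below works on an already-parsed schedule (a list of dictionaries, as a YAML parser would produce); the function name `duplicate_crons` is hypothetical and the comparison is a plain string match, which may be stricter or looser than what PSMQTT itself does.

```python
from collections import Counter


def duplicate_crons(schedule):
    """Return the cron expressions that appear more than once in the schedule list."""
    counts = Counter(entry["cron"] for entry in schedule)
    return [cron for cron, n in counts.items() if n > 1]


# Sample data mimicking the parsed "schedule" section of a config file
schedule = [
    {"cron": "every 10sec", "tasks": [{"task": "boot_time"}]},
    {"cron": "every 1 hour", "tasks": [{"task": "boot_time"}]},
    {"cron": "every 10sec", "tasks": [{"task": "virtual_memory", "params": ["percent"]}]},
]
print(duplicate_crons(schedule))
```

If duplicates are reported, the simplest fix is to merge the affected tasks under a single cron entry.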
PSMQTT supports a large number of "tasks". A "task" is the combination of:
- <task-name>: the specification of which sensor should be read; this is just a string;
- parameter list <param1>, <param2>, ..., <paramN>: these are either strings or integers represented as a YAML list (the preferred syntax is a comma-separated list enclosed in square brackets); such parameters act as additional selectors/filters for the sensor.
The meaning of <param1>, <param2>, etc. is task-dependent.
The number of required parameters is also task-dependent.
The result of each task is published to an MQTT topic. As an example:
schedule:
  - cron: every 10sec
    tasks:
      - task: cpu_times_percent
        params: [ system ]
configures PSMQTT to publish on the MQTT topic psmqtt/COMPUTER_NAME/cpu_times_percent/system the value of the system field returned by the psutil cpu_times_percent function.
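The topic in this example is consistent with joining the prefix, the task name and the parameters with the / separator. The sketch below reproduces that one case; the function `default_topic` is hypothetical, and how PSMQTT treats parameters that themselves contain a / (for example drive paths) is not covered here.

```python
def default_topic(task, params, prefix="psmqtt/COMPUTER_NAME"):
    """Build a topic as prefix/task/param1/param2/... (assumed rule, see lead-in)."""
    parts = [prefix, task] + [str(p) for p in params]
    return "/".join(parts)


print(default_topic("cpu_times_percent", ["system"]))
```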
Most tasks support the wildcard * parameter, which causes the task to produce multiple outputs; in that case the MQTT topic associated with the task should actually be an MQTT topic prefix, so that each task output is published on a different topic.
As an example:
schedule:
  - cron: every 10sec
    tasks:
      - task: cpu_times_percent
        params: [ "*" ]
        topic: "cpu/*"
configures PSMQTT to publish on 10 MQTT topics:
- psmqtt/COMPUTER_NAME/cpu/user: the value of the user field returned by the psutil cpu_times_percent function.
- psmqtt/COMPUTER_NAME/cpu/nice: the value of the nice field returned by the psutil cpu_times_percent function.
- psmqtt/COMPUTER_NAME/cpu/system: the value of the system field returned by the psutil cpu_times_percent function.
- ... etc etc ...
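The expansion of a wildcard topic into one topic per field can be pictured as follows. This is a sketch under assumptions: the function `expand_topics` is hypothetical, and a small named tuple with three fields stands in for the real psutil result so that the example runs without psutil installed.

```python
from collections import namedtuple


def expand_topics(prefix, topic_pattern, result):
    """Map each field of a named result to its own topic by replacing '*' with the field name."""
    return {
        f"{prefix}/{topic_pattern.replace('*', field)}": value
        for field, value in result._asdict().items()
    }


# Stand-in for the value returned by psutil.cpu_times_percent() (only 3 fields shown)
CpuTimes = namedtuple("CpuTimes", ["user", "nice", "system"])
sample = CpuTimes(user=12.0, nice=1.0, system=5.0)

topics = expand_topics("psmqtt/COMPUTER_NAME", "cpu/*", sample)
for topic, value in topics.items():
    print(topic, value)
```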
Most tasks also support the wildcard + parameter to get all possible fields of the psutil or pySMART output in one single topic, encoding them as a JSON string; in other words a single MQTT message will be published on a single MQTT topic with a message payload containing a JSON string.
As an example:
schedule:
  - cron: every 10sec
    tasks:
      - task: cpu_times_percent
        params: [ "+" ]
        topic: "cpu"
configures PSMQTT to publish on the MQTT topic psmqtt/COMPUTER_NAME/cpu the JSON encoding of what is returned by the psutil cpu_times_percent function, e.g. {"user": 12.0, "nice": 1.0, "system": 5.0, ...}.
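For comparison with the * wildcard, the + wildcard amounts to serializing the whole result into one JSON payload. The snippet below is a minimal sketch that uses a named tuple as a stand-in for the psutil result; the exact JSON layout produced by PSMQTT (key order, number formatting) may differ.

```python
import json
from collections import namedtuple

# Stand-in for the value returned by psutil.cpu_times_percent() (only 3 fields shown)
CpuTimes = namedtuple("CpuTimes", ["user", "nice", "system"])
sample = CpuTimes(user=12.0, nice=1.0, system=5.0)

# One message, one topic: all fields encoded together
payload = json.dumps(sample._asdict())
print(payload)
```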
In case of a task execution error, the error message is sent to a topic named psmqtt/COMPUTER_NAME/error/TASK. Please check the MQTT documentation to understand the role of the / MQTT topic level separator.
Here follows the reference documentation for all supported tasks and their parameters:
- Task name: cpu_percent
  - Short description: CPU total usage in percentage. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all the CPUs, or a CPU index 0, 1, 2, etc. to select a single CPU.
- Task name: cpu_times
  - Short description: CPU times information. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all fields, or a field name like user, nice, system, etc.
- Task name: cpu_times_percent
  - Short description: CPU times in percentage. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all fields, or a field name like user, nice, system, etc. Check full reference for all available fields.
  - OPTIONAL: <param2>: The wildcard * or + to select all CPUs, or a CPU index 0, 1, 2, etc. to select a single CPU. Note that you cannot use a wildcard as <param2> together with a wildcard on <param1>.
- Task name: cpu_stats
  - Short description: CPU statistics. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all fields, or a field name like ctx_switches, interrupts, soft_interrupts, syscalls.
- Task name: virtual_memory
  - Short description: Virtual memory information. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all fields, or one of total, available, percent, etc. Check full reference for all available fields.
- Task name: swap_memory
  - Short description: Swap memory information. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all fields, or one of total, used, free, etc. Check full reference for all available fields.
- Task name: disk_partitions
  - Short description: List of mounted disk partitions. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all fields, or a field name like device, mountpoint, fstype, opts.
  - OPTIONAL: <param2>: The wildcard * or + to select all partitions, or an index 0, 1, 2, etc. to select a specific partition. Note that you cannot use a wildcard as <param2> together with a wildcard on <param1>.
- Task name: disk_usage
  - Short description: Disk usage for a particular drive. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all fields, or a field name like total, used, free, percent.
  - REQUIRED: <param2>: The name of the drive for which disk usage must be published, e.g. /dev/md0 or /dev/sda1.
- Task name: disk_io_counters
  - Short description: Disk I/O counters. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all fields, or a field name like read_count, write_count, read_bytes, write_bytes, etc. Check full reference for all available fields.
  - REQUIRED: <param2>: The wildcard * or + to select all partitions, or an index 0, 1, 2, etc. to select a specific partition. Note that you cannot use a wildcard as <param2> together with a wildcard on <param1>.
- Task name: smart
  - Short description: Self-Monitoring, Analysis and Reporting Technology System (SMART) counters built into most modern ATA/SATA, SCSI/SAS and NVMe disks. Full reference
  - REQUIRED: <param1>: The name of a specific drive, e.g. /dev/md0 or /dev/sda.
  - REQUIRED: <param2>: The wildcard * or + to select all S.M.A.R.T. attributes, or a field name like interface, is_ssd, model, name, path, rotation_rate, serial, smart_capable, smart_enabled, smart_status, temperature, test_capabilities.
    All SMART attributes are reported in fields named attribute_raw[ATTRIBUTE_NAME]. The availability of specific attributes depends on the disk vendor and disk model. E.g. a typical SMART attribute name would be Power_On_Hours, which can be selected by using the value attribute_raw[Power_On_Hours] for <param2>.
    All SMART tests (short self tests, long self tests, etc) are reported in fields named test[TEST_INDEX], with <TEST_INDEX> being a number 0, 1, 2, etc (depending on how many SMART tests were run on the disk). The value of each test[TEST_INDEX] is a JSON string containing details about that test, e.g. hours, type, status, etc. The tests are sorted by hours in decreasing order so that test[0] always indicates the most recent SMART test results.
    You can try the following Python snippet on your prompt to see which SMART attributes are detected by the pySMART library for e.g. your device /dev/sda: sudo python3 -c 'import pySMART; pySMART.Device("/dev/sda").all_attributes()'
- Task name: net_io_counters
  - Short description: Network I/O counters. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all fields, or a field name like bytes_sent, bytes_recv, packets_sent, packets_recv, etc. Check full reference for all available fields.
  - OPTIONAL: <param2>: The wildcard * or + to select all network interface cards (NICs), or a NIC name like e.g. eth0, wlan0, enp3s0f0, etc. to select a specific NIC. Note that you cannot use a wildcard as <param2> together with a wildcard on <param1>.
- Task name: sensors_temperatures
  - Short description: Hardware temperatures. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all available sensor types (e.g. asus, coretemp, amdgpu, etc). Try the following Python snippet on your prompt to see which temperature sensor types are detected by the psutil library: python3 -c 'import psutil, pprint; pprint.pprint(psutil.sensors_temperatures())'
  - OPTIONAL: <param2>: The wildcard * or + to select all temperature sensors of the selected sensor type, or a label value to select a specific sensor. E.g. you might want to use Core 0 as label to publish only the temperature of the first logical core.
  - OPTIONAL: <param3>: The wildcard * or + to select all temperature information available from the selected sensors, or a field name like current, high, critical to select only a specific piece of information.
- Task name: sensors_fans
  - Short description: Hardware fans speed. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all available sensor types (e.g. asus, etc). Try the following Python snippet on your prompt to see which fan sensor types are detected by the psutil library: python3 -c 'import psutil, pprint; pprint.pprint(psutil.sensors_fans())'
  - OPTIONAL: <param2>: The wildcard * or + to select all fan sensors of the selected sensor type, or a label value to select a specific sensor. E.g. you might want to use cpu_fan as label to publish only the fan speed of the CPU.
  - OPTIONAL: <param3>: The wildcard * or + to select all information available from the selected sensors, or a field name like current to select only a specific piece of information.
- Task name: sensors_battery
  - Short description: Battery status information. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all fields, or a field name like percent, secsleft, power_plugged, etc. Check full reference for all available fields and their meaning.
- Task name: users
  - Short description: Users currently connected on the system. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all fields, or a field name like name, terminal, host, started, etc. Check full reference for all available fields and their meaning.
  - OPTIONAL: <param2>: The wildcard * or + to select all users, or an index 0, 1, 2, etc. to select a specific user.
- Task name: boot_time
  - Short description: System boot time. Full reference
  - NO PARAMETERS
- Task name: pids
  - Short description: Currently running process IDs. Full reference
  - REQUIRED: <param1>: The wildcard * or + to select all PIDs, the count string to return just the number of PIDs, or an index 0, 1, 2, etc. to select a specific process ID.
- Task name: processes
  - Short description: Single process parameters. Full reference
  - REQUIRED: <param1>: one of
    - numeric ID of the process
    - top_cpu - top CPU consuming process
    - top_cpu[N] - CPU consuming process number N
    - top_memory - top memory consuming process
    - top_memory[N] - memory consuming process number N
    - pid[PATH] - process with ID specified in the file having PATH path (.pid file)
    - name[PATTERN] - process with name matching PATTERN pattern (use * to match zero or more characters, ? for single character)
    - * - to get value of some property for all processes. Topic per process ID
    - + - to get value of some property for all processes in one topic (JSON string)
  - OPTIONAL: <param2>: one of
    - pid - process ID
    - ppid - parent process ID
    - name - process name
    - exe - process executable file
    - cwd - process working directory
    - cmdline/* - command line. Topic per line
    - cmdline/+ - command line in one topic (JSON string)
    - cmdline/count - number of command line lines
    - cmdline/{0/1/etc} - command line single line
    - status - process status (running/sleeping/idle/dead/etc)
    - username - user started process
    - create_time - time when process was started (Unix timestamp)
    - terminal - terminal of the process
    - uids/* - process user IDs. Topic per parameter
    - uids/+ - process user IDs in one topic (JSON string)
    - uids/{real/effective/saved} - process user IDs single parameter
    - gids/* - process group IDs. Topic per parameter
    - gids/+ - process group IDs in one topic (JSON string)
    - gids/{real/effective/saved} - process group IDs single parameter
    - cpu_times/* - process CPU times. Topic per parameter
    - cpu_times/+ - process CPU times in one topic (JSON string)
    - cpu_times/{user/system/children_user/children_system} - process CPU times single parameter
    - cpu_percent - CPU percent used by process
    - memory_percent - memory percent used by process
    - memory_info/* - memory used by process. Topic per parameter
    - memory_info/+ - memory used by process in one topic (JSON string)
    - memory_info/{rss/vms/shared/text/lib/data/dirty/uss/pss/swap} - memory used by process single parameter
    - io_counters/* - process I/O counters. Topic per parameter
    - io_counters/+ - process I/O counters in one topic (JSON string)
    - io_counters/{read_count/write_count/read_bytes/write_bytes} - process I/O single counter
    - num_threads - number of threads
    - num_fds - number of file descriptors
    - num_ctx_switches/* - number of context switches. Topic per parameter
    - num_ctx_switches/+ - number of context switches in one topic (JSON string)
    - num_ctx_switches/{voluntary/involuntary} - context switches single counter
    - nice - nice value
    - * - all process properties. Topic per property
    - + - all process properties in one topic (JSON string)
    - ** - all process properties and sub-properties. Topic per property
    - **; - all process properties and sub-properties in one topic (JSON string)
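Several parameters in the reference above use a bracketed form such as top_cpu[N], name[PATTERN], attribute_raw[ATTRIBUTE_NAME] or test[TEST_INDEX]. The sketch below shows one way such selectors could be split into a key and an argument, and how a name[PATTERN] match with * and ? could be evaluated. Both helpers are hypothetical; in particular, Python's `fnmatch` also understands `[seq]` character sets, so it is only an approximation of whatever matching PSMQTT really applies.

```python
import fnmatch
import re

# key[arg] where key is a word and arg is anything between the outer brackets
_SELECTOR = re.compile(r"^(?P<key>[A-Za-z_]+)\[(?P<arg>.*)\]$")


def parse_selector(text):
    """Split 'key[arg]' into (key, arg); values without brackets return (text, None)."""
    match = _SELECTOR.match(text)
    if match is None:
        return text, None
    return match.group("key"), match.group("arg")


def match_process_names(pattern, names):
    """Return the names matching a shell-style pattern ('*' and '?' wildcards)."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


print(parse_selector("attribute_raw[Power_On_Hours]"))
print(match_process_names("pyth*", ["python3", "bash", "pythonw"]))
```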
These are 'tasks' I found most relevant and useful for tracking my server(s) health and performance:
Task | Description |
---|---|
boot_time | Up time |
cpu_percent | CPU total usage in percent |
sensors_temperatures/coretemp/0/ | CPU package temperature |
virtual_memory/percent | Virtual memory used |
virtual_memory/free/{{x|GB}} | Virtual memory free, GB |
swap_memory/percent | Swap memory used |
disk_usage/percent/| | Root drive (forward slash replaced with pipe) usage in percent (Linux) |
disk_usage/free/|/{{x|GB}} | Space left in GB for root drive (Linux) |
smart/nvme0/ | All SMART attributes for the device 'nvme0' (requires root privileges) |
smart/nvme0/temperature | Just the device 'nvme0' temperature (requires root privileges) |
processes/top_cpu/name | Name of top process consuming CPU |
processes/top_memory/exe | Executable file of top process consuming memory |
sensors_fans/dell_smm/0 | Fan speed |
sensors_battery/percent | Battery charge |
The output of each task can be formatted using Jinja2 templates in the formatter field of task definitions.
E.g.:
schedule:
  - cron: every 10sec
    tasks:
      - task: cpu_times_percent
        params: [ "user" ]
        formatter: "{{x}}%"
configures PSMQTT to append the % symbol after the CPU usage.
For tasks providing many outputs (using the wildcard *), all outputs are available by name if they are named.
Unnamed outputs are available as x.
When the task produces multiple unnamed outputs they are available as x[1], x[2], etc. if they are numbered.
psmqtt provides some Jinja2 filters:
- KB, MB, GB to format a value in bytes as KBytes, MBytes or GBytes.
- uptime to format boot_time as a human-friendly uptime string representation.
Examples:
- task: virtual_memory
  params: [ "*" ]
  # emit free virtual memory in %
  formatter: "{{(100*free/total)|int}}%"
- task: virtual_memory
  params: [ "free" ]
  # emit free virtual memory in MB instead of bytes
  formatter: "{{x|MB}}"
- task: cpu_times_percent
  params: [ "user", "*" ]
  # emit total CPU time spent in user mode for the first and second logical cores only
  formatter: "{{x[0]+x[1]}}"
- task: boot_time
  formatter: "{{x|uptime}}"
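To give an idea of what the KB, MB, GB and uptime filters do, here is a rough pure-Python equivalent. It is a sketch under stated assumptions and not psmqtt's own implementation: the function names are hypothetical, the byte filters assume binary units (1024) without rounding, and the uptime output format is invented for illustration. Functions like these could be registered on a Jinja2 environment through its `filters` dictionary.

```python
import time


def kb(value):
    """Bytes to KBytes (assumption: 1 KByte = 1024 bytes)."""
    return value / 1024


def mb(value):
    """Bytes to MBytes."""
    return value / 1024 ** 2


def gb(value):
    """Bytes to GBytes."""
    return value / 1024 ** 3


def uptime(boot_time, now=None):
    """Turn a boot timestamp (seconds since the epoch) into a short uptime string."""
    now = time.time() if now is None else now
    seconds = int(now - boot_time)
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes = rest // 60
    # Illustrative format only; the real filter may render uptime differently
    return f"{days}d {hours}h {minutes}m"


print(mb(5 * 1024 * 1024))
print(uptime(0, now=90061))
```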
The <MQTT topic> specification in each task definition is optional.
If it is not specified, psmqtt will automatically generate an output MQTT topic in the form psmqtt/COMPUTER_NAME/.
To customize the prefix psmqtt/COMPUTER_NAME you can use the mqtt.publish_topic_prefix key in the configuration file. E.g.:
mqtt:
  publish_topic_prefix: my-prefix
configures psmqtt to emit all outputs at my-prefix/.
It's important to note that when the task emits more than one output due to the use of the wildcard * character, the MQTT topic must be specified and must include the wildcard * character itself.
As an example the task
schedule:
  - cron: every 10sec
    tasks:
      - task: cpu_times_percent
        params: [ "*" ]
        topic: "cpu/*"
produces 10 outputs on a Linux system: one for each of the user, nice, system, idle, iowait, irq, softirq, steal, guest and guest_nice fields emitted by psutil.
These 10 outputs must be published on 10 different MQTT topics.
The use of cpu/* as MQTT topic configures psmqtt to send the 10 outputs to the following 10 topics:
- psmqtt/COMPUTER_NAME/cpu/user
- psmqtt/COMPUTER_NAME/cpu/nice
- psmqtt/COMPUTER_NAME/cpu/system
- psmqtt/COMPUTER_NAME/cpu/idle
- psmqtt/COMPUTER_NAME/cpu/iowait
- psmqtt/COMPUTER_NAME/cpu/irq
- psmqtt/COMPUTER_NAME/cpu/softirq
- psmqtt/COMPUTER_NAME/cpu/steal
- psmqtt/COMPUTER_NAME/cpu/guest
- psmqtt/COMPUTER_NAME/cpu/guest_nice
If the wildcard * character is used in the task parameters but the MQTT topic is not specified or does not contain the wildcard * character itself, then an error will be emitted (check psmqtt logs).
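The rule above is easy to check before starting psmqtt. The following sketch validates a single task definition given as a dictionary; the function `topic_error` and its messages are hypothetical and only mirror the rule as stated in this section.

```python
def topic_error(task):
    """Return an error message if a '*' parameter lacks a matching '*' topic, else None."""
    params = task.get("params", [])
    if not any(str(p) == "*" for p in params):
        return None  # no multi-output wildcard, so the topic is optional
    topic = task.get("topic")
    if topic is None:
        return "wildcard '*' used in params but no topic specified"
    if "*" not in topic:
        return "wildcard '*' used in params but topic does not contain '*'"
    return None


print(topic_error({"task": "cpu_times_percent", "params": ["*"], "topic": "cpu"}))
```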
The psmqtt.yaml file supports a configuration named "request_topic":
mqtt:
  request_topic: request
This configuration allows you to specify an MQTT topic that psmqtt subscribes to and uses as an input trigger for emitting measurements. This is an alternative to scheduling tasks with cron expressions.
E.g. to force psmqtt to run the task:
- task: cpu_times_percent
  params: [ "*" ]
  topic: "cpu/*"
it's possible to send the YAML string above on the topic psmqtt/COMPUTER_NAME/request; the task will be executed immediately when received and will be interpreted like any other task in the configuration file.
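A request like this can be sent from any MQTT client. The sketch below builds the topic and payload and, optionally, publishes them with the third-party paho-mqtt package (`paho.mqtt.publish.single`). The helper names are hypothetical, COMPUTER_NAME is a placeholder to replace with your own prefix, and the payload simply copies the YAML text shown above; whether psmqtt expects exactly this layout is an assumption.

```python
# YAML text copied from the example above
REQUEST_YAML = (
    '- task: cpu_times_percent\n'
    '  params: [ "*" ]\n'
    '  topic: "cpu/*"\n'
)


def build_request(prefix="psmqtt/COMPUTER_NAME", request_topic="request"):
    """Return the (topic, payload) pair for an on-demand task request."""
    return f"{prefix}/{request_topic}", REQUEST_YAML


def send_request(host, port=1883):
    """Publish the request; needs the paho-mqtt package and a reachable broker."""
    import paho.mqtt.publish as publish  # third-party: pip install paho-mqtt

    topic, payload = build_request()
    publish.single(topic, payload=payload, hostname=host, port=port)


topic, payload = build_request()
print(topic)
```

Calling `send_request("192.168.0.3")` would then trigger the task once, independently of any cron schedule.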
The following psmqtt.yaml is an example intended to be used as reference for some syntax rules explained in this document:
logging:
  level: WARNING
  report_status_period_sec: 10
mqtt:
  broker:
    host: 192.168.0.3
    port: 1883
    username: psmqtt
    password: psmqtt-s3cr3t-pass
  clientid: psmqtt
schedule:
  - cron: "every 1 minute"
    tasks:
      - task: cpu_percent
        params: [ total ]
      - task: virtual_memory
        params: [ percent ]
      # cpu temp
      - task: sensors_temperatures
        params: [ rtk_thermal, 0 ]
      # temperatures for 2 HDD in RAID from S.M.A.R.T data
      - task: smart
        params: [ "/dev/sdc", temperature ]
      - task: smart
        params: [ "/dev/sdd", temperature ]
  - cron: "every 1 hour"
    tasks:
      - task: disk_usage
        params: [ percent, "/mnt/md0" ]
  - cron: "every 3 hours"
    tasks:
      - task: boot_time
        formatter: "{{x|uptime}}"