Exports the state of YubiHSM2 devices in a Prometheus-compatible format. It has two optional tests (audit log retrieval and asymmetric cryptography). The exporter's Helm chart offers three PrometheusRules, which use the metrics to verify that the YubiHSM2 devices and the exporter are working.
Besides the generic metrics that the Prometheus client library generates automatically, the exporter provides the following:
- yubihsm_device_info is a time series with a constant value of 1.0. It carries the informative labels version and serial for the corresponding YubiHSM.
- yubihsm_log_size reflects the maximum size of the YubiHSM's audit log (it should be constant).
- yubihsm_used_log_entries shows how full the audit log was at the time of scraping.
- yubihsm_test_connections_total counts how often the exporter connected to the YubiHSM.
- yubihsm_test_errors_total counts how often the exporter failed to test a YubiHSM. The samples have a special label error, which gives more details:
  - The value connection indicates that communication with the YubiHSM device failed completely.
  - The value get_logs indicates that the exporter failed to retrieve the audit log from the YubiHSM device.
  - The value crypto_test indicates that the cryptographic test failed.
All of the metrics described above have two labels, which indicate the YubiHSM a sample belongs to:
- url is the URL that the exporter used to connect to the YubiHSM.
- name is an optional, configurable name for the YubiHSM.
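To make the label layout concrete, here is a minimal sketch of how such samples look in the Prometheus text exposition format. The helper format_sample is hypothetical and not part of the exporter, the values are made up for illustration, and label-value escaping is deliberately left out.

```python
def format_sample(metric: str, labels: dict[str, str], value: float) -> str:
    """Render one sample line in the Prometheus text exposition format.

    Hypothetical helper for illustration; it does not escape label values.
    """
    rendered = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{metric}{{{rendered}}} {value}"


# Illustrative values only: every exporter metric carries url and name.
device = {"url": "http://10.5.32.11:9010", "name": "stateful-0001-hsm"}

print(format_sample("yubihsm_test_connections_total", device, 42.0))
# The error counter adds the extra label "error" on top of url and name.
print(format_sample("yubihsm_test_errors_total", {**device, "error": "get_logs"}, 1.0))
```

In the real exporter, the Prometheus client library produces this output; the sketch only shows which labels to expect on each sample.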
For the audit log retrieval test, the exporter consumes the audit log of a YubiHSM2 device. It then prints the result to stdout and empties the audit log storage on the YubiHSM2 device. The test requires an authentication key on the YubiHSM2 device that has the get-log-entries capability.
Note: A single exporter is not aware of other running exporters! Multiple exporter instances will therefore compete for the logs on the YubiHSM devices, and only one of them will get each log entry.
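The competition between exporter instances can be illustrated with a small sketch. FakeHsm and audit_log_test are hypothetical stand-ins, not the exporter's code or the YubiHSM API; they only model "read all entries, print them, empty the log".

```python
class FakeHsm:
    """Stand-in for a YubiHSM2 audit log; not the real device API."""

    def __init__(self, entries):
        self._entries = list(entries)

    def drain_log(self):
        """Return all pending entries and empty the log."""
        drained, self._entries = self._entries, []
        return drained


def audit_log_test(exporter_name, hsm):
    """Consume the audit log, print it to stdout, return the entry count."""
    entries = hsm.drain_log()
    for entry in entries:
        print(f"{exporter_name}: {entry}")
    return len(entries)


hsm = FakeHsm(["entry 1", "entry 2"])
first = audit_log_test("exporter-a", hsm)   # receives both entries
second = audit_log_test("exporter-b", hsm)  # finds the log already empty
```

Under these assumptions, whichever instance reaches the device first gets the entries. A complete audit trail therefore requires collecting stdout from every exporter instance.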
In the asymmetric cryptography test, the exporter maintains a global secret, which is in either the encrypted or the decrypted state. If the state is decrypted, the exporter encrypts the secret with a public/private key pair on the YubiHSM. Otherwise, it has the YubiHSM device decrypt the encrypted secret and checks whether the decrypted secret matches the expected value. The exporter refers to the key pair on the YubiHSM by a key label.
Note: In a setup with multiple YubiHSM devices, the test decrypts the secret with a different YubiHSM than the one used for encryption. The test therefore expects all YubiHSMs to store the same (test) key material!
The test needs an authentication key on the YubiHSM device with the decrypt-pkcs capability.
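The alternation between the two states can be sketched as follows. This is an illustration under stated assumptions, not the exporter's implementation: FakeHsm and CryptoTest are hypothetical names, and a simple XOR stands in for the real asymmetric encryption and decryption.

```python
import secrets


class FakeHsm:
    """Stand-in for a YubiHSM; XOR replaces real asymmetric cryptography."""

    def __init__(self, key: bytes):
        self._key = key

    def encrypt(self, data: bytes) -> bytes:
        return bytes(a ^ b for a, b in zip(data, self._key))

    decrypt = encrypt  # XOR is its own inverse


class CryptoTest:
    """Holds the global secret and toggles between the two states."""

    def __init__(self):
        self.secret = secrets.token_bytes(16)
        self.ciphertext = None  # None means the state is "decrypted"

    def run(self, hsm) -> bool:
        if self.ciphertext is None:
            # State "decrypted": encrypt the secret and keep the ciphertext.
            self.ciphertext = hsm.encrypt(self.secret)
            return True
        # State "encrypted": let the device decrypt, then compare.
        plaintext = hsm.decrypt(self.ciphertext)
        self.ciphertext = None
        return plaintext == self.secret


shared_key = secrets.token_bytes(16)
hsm_a, hsm_b = FakeHsm(shared_key), FakeHsm(shared_key)

test = CryptoTest()
ok_encrypt = test.run(hsm_a)  # encrypts with the first device's key
ok_decrypt = test.run(hsm_b)  # decrypts on another device with the same key

# A device holding different key material makes the decryption check fail.
odd_hsm = FakeHsm(bytes(b ^ 0xFF for b in shared_key))
test.run(hsm_a)
ok_mismatch = test.run(odd_hsm)
```

The last three lines show why all devices must hold the same test key: decrypting on a device with different key material returns a value that no longer matches the secret.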
If the deployed exporter is to run the YubiHSM tests, the PINs for the authentication keys must be stored in Vault. Each test can use a different authentication key. The Vault storing a PIN must be accessible from the namespace; there are no further restrictions.
Here is an example to deploy the exporter on a cluster:
authenticationKeys:
  audit:
    id: 6
    pinVault:
      path: app/yubihsm-exporter-test/keys
      field: audit
      version: 3
  application:
    id: 3
    pinVault:
      path: app/yubihsm-exporter-test/keys
      field: application
      version: 3
encryptionKeyLabel: vault-hsm-key
yubihsmConnectors:
  - name: stateful-0001-hsm
    url: http://10.5.32.11:9010
  - name: stateful-0002-hsm
    url: http://10.5.32.12:9010
  - name: stateful-0003-hsm
    url: http://10.5.32.13:9010
extraLabels:
  some: label
serviceMonitor:
  enabled: true
prometheusRules:
  enabled: true
- authenticationKeys.audit specifies the authentication key for the audit log retrieval, while authenticationKeys.application specifies the authentication key for the cryptographic test:
  - id is the identifier of the authentication key on the YubiHSM device.
  - pinVault refers to the Vault entry (path, field, and version) that contains the passphrase of the authentication key.
  - If the key is not specified, the related test is disabled.
- encryptionKeyLabel is the label of the asymmetric key (pair) on the YubiHSM to be used for the cryptographic test.
- yubihsmConnectors lists the YubiHSM Connectors / devices to be scraped / tested:
  - name is an optional, user-specified name for the device.
  - url is the endpoint of the Connector.
- All of the chart's resources carry the labels from extraLabels, including the Prometheus Operator resources. If a tenant's Prometheus is to scrape the exporter, set the tenant label correctly here.
- serviceMonitor.enabled and prometheusRules.enabled control which monitoring resources are generated.
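The rule that a missing authentication key disables the related test can be sketched as a small check over the chart values. The function enabled_tests and the result keys are hypothetical; the chart and the exporter may implement this differently.

```python
def enabled_tests(values: dict) -> dict:
    """Derive which optional tests would run from the chart values.

    Hypothetical helper: a test counts as enabled only if its
    authentication key is present under authenticationKeys.
    """
    keys = values.get("authenticationKeys") or {}
    return {
        "audit_log": "audit" in keys,
        "crypto": "application" in keys,
    }


# Only the audit key is configured, so only the audit log test is active.
values = {
    "authenticationKeys": {
        "audit": {
            "id": 6,
            "pinVault": {
                "path": "app/yubihsm-exporter-test/keys",
                "field": "audit",
                "version": 3,
            },
        },
    },
}

result = enabled_tests(values)
print(result)
```

With both authenticationKeys.audit and authenticationKeys.application present, as in the deployment example, both tests would be active under this reading.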
The chart contains the following three Prometheus rules / alerts:
- The first fires if no exporter instance has been running for more than 5 minutes. To detect this, the rule checks for the absence of up samples from any exporter instance with the same release name.
- The second fires if the failed test rate for a given HSM was higher than 20% in the last 5 minutes. To compute this, the rule puts the error rate in relation to the number of connections to the device in the same period.
- Finally, a third rule ensures that, for every HSM, at least one connection cycle per minute was completed over the last 5 minutes. This rule fires if a YubiHSM becomes unresponsive. In that case, the failure rate of another misbehaving YubiHSM might falsely drop to 0%.
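The arithmetic behind the second and third rule can be sketched in Python. This is an approximation under stated assumptions, not the chart's PromQL: the function names are hypothetical, the inputs are counter increases over the 5-minute window, and treating "no connections" as a ratio of 0 is an assumption of this sketch.

```python
def failure_ratio(errors: float, connections: float) -> float:
    """Errors relative to connection attempts within the same window."""
    if connections == 0:
        # Assumption: without any attempt the ratio carries no signal.
        return 0.0
    return errors / connections


def error_rate_alert(errors: float, connections: float,
                     threshold: float = 0.20) -> bool:
    """Second rule: more than 20% of the tests failed in the window."""
    return failure_ratio(errors, connections) > threshold


def stalled_alert(connections: float, window_minutes: float = 5,
                  min_per_minute: float = 1) -> bool:
    """Third rule: fewer than one connection cycle per minute on average."""
    return connections / window_minutes < min_per_minute


print(error_rate_alert(errors=3, connections=10))  # 30% failed
print(error_rate_alert(errors=0, connections=0))   # no attempts at all
print(stalled_alert(connections=0))                # no connection cycles
```

The second call shows the blind spot of a purely ratio-based check: without connection attempts it stays silent, so a separate check on the connection count is needed to notice stalled testing.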
The sources currently reside in a single Python file. There is a Dockerfile, which embeds the tests as the build stage test.