This chapter will cover different ways to monitor a Docker container.
The docker container stats command displays a live stream of a container's runtime metrics. It reports CPU, memory usage and limit, network I/O, block I/O, and the number of processes (PIDS).
-
Start a container:
docker container run --name web -d jboss/wildfly:10.1.0.Final
-
Check the container stats using
docker container stats web
. It shows the output as:
CONTAINER           CPU %     MEM USAGE / LIMIT       MEM %     NET I/O       BLOCK I/O        PIDS
web                 0.14%     274.9MiB / 1.952GiB     13.75%    828B / 0B     96.7MB / 4.1kB   53
The output is continually updated. It shows:
-
Container name
-
Percent CPU utilization
-
Total memory usage vs amount available to the container
-
Percent memory utilization
-
Network activity
-
Disk activity
-
Number of processes or threads created by the container (PIDS)
-
-
Check stats for multiple containers
-
Terminate the single instance of the container:
docker container stop web
docker container rm web
-
Create a service running the container; it will be scaled to multiple replicas shortly:
docker swarm init
docker service create --name=web jboss/wildfly
-
Now check the stats again:
docker stats
to see the output:
CONTAINER           CPU %     MEM USAGE / LIMIT       MEM %     NET I/O       BLOCK I/O        PIDS
1f9d5a1c08a8        0.24%     294.9MiB / 1.952GiB     14.75%    828B / 0B     96.5MB / 4.1kB   115
Note that the container ID is shown instead of the container's name in this case.
-
-
Scale the service to 3 replicas:
docker service scale web=3
-
The stats are now shown for all three containers:
CONTAINER           CPU %     MEM USAGE / LIMIT       MEM %     NET I/O       BLOCK I/O        PIDS
1f9d5a1c08a8        0.14%     287.8MiB / 1.952GiB     14.40%    1.03kB / 0B   96.5MB / 4.1kB   53
a3395efbb6ff        0.77%     263.4MiB / 1.952GiB     13.18%    578B / 0B     0B / 4.1kB       114
f484ff2f4c12        0.06%     294.8MiB / 1.952GiB     14.75%    578B / 0B     0B / 4.1kB       114
-
Display only container id and percent CPU utilization using the command
docker container stats --format "{{.Container}}: {{.CPUPerc}}"
:
55198043b6aa: 0.10%
5b5dd33b675d: 0.11%
6e98a9597e6a: 0.10%
-
Format the output in a table. The results should include container name, percent CPU utilization and percent memory utilization. This can be achieved using the command:
docker container stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
It shows the output:
NAME                                CPU %     MEM USAGE / LIMIT
web.2.1ic0vevvvu2nwwyc6css58ref     0.10%     267.5MiB / 1.952GiB
web.3.pwcgr58s1xo28gwa1znlrn2s3     0.11%     274.7MiB / 1.952GiB
web.1.bg1dcwagl9tzz0azuegxzoys8     0.12%     274MiB / 1.952GiB
-
Display a single snapshot of the stats, instead of a continuous stream, using the command
docker container stats --no-stream
:
CONTAINER           CPU %     MEM USAGE / LIMIT       MEM %     NET I/O       BLOCK I/O     PIDS
55198043b6aa        0.12%     267.5MiB / 1.952GiB     13.39%    1.44kB / 0B   0B / 4.1kB    51
5b5dd33b675d        0.07%     274.7MiB / 1.952GiB     13.75%    1.51kB / 0B   0B / 4.1kB    51
6e98a9597e6a        0.26%     274.1MiB / 1.952GiB     13.71%    1.69kB / 0B   0B / 4.1kB    51
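The --no-stream and --format flags can also be combined to grab a one-shot, machine-readable snapshot. A minimal sketch, using the {{json .}} template function of the Docker CLI formatting engine:
docker container stats --no-stream --format "{{json .}}"
Each output line is a JSON object describing one container, which is convenient for piping into scripts or other tooling.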
The Docker Remote API provides a lot more details about the health of the container. It can be invoked using the following format:
curl --unix-socket /var/run/docker.sock http://localhost/containers/<name>/stats
On Docker for Mac, enabling remote HTTP API still requires a few steps. So this command uses the --unix-socket
option to invoke the Remote API.
A specific invocation for a container can be done as:
curl --unix-socket /var/run/docker.sock http://localhost/containers/web.1.bg1dcwagl9tzz0azuegxzoys8/stats
Note, the container name is from the previous run of the service. It will show the output:
{"read":"2017-10-13T14:08:38.045217731Z","preread":"0001-01-01T00:00:00Z","pids_stats":{"current":51},"blkio_stats":{"io_service_bytes_recursive":[{"major":8,"minor":0,"op":"Read","value":0},{"major":8,"minor":0,"op":"Write","value":4096},{"major":8,"minor":0,"op":"Sync","value":0},{"major":8,"minor":0,"op":"Async","value":4096},{"major":8,"minor":0,"op":"Total","value":4096}],"io_serviced_recursive":[{"major":8,"minor":0,"op":"Read","value":0},{"major":8,"minor":0,"op":"Write","value":1},{"major":8,"minor":0,"op":"Sync","value":0},{"major":8,"minor":0,"op":"Async","value":1},{"major":8,"minor":0,"op":"Total","value":1}],"io_queue_recursive":[],"io_service_time_recursive":[],"io_wait_time_recursive":[],"io_merged_recursive":[],"io_time_recursive":[],"sectors_recursive":[]},"num_procs":0,"storage_stats":{},"cpu_stats":{"cpu_usage":{"total_usage":11130296115,"percpu_usage":[2687118654,3014514615,2971860160,2456802686],"usage_in_kernelmode":2700000000,"usage_in_usermode":7630000000},"system_cpu_usage":952826800000000,"online_cpus":4,"throttling_data":{"periods":0,"throttled_periods":0,"throttled_time":0}},"precpu_stats":{"cpu_usage":{"total_usage":0,"usage_in_kernelmode":0,"usage_in_usermode":0},"throttling_data":{"periods":0,"throttled_periods":0,"throttled_time":0}},"memory_stats":{"usage":288051200,"max_usage":297189376,"stats":{"active_anon":283893760,"active_file":0,"cache":135168,"dirty":16384,"hierarchical_memory_limit":9223372036854771712,"hierarchical_memsw_limit":9223372036854771712,"inactive_anon":0,"inactive_file":135168,"mapped_file":32768,"pgfault":83204,"pgmajfault":0,"pgpgin":78441,"pgpgout":9093,"rss":283914240,"rss_huge":0,"swap":0,"total_active_anon":283893760,"total_active_file":0,"total_cache":135168,"total_dirty":16384,"total_inactive_anon":0,"total_inactive_file":135168,"total_mapped_file":32768,"total_pgfault":83204,"total_pgmajfault":0,"total_pgpgin":78441,"total_pgpgout":9093,"total_rss":283914240,"total_rss_huge":0,"total_swap":0,"total_unevictable":0,"total_writeback":0,"unevictable":0,"writeback":0},"limit":2095874048},"name":"/web.1.bg1dcwagl9tzz0azuegxzoys8","id":"6e98a9597e6af085e73a4d211fff9a164aa012727a46525d4fbaa164b572e23f","networks":{"eth0":{"rx_bytes":1882,"rx_packets":37,"rx_errors":0,"rx_dropped":0,"tx_bytes":0,"tx_packets":0,"tx_errors":0,"tx_dropped":0}}}
As you can see, far more details about the container's health are shown here. These stats are refreshed every second. The continuous refresh of metrics can be terminated using Ctrl + C.
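The JSON returned by the Remote API is easier to digest with a JSON processor. A hedged sketch, assuming jq is installed on the host and using the stream=false query parameter to request a single snapshot instead of a continuous stream (replace <name> with a container name or ID):
curl -s --unix-socket /var/run/docker.sock "http://localhost/containers/<name>/stats?stream=false" | jq '{memory: .memory_stats.usage, cpu: .cpu_stats.cpu_usage.total_usage, pids: .pids_stats.current}'
The fields referenced here (memory_stats.usage, cpu_stats.cpu_usage.total_usage, pids_stats.current) are all visible in the sample output above.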
The docker system events command provides real-time events for the Docker host.
-
In one terminal (T1), type
docker system events
. The command does not show any output and waits for a reportable event to occur. The complete list of events is documented at https://docs.docker.com/engine/reference/commandline/events/#/extended-description. -
In a new terminal (T2), kill an existing container by scaling down the service:
docker service scale web=2
. -
T1 shows the updated list of events as:
2017-10-13T07:12:00.223791013-07:00 service update r4i0x8ujnn2q8osj8dowgvw72 (name=web, replicas.new=2, replicas.old=3)
2017-10-13T07:12:00.332724880-07:00 container kill 5b5dd33b675d3b6be3e6aaf0ecde928b3ac882b0a221ff71e57c86faae8181ab (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=pwcgr58s1xo28gwa1znlrn2s3, com.docker.swarm.task.name=web.3.pwcgr58s1xo28gwa1znlrn2s3, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.3.pwcgr58s1xo28gwa1znlrn2s3, signal=15, vendor=CentOS)
2017-10-13T07:12:00.613143701-07:00 container die 5b5dd33b675d3b6be3e6aaf0ecde928b3ac882b0a221ff71e57c86faae8181ab (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=pwcgr58s1xo28gwa1znlrn2s3, com.docker.swarm.task.name=web.3.pwcgr58s1xo28gwa1znlrn2s3, exitCode=0, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.3.pwcgr58s1xo28gwa1znlrn2s3, vendor=CentOS)
2017-10-13T07:12:00.897831488-07:00 network disconnect 8f8e6ce771d6db6065f2472a7e83612ff6a657de3b6d08dab0617b8a596234fa (container=5b5dd33b675d3b6be3e6aaf0ecde928b3ac882b0a221ff71e57c86faae8181ab, name=bridge, type=bridge)
2017-10-13T07:12:01.017523717-07:00 container stop 5b5dd33b675d3b6be3e6aaf0ecde928b3ac882b0a221ff71e57c86faae8181ab (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=pwcgr58s1xo28gwa1znlrn2s3, com.docker.swarm.task.name=web.3.pwcgr58s1xo28gwa1znlrn2s3, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.3.pwcgr58s1xo28gwa1znlrn2s3, vendor=CentOS)
2017-10-13T07:12:01.023414108-07:00 container destroy 5b5dd33b675d3b6be3e6aaf0ecde928b3ac882b0a221ff71e57c86faae8181ab (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=pwcgr58s1xo28gwa1znlrn2s3, com.docker.swarm.task.name=web.3.pwcgr58s1xo28gwa1znlrn2s3, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.3.pwcgr58s1xo28gwa1znlrn2s3, vendor=CentOS)
The output shows a list of events, one per line. The events shown here are container kill, container die, network disconnect, container stop, and container destroy. The date and timestamp of each event is displayed at the beginning of the line, followed by other event-specific information. -
In T2, scale the service back to 3 replicas:
docker service scale web=3
-
The output in T1 is updated to show:
2017-10-13T07:13:47.161848609-07:00 service update r4i0x8ujnn2q8osj8dowgvw72 (name=web, replicas.new=3, replicas.old=2)
2017-10-13T07:13:47.429074382-07:00 container create 0574d1fd74bef2e6fc54174e1fbeda25efd7ed270dce1d6dbede4ead19c7c485 (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=xcmylcwlag5vot4tp3l5z6oam, com.docker.swarm.task.name=web.3.xcmylcwlag5vot4tp3l5z6oam, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.3.xcmylcwlag5vot4tp3l5z6oam, vendor=CentOS)
2017-10-13T07:13:47.445010259-07:00 network connect 8f8e6ce771d6db6065f2472a7e83612ff6a657de3b6d08dab0617b8a596234fa (container=0574d1fd74bef2e6fc54174e1fbeda25efd7ed270dce1d6dbede4ead19c7c485, name=bridge, type=bridge)
2017-10-13T07:13:47.778855117-07:00 container start 0574d1fd74bef2e6fc54174e1fbeda25efd7ed270dce1d6dbede4ead19c7c485 (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=xcmylcwlag5vot4tp3l5z6oam, com.docker.swarm.task.name=web.3.xcmylcwlag5vot4tp3l5z6oam, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.3.xcmylcwlag5vot4tp3l5z6oam, vendor=CentOS)
The events shown here are container create, network connect, and container start.
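By default the command only streams new events. If you also want events that have already happened, the --since and --until options accept timestamps or relative durations, and --format can reshape each event. A small example (the 15m window is arbitrary):
docker system events --since 15m --format '{{json .}}'
This replays the events from the last 15 minutes as JSON objects and then keeps streaming new ones.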
The list of events can be restricted by filters specified using the --filter or -f option. The currently supported filters are listed below; an example that combines multiple filters follows the list:
-
container (container=<name or id>)
-
daemon (daemon=<name or id>)
-
event (event=<event action>)
-
image (image=<tag or id>)
-
label (label=<key> or label=<key>=<value>)
-
network (network=<name or id>)
-
plugin (plugin=<name or id>)
-
type (type=<container or image or volume or network or daemon>)
-
volume (volume=<name or id>)
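Multiple filters can be combined; each additional --filter with a different key narrows the stream further. A minimal sketch using the type and event filters from the list above (the filter values are illustrative):
docker system events --filter type=container --filter event=die
This reports only container die events and ignores everything else happening on the host.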
Let’s look at the list of running containers first using docker container ls
, and then learn how to apply these filters.
Here is the list of running containers from the service:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
074447f26452 jboss/wildfly:latest "/opt/jboss/wildfl..." 3 minutes ago Up 3 minutes 8080/tcp web.1.ytyv0gqi7dzxtetssrlsgvvbu
0574d1fd74be jboss/wildfly:latest "/opt/jboss/wildfl..." 8 minutes ago Up 8 minutes 8080/tcp web.3.xcmylcwlag5vot4tp3l5z6oam
55198043b6aa jboss/wildfly:latest "/opt/jboss/wildfl..." 25 minutes ago Up 25 minutes 8080/tcp web.2.1ic0vevvvu2nwwyc6css58ref
Let’s apply the filters.
-
Show events for a container by name
-
In T1, give the command to listen to a specific container as:
docker system events -f container=web.1.ytyv0gqi7dzxtetssrlsgvvbu
You may have to terminate the previous run of docker system events using Ctrl + C before giving this new command. -
In T2, terminate the second replica of the service as
docker container rm -f web.2.1ic0vevvvu2nwwyc6css58ref
. -
T1 does not show any events because it is only listening for events from the first replica of the service.
-
-
Show events for an event
-
In T1, give the command
docker system events -f event=create
. -
In T2, scale the service by one more replica:
docker service scale web=4
-
T1 shows the event for container creation
2017-10-13T07:24:22.971050949-07:00 container create 84e4604ffd983cfcc53ad619b4c11156518834fe23e4a0a8b299905b978a0022 (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=38unfmcsxmnvr844gysn28lwa, com.docker.swarm.task.name=web.4.38unfmcsxmnvr844gysn28lwa, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.4.38unfmcsxmnvr844gysn28lwa, vendor=CentOS)
This is as expected: a new container was created, and the corresponding event is shown in the T1 console.
-
In T2, scale the service back to 2 using the command
docker service scale web=2
-
T1 does not show any additional events because it is only listening for create events.
-
More samples are explained at https://docs.docker.com/engine/reference/commandline/events/#/filter-events-by-criteria.
-
Prometheus is an open-source systems monitoring and alerting toolkit. Prometheus collects metrics from monitored targets by scraping metrics from HTTP endpoints on these targets. A Docker instance can be configured as a Prometheus target.
Different targets to scrape are defined in the Prometheus configuration file. Targets may be statically configured via the static_configs
parameter in the configuration file or dynamically discovered using one of the supported service-discovery mechanisms (Consul, DNS, Etcd, etc.).
Since Prometheus exposes data about itself in the same manner, it can also scrape and monitor its own health.
Docker exposes Prometheus-compatible metrics on port 9323
. This support is only available as an experimental feature.
-
For Docker for Mac, click on Docker icon in the status menu
-
Select Preferences…, then Daemon, then the Advanced tab -
Update daemon settings:
{ "metrics-addr" : "0.0.0.0:9323", "experimental" : true }
-
Click on
Apply & Restart
to restart the daemon -
Show the complete list of metrics using
curl http://localhost:9323/metrics
-
Show the list of engine metrics using
curl http://localhost:9323/metrics | grep engine
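A quick way to confirm that the daemon actually restarted with experimental features enabled is to check the server flags; the Go template below uses the standard docker version formatting support:
docker version --format '{{.Server.Experimental}}'
It should print true. Once that is the case, the metrics endpoint on port 9323 can be added as a scrape target in any Prometheus configuration.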
In this section, we’ll start Prometheus and use it to scrape its own health.
-
Create a new directory
prometheus
and change to that directory -
Create a text file
prometheus.yml
and use the following content:
# A scrape configuration scraping a Node Exporter and the Prometheus server
# itself.
scrape_configs:
  # Scrape Prometheus itself every 5 seconds.
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
This configuration file scrapes data from the Prometheus container which will be started subsequently on port 9090.
-
Start a single-replica Prometheus service:
docker service create \
  --replicas 1 \
  --name metrics \
  --mount type=bind,source=`pwd`/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
  --publish 9090:9090/tcp \
  prom/prometheus
This will start the Prometheus container on port 9090.
-
The Prometheus dashboard is at http://localhost:9090. Check the list of enabled targets at http://localhost:9090/targets (also accessible from the Status → Targets menu). It shows that the Prometheus endpoint is available for scraping.
-
Click on Graph and click on -insert metric at cursor- to see the list of metrics available. These are all the metrics published by the Prometheus endpoint.
-
Choose the http_requests_total metric and click on Execute
-
Switch from Console to Graph
-
Change the duration from 1h to 5m
-
Click on
Add Graph
, select a different metric, say http_request_duration_microseconds, and click on Execute
-
Switch from Console to Graph and change the duration from 1h to 5m
-
Stop the service:
docker service rm metrics
Multiple graphs can be added this way.
In this section, we’ll start the Prometheus node exporter, which publishes machine-level metrics. Then we’ll use Prometheus to scrape health information about the node running Docker.
-
All containers need to use the same overlay network so that they can communicate with each other. Let’s create an overlay network:
docker network create --driver overlay prom
-
Start Prometheus node exporter:
docker service create --name node \
  --mode global \
  --mount type=bind,source=/proc,target=/host/proc \
  --mount type=bind,source=/sys,target=/host/sys \
  --mount type=bind,source=/,target=/rootfs \
  --network prom \
  --publish 9100:9100 \
  prom/node-exporter:v0.15.0 \
  --path.procfs /host/proc \
  --path.sysfs /host/sys \
  --collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"
A few observations in this command:
-
This is started as a global service such that it is started on all nodes of the cluster.
-
As explained in prometheus/node_exporter#610, node exporter only works with host network on Mac OSX. This is not needed if you are running on Linux.
-
It uses the overlay network previously created.
-
It needs access to the host’s filesystems so that metrics about the node can be published.
-
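Before wiring the node exporter into Prometheus, it is worth checking that it is serving metrics. A quick check through the published port (9100 is routed to the service by the Swarm routing mesh; node_load is one of the standard node exporter metric families):
curl -s http://localhost:9100/metrics | grep "^node_load"
This should print the node_load1, node_load5, and node_load15 gauges.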
-
Update
prometheus.yml
to the following text:
global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets:
        - 'localhost:9090'

  - job_name: 'node resources'
    dns_sd_configs:
      - names: ['tasks.node']
        type: 'A'
        port: 9100
    params:
      collect[]:
        - cpu
        - meminfo
        - diskstats
        - netdev
        - netstat

  - job_name: 'node storage'
    scrape_interval: 1m
    dns_sd_configs:
      - names: ['tasks.node']
        type: 'A'
        port: 9100
    params:
      collect[]:
        - filefd
        - filesystem
        - xfs
A few observations:
-
DNS-based service discovery is used to discover the node-exporter targets to scrape. This is further explained at dns_sd_configs. DNS A-record queries are used to discover the service.
-
Two different jobs are created even though they are scraping from the same endpoint. This provides a more logical way to represent data.
-
-
Terminate the Prometheus service from the previous section, if it is still running:
docker service rm metrics
-
Restart the Prometheus service, this time using the overlay network, as:
docker service create \
  --replicas 1 \
  --name metrics \
  --network prom \
  --mount type=bind,source=`pwd`/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
  --publish 9090:9090/tcp \
  prom/prometheus
-
Confirm that both the services have started:
ID            NAME     MODE        REPLICAS  IMAGE                       PORTS
lzl41s2i66jd  metrics  replicated  1/1       prom/prometheus:latest      *:9090->9090/tcp
dro3ncpyuchp  node     global      1/1       prom/node-exporter:latest
-
Confirm that all the targets are configured correctly at Prometheus dashboard:
-
Now a lot more metrics, this time from the node, are also available:
Console output and graphs for all these metrics are now available:
The complete list of metrics is available at https://github.com/prometheus/node_exporter.
Try some of the example queries from https://prometheus.io/docs/querying/basics/.
cAdvisor (Container Advisor) provides resource usage and performance characteristics of running containers. Let’s take a look at how cAdvisor can be used to get these metrics from containers.
-
Run
cAdvisor
docker container run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest
-
Dashboard is available at http://localhost:8080
-
A high-level view of CPU and memory utilization is shown. More details about CPU, memory, network, and filesystem usage are shown on the same page. CPU usage looks as shown:
-
All Docker containers are in the /docker sub-container. Click on any of the containers to see more details about it.
cAdvisor samples once a second and has historical data for only one minute. The data generated from cAdvisor can be exported to InfluxDB. Optionally, you may use a Grafana front end to visualize the data as explained in How to setup Docker monitoring.
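The InfluxDB export mentioned above is configured through cAdvisor's storage-driver flags. A hedged sketch, assuming an InfluxDB instance is reachable at influxdb:8086 and a database named cadvisor already exists; the container name cadvisor-influx is arbitrary and it is published on 8081 to avoid clashing with the cAdvisor already running on 8080:
docker container run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8081:8080 \
  --detach=true \
  --name=cadvisor-influx \
  google/cadvisor:latest \
  -storage_driver=influxdb \
  -storage_driver_host=influxdb:8086 \
  -storage_driver_db=cadvisor
With this in place, cAdvisor keeps writing samples to InfluxDB instead of discarding them after a minute, and Grafana can chart them from there.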
cAdvisor also exposes container statistics as Prometheus metrics out of the box. By default, these metrics are served under the /metrics
HTTP endpoint. Let’s take a look at how these container metrics can be observed using Prometheus.
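Before pointing Prometheus at it, the endpoint can be inspected directly. The cAdvisor instance started earlier still listens on port 8080, and container_cpu_usage_seconds_total is one of the metrics it exposes (it is also used in the PromQL query later in this section):
curl -s http://localhost:8080/metrics | grep "^container_cpu_usage_seconds_total" | head -5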
-
Terminate previously running cAdvisor:
docker container rm -f cadvisor
-
Start a new cAdvisor service, using the
prom
overlay network created earlier:
docker service create \
  --name cadvisor \
  --network prom \
  --mode global \
  --mount type=bind,source=/,target=/rootfs \
  --mount type=bind,source=/var/run,target=/var/run \
  --mount type=bind,source=/sys,target=/sys \
  --mount type=bind,source=/var/lib/docker,target=/var/lib/docker \
  google/cadvisor:latest
-
Terminate the previously running Prometheus service:
docker service rm metrics
-
The updated prometheus.yml configuration file is:
global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets:
        - 'localhost:9090'

  - job_name: 'node resources'
    dns_sd_configs:
      - names: ['tasks.node']
        type: 'A'
        port: 9100
    params:
      collect[]:
        - cpu
        - meminfo
        - diskstats
        - netdev
        - netstat

  - job_name: 'node storage'
    scrape_interval: 1m
    dns_sd_configs:
      - names: ['tasks.node']
        type: 'A'
        port: 9100
    params:
      collect[]:
        - filefd
        - filesystem
        - xfs

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: 'A'
        port: 8080
-
Start the new Prometheus service:
docker service create \
  --replicas 1 \
  --name metrics \
  --network prom \
  --mount type=bind,source=`pwd`/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
  --publish 9090:9090/tcp \
  prom/prometheus
-
Confirm that all the targets are configured correctly at Prometheus dashboard:
Note, all four scrape endpoints are shown here.
-
In Graphs, many more metrics, this time from cAdvisor, are now available:
Console output and graphs for all these metrics are now available:
The complete list of metrics is available at https://github.com/google/cadvisor.
Here is a basic query written using PromQL worth trying:
sum by (container_label_com_docker_swarm_node_id) (
irate(
container_cpu_usage_seconds_total{
container_label_com_docker_swarm_service_name="metrics"
}[1m]
)
)
This shows the rate of CPU usage by the metrics service over the last minute, aggregated across multiple CPUs and grouped by Swarm node. The graph will look as shown:
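The same query can also be issued programmatically through Prometheus' HTTP API; curl's --data-urlencode takes care of escaping the PromQL expression (the /api/v1/query endpoint is part of the standard Prometheus API):
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum by (container_label_com_docker_swarm_node_id) (irate(container_cpu_usage_seconds_total{container_label_com_docker_swarm_service_name="metrics"}[1m]))'
The response is a JSON document with one series per Swarm node.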
This section will explain how an existing Java application can be updated to publish metrics and be monitored by Prometheus.
As discussed earlier, Prometheus collects metrics from monitored targets by scraping from an HTTP endpoint on these targets. By default, these metrics are expected to be published at /metrics
. Any existing Java application can be updated to publish Prometheus-style metrics at this endpoint.
An earlier chapter explained a simple Java EE application that talks to a MySQL database. This application also publishes Prometheus-style metrics for the underlying JVM at /metrics
. It also publishes application-specific metrics such as the total number of times GET / and GET /{id} are called.
The complete set of JVM metrics are explained at https://github.com/prometheus/client_java. Refer to https://github.com/arun-gupta/docker-javaee/tree/master/employees/src/main/java/org/javaee/samples/employees/metrics for more details on how these metrics are enabled.
-
Use the Compose file to deploy the Java EE application. This will start the WildFly Swarm application and the MySQL database.
docker stack deploy --compose-file=docker-compose.yml webapp
This will create the webapp_default overlay network, and start the webapp_web and webapp_db services. -
Verify the network:
$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
u6ybdaqx5h5y        webapp_default      overlay             swarm
Other networks may be shown here as well.
-
Verify the services:
$ docker service ls
ID            NAME        MODE        REPLICAS  IMAGE                            PORTS
ucztcpf1vw0a  webapp_db   replicated  1/1       mysql:8                          *:3306->3306/tcp
jttfgvr09kre  webapp_web  replicated  1/1       arungupta/docker-javaee:latest   *:8080->8080/tcp,*:9990->9990/tcp
-
Verify that the endpoint is accessible:
$ curl http://localhost:8080/resources/employees
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><collection><employee><id>1</id><name>Penny</name></employee><employee><id>2</id><name>Sheldon</name></employee><employee><id>3</id><name>Amy</name></employee><employee><id>4</id><name>Leonard</name></employee><employee><id>5</id><name>Bernadette</name></employee><employee><id>6</id><name>Raj</name></employee><employee><id>7</id><name>Howard</name></employee><employee><id>8</id><name>Priya</name></employee></collection>
-
Access the metrics published by the endpoint using
curl http://localhost:8080/metrics
to see the output:
# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="1.8.0_141-8u141-b15-1~deb9u1-b15",vendor="Oracle Corporation",} 1.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="PS Scavenge",} 25.0
jvm_gc_collection_seconds_sum{gc="PS Scavenge",} 0.386
jvm_gc_collection_seconds_count{gc="PS MarkSweep",} 6.0
jvm_gc_collection_seconds_sum{gc="PS MarkSweep",} 0.546
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 25.5
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.508056592419E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 499.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 4.244393984E9
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 5.06601472E8
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 13096.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 13096.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 0.0
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 60.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 12.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 67.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 93.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.2072508E8
jvm_memory_bytes_used{area="nonheap",} 9.3550048E7
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 2.69484032E8
jvm_memory_bytes_committed{area="nonheap",} 1.0133504E8
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 4.66092032E8
jvm_memory_bytes_max{area="nonheap",} -1.0
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="Code Cache",} 1.4589888E7
jvm_memory_pool_bytes_used{pool="Metaspace",} 6.9998048E7
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 8962112.0
jvm_memory_pool_bytes_used{pool="PS Eden Space",} 2.3732032E7
jvm_memory_pool_bytes_used{pool="PS Survivor Space",} 6073592.0
jvm_memory_pool_bytes_used{pool="PS Old Gen",} 9.0919456E7
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="Code Cache",} 1.47456E7
jvm_memory_pool_bytes_committed{pool="Metaspace",} 7.5800576E7
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 1.0788864E7
jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 9.2274688E7
jvm_memory_pool_bytes_committed{pool="PS Survivor Space",} 3.8797312E7
jvm_memory_pool_bytes_committed{pool="PS Old Gen",} 1.38412032E8
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} -1.0
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 1.073741824E9
jvm_memory_pool_bytes_max{pool="PS Eden Space",} 9.699328E7
jvm_memory_pool_bytes_max{pool="PS Survivor Space",} 3.8797312E7
jvm_memory_pool_bytes_max{pool="PS Old Gen",} 3.49700096E8
It shows all the JVM metrics that are published by the Prometheus JVM client. The metrics generated by the application are not shown yet; the application needs to be accessed first.
Let’s access the JVM metrics in Prometheus dashboard first, and then we’ll access the app to show app-specific metrics.
-
Make sure to terminate any previously running Prometheus service:
docker service rm metrics
-
Create a directory
prometheus
and change into that directory. -
Create a text file
prometheus.yml
and add the following content:
global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'webapp'
    dns_sd_configs:
      - names: ['tasks.webapp_web']
        type: 'A'
        port: 8080
This defines the configuration for the HTTP endpoint that publishes Prometheus-style metrics from the Java application.
-
Start Prometheus service:
docker service create \
  --replicas 1 \
  --network webapp_default \
  --name metrics \
  --mount type=bind,source=`pwd`/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
  --publish 9090:9090 \
  prom/prometheus
Note, this service is using the webapp_default overlay network that was created when the application stack was deployed. -
Access Prometheus dashboard at http://localhost:9090
-
Check the configured targets at http://localhost:9090/targets:
It shows that the application metrics HTTP endpoint is configured as a Prometheus target.
-
On the Prometheus dashboard, click on -insert metric at cursor- to see the list of metrics available. The JVM metrics shown earlier are displayed here as well.
-
Select the jvm_memory_pool_bytes_used metric and click on Execute to view the metric. -
Select
Graph
to view the graphical representation -
Now access the application using
curl http://localhost:8080/resources/employees
a few times. -
Refresh the Prometheus dashboard and see the updated list of metrics:
Note the app* and requests* metrics that are generated by the application. -
Select
requests_get_all
metric and view the graph: -
Access the application a few times using
curl http://localhost:8080/resources/employees/5
and then watch the requests_get_one
metric.
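The same application counters can also be checked from the command line, without the Prometheus dashboard. A small sketch that first generates some traffic and then filters the scrape output for the request counters mentioned above (the exact metric names depend on how the application registered them):
curl -s http://localhost:8080/resources/employees > /dev/null
curl -s http://localhost:8080/resources/employees/5 > /dev/null
curl -s http://localhost:8080/metrics | grep "^requests_"
Both requests_get_all and requests_get_one should appear with non-zero values after a few invocations.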
Grafana is an open source metrics analytics and visualization suite. It supports many different storage backends, called data sources. Prometheus can be added as a Grafana data source. It even provides support for running Prometheus queries from the Grafana dashboard. More details can be found in Using Prometheus in Grafana.
This section will explain how to start Grafana, use Prometheus as the data source, and view some container metrics.
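Besides the UI steps below, the data source can also be registered through Grafana's HTTP API once Grafana is running. A hedged sketch, assuming the admin password secret configured in the next step and a Prometheus instance reachable by the browser at localhost:9090 (with access set to proxy instead of direct, the URL would have to be reachable from the Grafana container itself):
curl -s -u admin:secret \
  -H 'Content-Type: application/json' \
  -X POST http://localhost:3000/api/datasources \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://localhost:9090","access":"direct"}'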
-
Start Grafana:
docker container run \
  -d \
  -p 3000:3000 \
  --name=grafana \
  -e "GF_SECURITY_ADMIN_PASSWORD=secret" \
  grafana/grafana
Use the login name admin and password secret. Read more details about the different configuration options.
-
Access Grafana dashboard at http://localhost:3000. Use the login and password as credentials to see Grafana console.
-
Click on Create your first dashboard, save it, and give it a name, say Docker and Java dashboard
-
Click on Graph, then Edit; under the Metrics tab, select your Prometheus data source. -
Enter the following Prometheus query expressions in the query field. The graphs will refresh in a few seconds and will look as shown: