Initial stab at facilitating /proc/stat sampling #755

Closed · wants to merge 2 commits
4 changes: 4 additions & 0 deletions procstat/.gitignore
@@ -0,0 +1,4 @@
procstatenv/
dist/
__pycache__/
build/
141 changes: 141 additions & 0 deletions procstat/README.md
@@ -0,0 +1,141 @@
# Proc stat sampler

## Run tests

The following command runs a couple of unit tests for the proc stat sampling system.

```bash
python3 -m unittest discover -v tests/
```
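
For orientation, a test in `tests/` might look roughly like the sketch below. This is a hypothetical example based on the `Sampler.get_snapshot()` API used elsewhere in this change, not a copy of the actual tests.

```python
# Hypothetical test sketch; the real tests in tests/ may differ.
import unittest

from modules.sampler import Sampler


class SamplerTest(unittest.TestCase):
    def test_snapshot_contains_cpu_times(self):
        # get_snapshot() returns a mapping with a "cpu_times" entry,
        # as consumed by modules/prometheus_http.py.
        snapshot = Sampler().get_snapshot()
        self.assertIn("cpu_times", snapshot)


if __name__ == "__main__":
    unittest.main()
```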

## Using the sampler

```
./main.py --help
usage: main.py [-h] [--sample-frequency SAMPLE_FREQUENCY]
               [--track-proc-name [TRACK_PROC_NAME [TRACK_PROC_NAME ...]]]
               [--dump-path DUMP_PATH]

Proc stat sampler CLI

optional arguments:
  -h, --help            show this help message and exit
  --sample-frequency SAMPLE_FREQUENCY
                        Number of samples to obtain per second. Defaults to 1
                        per second.
  --track-proc-name [TRACK_PROC_NAME [TRACK_PROC_NAME ...]]
                        Process name(s) to track, if any. Multiple allowed.
  --dump-path DUMP_PATH
                        Path where the result will be written.
```
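
For example, the following invocation (the path and process name are illustrative) samples twice per second, tracks envoy processes, and writes the dump to a file:

```bash
./main.py --sample-frequency 2 --track-proc-name envoy --dump-path /tmp/procstat.bin
```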

## Transform a dump from the sampler to yaml

```
oschaaf@burst:~/code/istio/tools/procstat$ ./dump-to-yaml.py --help
usage: dump-to-yaml.py [-h] [--dump-path DUMP_PATH]

Transforms dumps from the sampler to yaml

optional arguments:
  -h, --help            show this help message and exit
  --dump-path DUMP_PATH
                        Path where the target dump resides.
```
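
For example, assuming a dump written by `main.py` to an illustrative path:

```bash
./dump-to-yaml.py --dump-path /tmp/procstat.bin > /tmp/procstat.yaml
```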

### Sample output

```
- cpu_percent: 2.4
  cpu_times:
    guest: 0.0
    guest_nice: 0.0
    idle: 8788185.5
    iowait: 2188.63
    irq: 0.0
    nice: 19.19
    softirq: 14.13
    steal: 0.0
    system: 765.38
    user: 5233.24
  processes: []
  timestamp: 1581979800.9612823
- cpu_percent: 0.0
  cpu_times:
    guest: 0.0
    guest_nice: 0.0
    idle: 8788225.51
    iowait: 2188.63
    irq: 0.0
    nice: 19.19
    softirq: 14.13
    steal: 0.0
    system: 765.38
    user: 5233.25
  processes: []
  timestamp: 1581979801.9625692
- cpu_percent: 0.0
  cpu_times:
    guest: 0.0
    guest_nice: 0.0
    idle: 8788265.53
    iowait: 2188.63
    irq: 0.0
    nice: 19.19
    softirq: 14.13
    steal: 0.0
    system: 765.38
    user: 5233.25
  processes: []
  timestamp: 1581979802.963791
```
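
The `cpu_times` values are cumulative counters, so a consumer will typically diff consecutive samples. A minimal post-processing sketch, assuming PyYAML is installed and the illustrative paths above:

```python
# Compute per-interval idle-time deltas from the yaml produced by dump-to-yaml.py.
import yaml

with open("/tmp/procstat.yaml") as f:  # illustrative path
    samples = yaml.safe_load(f)

# Each adjacent pair of samples yields one interval.
for previous, current in zip(samples, samples[1:]):
    delta_idle = current["cpu_times"]["idle"] - previous["cpu_times"]["idle"]
    print(f"{current['timestamp']}: idle +{delta_idle:.2f}s")
```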

## Expose the proc stat sampler output for prometheus scraping

```bash
# run in a separate terminal
./prom.py --track nginx envoy --http-port 8000
```

## Querying the statistics

Note: the output below is from an early version, which only tracked internal metrics from the prometheus client lib.

```bash
oschaaf@burst:~/code/istio/tools/procstat$ curl --silent 127.0.0.1:8000 | head
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 123.0
python_gc_objects_collected_total{generation="1"} 255.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0

```

## Exposing prometheus metrics in sidecar proxy containers

The following script builds a standalone binary, deploys it to the benchmark
sidecar proxy containers, and fires up the service.

```bash
NAMESPACE=twopods-istio ./install_to_container.sh
```

## Testing if the service is running in containers

The service listens on port 8000 by default, so querying that port with curl should output a set of counters in prometheus format.

```bash
kubectl --namespace twopods-istio exec fortioclient-6b58bf5799-hkq8l -c istio-proxy -- curl 127.0.0.1:8000

...
cpu_times_system 6217.48
...
```
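
To pick out just the sampler's own gauges rather than the client library's internal metrics, something along these lines should work (same illustrative pod name as above):

```bash
kubectl --namespace twopods-istio exec fortioclient-6b58bf5799-hkq8l -c istio-proxy -- \
  curl --silent 127.0.0.1:8000 | grep -E '^cpu_(times|stats)_'
```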

21 changes: 21 additions & 0 deletions procstat/dump-to-yaml.py
@@ -0,0 +1,21 @@
#!/usr/bin/env python3

import argparse

from modules.collector import Collector
from modules.yaml_formatter import to_yaml


def main():
    parser = argparse.ArgumentParser(description='Transforms dumps from the sampler to yaml')
    parser.add_argument("--dump-path", type=str, help='Path where the target dump resides.')
    args = parser.parse_args()
    with open(args.dump_path, "rb") as dump_file:
        collector = Collector(file=None)
        yaml = to_yaml(list(collector.read_dump(dump_file)))
        print(yaml)


if __name__ == "__main__":
    main()
42 changes: 42 additions & 0 deletions procstat/install_to_container.sh
@@ -0,0 +1,42 @@
#!/bin/bash

set +x
set -e

echo "Building standalone binary"

export NAMESPACE=${NAMESPACE:-'twopods-istio'}

if [ ! -d dist/ ]
then
  rm requirements.txt || true
  python3 -m venv procstatenv
  source procstatenv/bin/activate
  pip3 install prometheus-client psutil
  # We strip a line because of a bug in pip freeze
  pip freeze | grep -v "pkg-resources" > requirements.txt
  # We build in a docker container to make sure we produce a compatible binary
  # (we need to make sure to build it with a compatible glibc version)
  # TODO(oschaaf): is it OK to use this docker image?
  docker run -v "$(pwd):/src/" cdrx/pyinstaller-linux:python3 "pyinstaller prom.py"
fi

echo "Deploying standalone binary"

kubectl get pods --namespace ${NAMESPACE} --no-headers --field-selector=status.phase=Running -o name | while read -r pod
do
  # Strip the pod/ prefix we get for free
  pod=${pod#"pod/"}
  echo "Installing to ${pod}"
  kubectl --namespace ${NAMESPACE} exec ${pod} -c istio-proxy -- rm -rf /etc/istio/proxy/procstat
  kubectl --namespace ${NAMESPACE} cp ./ ${pod}:/etc/istio/proxy/procstat -c istio-proxy
  echo "Fire service in ${pod}"
  # Stop the existing service instance, if any
  kubectl --namespace ${NAMESPACE} exec ${pod} -c istio-proxy -- pkill -f prom || true
  # FIXME: this needs the kubectl command to stay running on the machine running this script
  kubectl --namespace ${NAMESPACE} exec ${pod} -c istio-proxy -- /etc/istio/proxy/procstat/dist/prom/prom &
done

echo "proc stat sampling deployed"


43 changes: 43 additions & 0 deletions procstat/modules/collector.py
@@ -0,0 +1,43 @@
from pickle import Pickler, Unpickler
from threading import Thread
from time import sleep

from modules.sampler import Sampler


class Collector:
    def __init__(self, file, sampler=None, sample_interval=1.0):
        self.file = file
        self.sample_interval = sample_interval
        self.thread = Thread(target=self.work)
        # Avoid a mutable default argument: a `sampler=Sampler()` default
        # would be evaluated once and shared across all Collector instances.
        self.sampler = sampler if sampler is not None else Sampler()
        self.running = False

    def work(self):
        while self.running:
            self.pickler.dump(self.sampler.get_snapshot())
            # We clear this so the pickler won't remember which objects
            # it has already seen. This allows us to restore flattened
            # process structures, thereby serializing a flattened version
            # into yaml.
            self.pickler.clear_memo()
            sleep(self.sample_interval)

    def start(self):
        self.running = True
        self.pickler = Pickler(self.file)
        self.thread.start()

    def stop(self):
        self.running = False
        self.thread.join()
        self.file.close()

    def read_dump(self, file):
        unpickler = Unpickler(file)
        done = False
        while not done:
            try:
                yield unpickler.load()
            except EOFError:
                done = True
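
For reference, a minimal usage sketch of `Collector` outside the CLI; the dump path is illustrative:

```python
from time import sleep

from modules.collector import Collector

# Record ten seconds of samples to a dump file.
with open("/tmp/procstat.bin", "wb") as dump:
    collector = Collector(file=dump)
    collector.start()
    sleep(10)
    collector.stop()  # joins the worker thread and closes the file

# Read the dump back as a stream of snapshots.
with open("/tmp/procstat.bin", "rb") as dump:
    for snapshot in Collector(file=None).read_dump(dump):
        print(snapshot["timestamp"])
```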
95 changes: 95 additions & 0 deletions procstat/modules/prometheus_http.py
@@ -0,0 +1,95 @@
#!/usr/bin/env python3

from argparse import ArgumentParser
from os import pipe, fdopen
from signal import signal, SIGINT, SIGTERM

from prometheus_client import start_http_server, Gauge

from modules.collector import Collector
from modules.sampler import Sampler

COLLECTOR = None


def signal_handler(signum, frame):
    '''
    We will gracefully quit upon observing SIGTERM/SIGINT.
    We do so by calling stop on the collector, which in turn will end up
    closing the write side of the pipe that it is writing to.
    This will be noticed by the code below, which processes the read side.
    '''
    print("stopping... ")
    COLLECTOR.stop()


class Prom:
    def __init__(self):
        pass

    def run(self, arguments):
        parser = ArgumentParser(description='Proc stat sampler CLI')
        parser.add_argument("--track-proc-name", type=str, nargs="*",
                            help='Optional process name(s) to track.', default=[])
        parser.add_argument("--sample-frequency", type=int, default=1,
                            help='Number of samples to obtain per second.')
        parser.add_argument("--http-port", type=int, default=8000,
                            help='Http port for exposing prometheus metrics.')

        args = parser.parse_args(arguments)

        signal(SIGINT, signal_handler)
        signal(SIGTERM, signal_handler)

        global COLLECTOR
        # We hand the write side of the pipe to our proc stat collector.
        pipe_read_fd, pipe_write_fd = pipe()
        COLLECTOR = Collector(fdopen(pipe_write_fd, "wb", 1024), sampler=Sampler(
            process_names_of_interest=args.track_proc_name), sample_interval=1.0 / args.sample_frequency)
        # Start serving prometheus stats over http
        start_http_server(args.http_port)

        # Start sampling proc stat.
        COLLECTOR.start()

        cpu_times_guest = Gauge('cpu_times_guest', '')
        cpu_times_guest_nice = Gauge('cpu_times_guest_nice', '')
        cpu_times_idle = Gauge('cpu_times_idle', '')
        cpu_times_iowait = Gauge('cpu_times_iowait', '')
        cpu_times_irq = Gauge('cpu_times_irq', '')
        cpu_times_nice = Gauge('cpu_times_nice', '')
        cpu_times_softirq = Gauge('cpu_times_softirq', '')
        cpu_times_steal = Gauge('cpu_times_steal', '')
        cpu_times_system = Gauge('cpu_times_system', '')
        cpu_times_user = Gauge('cpu_times_user', '')

        cpu_stats_ctx_switches = Gauge('cpu_stats_ctx_switches', '')
        cpu_stats_interrupts = Gauge('cpu_stats_interrupts', '')
        cpu_stats_soft_interrupts = Gauge('cpu_stats_soft_interrupts', '')
        cpu_stats_syscalls = Gauge('cpu_stats_syscalls', '')

        # The collector will write proc stat samples to the file descriptor we handed it above.
        # We will read those here, and update the prometheus stats according to these samples.
        with fdopen(pipe_read_fd, "rb", 1024) as f:
            it = COLLECTOR.read_dump(f)
            # TODO(oschaaf): Add an option here to also stream the raw data to another fd,
            # as we lose information in the summary we serve over http. This could be helpful
            # when in-depth analysis is desired of an observed problem.
            for entry in it:
                cpu_times_guest.set(entry["cpu_times"].guest)
                cpu_times_guest_nice.set(entry["cpu_times"].guest_nice)
                cpu_times_idle.set(entry["cpu_times"].idle)
                cpu_times_iowait.set(entry["cpu_times"].iowait)
                cpu_times_irq.set(entry["cpu_times"].irq)
                cpu_times_nice.set(entry["cpu_times"].nice)
                cpu_times_softirq.set(entry["cpu_times"].softirq)
                cpu_times_steal.set(entry["cpu_times"].steal)
                cpu_times_system.set(entry["cpu_times"].system)
                cpu_times_user.set(entry["cpu_times"].user)

                cpu_stats_ctx_switches.set(entry["cpu_stats"].ctx_switches)
                cpu_stats_interrupts.set(entry["cpu_stats"].interrupts)
                cpu_stats_soft_interrupts.set(entry["cpu_stats"].soft_interrupts)
                cpu_stats_syscalls.set(entry["cpu_stats"].syscalls)
        print("stopped")