Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add prometheus exporter for nova-bigvm #388

Open
wants to merge 1 commit into
base: stable/xena-m3
Choose a base branch
from

Conversation

leust
Copy link

@leust leust commented Dec 20, 2022

Currently exposing the following metrics:

Counter nova_bigvm_host_errors{error, vc, host, rp}
Counter nova_bigvm_no_candidate_error{hv_size}
Gauge nova_bigvm_host_freeing_up{vc, host, rp}
Gauge nova_bigvm_free_hosts_count{}

Copy link

@joker-at-work joker-at-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part 1 of the review. I don't think I can check the logic stuff today.

'bigvm_exporter_listen_port',
default=9847,
help="""
Port where the BigVM prometheus exporter to listen for HTTP requests.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo "to listen" doesn't fit with "where". needs to be something like "exporter listens"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.



def start_bigvm_exporter():
port = int(CONF.bigvm_exporter_listen_port or 9847)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this int necessary? The option is defined as IntOpt, so I would assume we don't need it here.


def start_bigvm_exporter():
port = int(CONF.bigvm_exporter_listen_port or 9847)
start_http_server(port, registry=REGISTRY)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this start a new process or a new thread?

If it starts a thread, does it start an eventlet greenthread (because eventlet patched the threading module) or does it spawn a native thread?

If it spawns a greenthread, does the prometheus_client library do anything blocking that could hinder the manager to run properly?

If it spawns a native thread, we cannot use logging (or anything else take takes a threading.Lock) anywhere inside that native thread or we risk a hanging service.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything behind is based on daemonic threading.Thread, which to my understanding is patched by eventlet.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You missed to answer one question: Does prometheus_client use anything that would block the process. eventlet greenthreads are not preempted, but give up the CPU when they would do blocking operations. This needs library support (or usage of one of the eventlet-patched functions). If the greenthread doesn't give up the CPU on blocking operations, no other greenthread will run.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code adds one more greenthread to the pgt() output. Here is it:

2 <greenlet.greenlet object at 0x7f802a44d510 (otid=0x7f8031506a00) suspended active started> 
  File "/var/lib/openstack/lib/python3.8/site-packages/eventlet/green/thread.py", line 42, in __thread_body
    func(*args, **kwargs)
  File "/usr/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/var/lib/openstack/lib/python3.8/site-packages/eventlet/green/thread.py", line 63, in wrap_bootstrap_inner
    bootstrap_inner()
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/socketserver.py", line 232, in serve_forever
    ready = selector.select(poll_interval)
  File "/usr/lib/python3.8/selectors.py", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
  File "/var/lib/openstack/lib/python3.8/site-packages/eventlet/green/select.py", line 80, in select
    return hub.switch()
  File "/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch
    return self.greenlet.switch()

Looking at the stack trace there is the socketserver.py which is the module that's being used by the prometheus library to expose the endpoint.

So this runs in a greenthread.

def __init__(self, registry=REGISTRY):
self.host_errors_counter = \
Counter('nova_bigvm_host_errors',
'Nova BigVM errors counter.',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be more descriptive as I could not tell you, what this means without looking into the code that increases the counter.


self.free_hosts_count_gauge = \
Gauge('nova_bigvm_free_hosts_count',
'Nova BigVM hosts available.',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you expand on the description a little more, please? hosts available for what? Is this the count of the resource-providers that are ready for a BigVM deployment?

Comment on lines 73 to 74
self.host_freeing_up_gauge.remove(
rp['vc'], rp['host'], rp['rp']['name'])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if that provider is not in here, because we just restarted and don't have any data about this provider anymore?

Copy link

@grandchild grandchild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/prometheus/client_python#disabling-default-collector-metrics

It sounds like we have to disable these automatic metrics? Or do we want them? I don't think we need GC and process stats.

nova/bigvm/exporter.py Outdated Show resolved Hide resolved
nova/bigvm/exporter.py Outdated Show resolved Hide resolved
nova/bigvm/exporter.py Outdated Show resolved Hide resolved
@leust
Copy link
Author

leust commented Apr 26, 2023

It sounds like we have to disable these automatic metrics?

I haven't seen any other metric apart from the ones defined by us, while testing the exporter endpoint.

Copy link

@joker-at-work joker-at-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but still 2 questions remaining.


def start_bigvm_exporter():
port = int(CONF.bigvm_exporter_listen_port or 9847)
start_http_server(port, registry=REGISTRY)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You missed to answer one question: Does prometheus_client use anything that would block the process. eventlet greenthreads are not preempted, but give up the CPU when they would do blocking operations. This needs library support (or usage of one of the eventlet-patched functions). If the greenthread doesn't give up the CPU on blocking operations, no other greenthread will run.

else:
free_hosts += 1

bigvm_metrics.set_free_hosts_count(free_hosts)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to show the free hosts per hv_size instead of the overall number, because it depends on the size if we can spawn certain instances?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added hv_size label.

@leust leust force-pushed the bigvm_exporter branch 2 times, most recently from f3ffa01 to 60b8dd1 Compare December 11, 2023 14:52
Exposing the following metrics:

Counter nova_bigvm_host_errors{error, vc, host, rp}
Counter nova_bigvm_no_candidate_error{hv_size}
Gauge nova_bigvm_host_freeing_up{vc, host, rp}
Gauge nova_bigvm_free_hosts_count{hv_size}

Change-Id: I050eeb1036910c03428eaa8aad7e992f241f6f51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants