Add checks for openvswitch and helper function to enable the checks #601
Thanks for the patch; just a couple of comments around testing and extracting data from the ovs-vsctl command.
ovs_error_re = re.compile(r"^.*error: (?P<message>.+)$", re.I)
for line in ovs_output.decode(errors="ignore").splitlines():
    m = ovs_error_re.match(line)
    if m:
        ovs_vsctl_show_errors.append(m.group("message"))
Rather than using a regex, the ovs-vsctl command does support outputting JSON (--format=json). Is there a reason for not doing that (perhaps that the errors would appear in different nodes in different versions)? Just wondering how to make it less magic.
ovs-vsctl show does not support the --format option (the args are parsed, but the output is not differentiated).
root@ruling-manta:/home/ubuntu# ovs-vsctl -f json show
fcaf57e2-8667-4972-b687-169789d1d15d
    Bridge "br0"
        Port "dpdk-p1"
            Interface "dpdk-p1"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.1"}
                error: "could not open network device dpdk-p1 (Address family not supported by protocol)"
        Port "br0"
            Interface "br0"
                type: internal
        Port "dpdk-p0"
            Interface "dpdk-p0"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.0"}
                error: "could not open network device dpdk-p0 (Address family not supported by protocol)"
    ovs_version: "2.9.8"
I've also investigated using the JSON output for ovs-vsctl list Interfaces:
{"data":[[["uuid","9bc2d04d-0069-4a76-918a-b9ac61d4c5ce"],["set",[]],["map",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],"could not open network device dpdk-p1 (Address family not supported by protocol)",["map",[]],["set",[]],0,0,["set",[]],["set",[]],["set",[]],["set",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],"dpdk-p1",-1,["set",[]],["map",[["dpdk-devargs","0000:01:00.1"]]],["map",[]],["map",[]],["map",[]],"dpdk"],[["uuid","2d8758ca-510f-4399-ab84-8c1f3a33061e"],"down",["map",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["map",[]],4,0,0,["set",[]],0,["set",[]],"down",["map",[]],["set",[]],"c2:bd:86:7f:fa:4d",1500,["set",[]],"br0",65534,["set",[]],["map",[]],["map",[]],["map",[["collisions",0],["rx_bytes",0],["rx_crc_err",0],["rx_dropped",2],["rx_errors",0],["rx_frame_err",0],["rx_over_err",0],["rx_packets",0],["tx_bytes",0],["tx_dropped",0],["tx_errors",0],["tx_packets",0]]],["map",[["driver_name","openvswitch"]]],"internal"],[["uuid","d67191aa-bdae-4ed5-95c8-86778276052e"],["set",[]],["map",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],"could not open network device dpdk-p0 (Address family not supported by protocol)",["map",[]],["set",[]],0,0,["set",[]],["set",[]],["set",[]],["set",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],"dpdk-p0",-1,["set",[]],["map",[["dpdk-devargs","0000:01:00.0"]]],["map",[]],["map",[]],["map",[]],"dpdk"]],"headings":["_uuid","admin_state","bfd","bfd_status","cfm_fault","cfm_fault_status","cfm_flap_count","cfm_health","cfm_mpid","cfm_remote_mpids","cfm_remote_opstate","duplex","error","external_ids","ifindex","ingress_policing_burst","ingress_policing_rate","lacp_current","link_resets","link_speed","link_state","lldp","mac","mac_in_use","mtu","mtu_request","name","ofport","ofport_request","options","other_config","statistics","status","type"]}
Very odd that the project uses "headings" to index the data model instead of emitting key-value output, as would be expected of JSON. Using this instead of parsing with a regex requires some additional processing for empty values: every Interface row has an error column, and when it is blank, the value at that list index is a list containing the data type, "set", and the empty set, [].
>>> error_index = data["headings"].index("error")
>>> for interface in data["data"]:
...     print(interface[error_index])
...
could not open network device dpdk-p1 (Address family not supported by protocol)
['set', []]
could not open network device dpdk-p0 (Address family not supported by protocol)
This could certainly be used instead of regex with a check such as if interface[error_index] != ['set', []]: but I chose the simpler-to-read regex, and am also hoping to catch errors from 'ovs-vsctl show' that may not be Interface related (though for the current requirement, limiting the check to Interface errors would suffice).
Do you have advice regarding readability of code vs using something other than regex in a situation like this? The regex, to me, seemed more elegant and readable vs the additional handling of missing "error" index as well as the handling of the ['set', []].
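For comparison, the JSON route described above can be kept fairly readable by zipping the "headings" list with each row. This is only a minimal sketch, not the code from this PR: the function name interface_errors is hypothetical, and the sample is a hand-trimmed subset of the columns shown earlier.

```python
import json

# OVSDB's JSON output encodes an unset optional column as ["set", []].
EMPTY_SET = ["set", []]


def interface_errors(raw_json):
    """Return {interface name: error message} for rows whose error column is set."""
    table = json.loads(raw_json)
    headings = table["headings"]
    errors = {}
    for row in table["data"]:
        # Pair each column heading with its value to get key-value access.
        record = dict(zip(headings, row))
        error = record.get("error", EMPTY_SET)
        if error != EMPTY_SET:
            errors[record["name"]] = error
    return errors


# Hypothetical, trimmed sample: only four of the real columns are kept.
SAMPLE = json.dumps({
    "headings": ["_uuid", "error", "name", "type"],
    "data": [
        [["uuid", "9bc2d04d-0069-4a76-918a-b9ac61d4c5ce"],
         "could not open network device dpdk-p1 "
         "(Address family not supported by protocol)",
         "dpdk-p1", "dpdk"],
        [["uuid", "2d8758ca-510f-4399-ab84-8c1f3a33061e"],
         ["set", []], "br0", "internal"],
    ],
})

if __name__ == "__main__":
    print(interface_errors(SAMPLE))
```

Using record.get("error", EMPTY_SET) also covers the case where the "error" heading is missing entirely, so neither special case needs its own branch.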
After investigating all of the other tables, Interfaces is the only table that has the error column, so I'll write this more deterministically and be able to include potentially vital interface information in the notification.
I didn't realize it wasn't a complete implementation (the --format option being missing). I always worry about using regexes on human-readable/consumable output, as it is prone to change (on a whim sometimes!), which can make the code brittle.
I think from your explanations, it's fine to go with regex as a pragmatic solution as long as all error conditions are handled. I'll go back and look again.
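To make the trade-off concrete, here is a small sketch of the regex approach applied to a shortened copy of the ovs-vsctl show output quoted in this thread. The pattern is the one from the patch; the function name show_errors and the trimmed sample text are assumptions made for illustration.

```python
import re

# Same pattern as in the patch under review.
OVS_ERROR_RE = re.compile(r"^.*error: (?P<message>.+)$", re.I)


def show_errors(show_output):
    """Collect the text after 'error: ' from each matching line."""
    errors = []
    for line in show_output.splitlines():
        match = OVS_ERROR_RE.match(line)
        if match:
            errors.append(match.group("message"))
    return errors


# Shortened sample based on the output quoted in this thread.
SAMPLE_SHOW = """\
fcaf57e2-8667-4972-b687-169789d1d15d
    Bridge "br0"
        Port "dpdk-p1"
            Interface "dpdk-p1"
                type: dpdk
                error: "could not open network device dpdk-p1 (Address family not supported by protocol)"
        Port "br0"
            Interface "br0"
                type: internal
    ovs_version: "2.9.8"
"""

if __name__ == "__main__":
    for message in show_errors(SAMPLE_SHOW):
        # The captured message keeps the surrounding double quotes.
        print(message.strip('"'))
```

Note that the captured message still carries the double quotes from the human-readable output, and that the match does not say which interface the error belongs to; both depend on the layout of the text staying the same.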
I've updated with the latest commit to use the Interface table JSON.
e688bdc to 76db2be
76db2be to b60df86
A bunch of trivial feedback; feel free to ignore it all if you like. Just happy to see this check be added.
nrpe.add_check(
    shortname='openvswitch',
    description='Check Open vSwitch {%s}' % unit_name,
    check_cmd='check_openvswitch.py')
How about making the name more explicit, e.g. check_ovs_interfaces.py or check_ovs_ifaces.py?
I was definitely feeling this would be the start of something that could be expanded to more checks as we identify them. The interface errors are just MVP for the current need.
def enable_sudo_for_openvswitch_checks():
    sudoers_dir = "/etc/sudoers.d"
    sudoers_mode = 0o100440
    ovs_sudoers_file = "99-check_openvswitch"
As above, suggest check_ovs_interfaces; unless the plan is to expand in the future?
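As a side note on the snippet above: 0o100440 is the full st_mode of a regular file with permissions 0440 (stat.S_IFREG | 0o440), which makes it convenient to compare against os.stat() directly. The following is only a sketch of how such a helper might write the file; the function write_sudoers_file, the rule text, and the /usr/bin/ovs-vsctl path are all assumptions, not taken from the patch.

```python
import os
import stat

# Hypothetical rule; the actual rule in the patch is not shown in this thread.
EXAMPLE_RULE = "nagios ALL=(root) NOPASSWD: /usr/bin/ovs-vsctl"

# st_mode of a regular file with permissions 0440.
SUDOERS_MODE = stat.S_IFREG | 0o440  # equals 0o100440


def write_sudoers_file(sudoers_dir, filename, rule, mode=0o440):
    """Write a sudoers drop-in with restrictive permissions and return its path."""
    path = os.path.join(sudoers_dir, filename)
    # sudo skips drop-in files whose names contain a '.', so a half-written
    # temporary file with this suffix is never parsed.
    tmp_path = path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
    with os.fdopen(fd, "w") as handle:
        handle.write(rule + "\n")
    # The umask may have narrowed the mode at creation time; set it explicitly.
    os.chmod(tmp_path, mode)
    os.replace(tmp_path, path)
    return path


def sudoers_file_ok(path):
    """Check that the file exists as a regular file with mode 0440."""
    return os.path.exists(path) and os.stat(path).st_mode == SUDOERS_MODE
```

In a real charm it would also be worth validating the file with visudo -cf before moving it into place, since a malformed drop-in can break sudo for every user.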
def parse_args(argv=None):
    """Process CLI arguments."""
    parser = argparse.ArgumentParser(
        prog="check_openvswitch",
As previous
76d7842 to 20d0ddb
20d0ddb to 749a024
This commit provides an nrpe script that checks for errors in ovs-vsctl show output, to be shared across several openstack networking charms or any other charm that may wish to add openvswitch monitoring. There is also a helper function in contrib.charmsupport.nrpe, add_openvswitch_checks, which will set up the necessary sudoers rights for the nagios user to introspect the running openvswitch process.
The check_openvswitch.py script relies upon the nagios_plugins3 module, which is delivered with charm-nrpe. The add_openvswitch_checks method should not be used outside of the context of an nrpe-relation hook.
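As a rough illustration of what such a check reports to Nagios, the sketch below maps a dictionary of interface errors to a standard plugin exit code and status line. It deliberately does not use nagios_plugins3 (whose API is not shown here); the function name nagios_result and the message wording are assumptions.

```python
import sys

# Standard Nagios plugin exit codes.
NAGIOS_OK = 0
NAGIOS_CRITICAL = 2


def nagios_result(errors):
    """Turn {interface name: error message} into (exit code, status line)."""
    if not errors:
        return NAGIOS_OK, "OK: no Open vSwitch interface errors"
    details = "; ".join(
        "{}: {}".format(name, message) for name, message in sorted(errors.items())
    )
    return NAGIOS_CRITICAL, "CRITICAL: {} interface error(s): {}".format(
        len(errors), details
    )


if __name__ == "__main__":
    # Hypothetical input; a real check would gather this from ovs-vsctl.
    code, line = nagios_result(
        {"dpdk-p0": "could not open network device dpdk-p0 "
                    "(Address family not supported by protocol)"}
    )
    print(line)
    sys.exit(code)
```

Including the interface name in the status line is what the switch to the Interface table JSON makes possible, since the regex over ovs-vsctl show only yields the bare message.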