qemu-guest-agent monitoring is unusable #6765

jimis · 2024-10-29T17:07:26Z

I have been trying to enable monitoring in OpenNebula 6.10.0 based on these instructions:
https://docs.opennebula.io/6.10/open_cluster_deployment/kvm_node/kvm_driver.html#enabling-qemu-guest-agent

I am facing plenty of issues, and the feature is practically unusable in its current state. I have done a fair amount of debugging; please read my findings below.

The examples in the documentation don't work.

Only the :vm_qemu_ping command works. The :guest_info command mentioned later in the documentation fails. This happens because...

The guest-agent returns JSON replies and OpenNebula throws syntax errors.

Meaningless errors like the following are visible in monitor.log:

 , error: syntax error, unexpected COMMA, expecting $end at line 1095005315, columns 195:197

This happens because OpenNebula expects a single "return" JSON key whose content is written in VM_TEMPLATE syntax.

However, what the guest-agent returns is a JSON object with a "return" key containing a deep JSON hierarchy whose shape depends on the command.

Even if we pass the guest-agent response through jq, the output is still JSON, so it still fails and the same meaningless errors are logged.

The workaround is to change the command so that it returns a single string under the "return" JSON element. See the :vm_ip_address command in the pull request that I opened: #6762

    :vm_ip_address:     >
                        one-$vm_id '{"execute":"guest-network-get-interfaces"}' --timeout 5 |
                        jq '{"return" : [ .return[]."ip-addresses"[]|select(."ip-address-type"=="ipv4" and (."ip-address"|startswith("127.")|not))."ip-address" ][0]}'

This returns { "return": "10.218.100.2" }, which is accepted by OpenNebula because a single string is both valid JSON and valid VM_TEMPLATE syntax.
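For readers less familiar with jq, here is a rough Python equivalent of the filter above. It is only a sketch: the sample reply is hypothetical, trimmed data (not captured from a real guest), and the function name first_ipv4 is made up for illustration. It shows how the nested guest-network-get-interfaces reply is reduced to a single string under "return".

```python
import json

# Hypothetical, trimmed guest-network-get-interfaces reply (illustrative data only).
SAMPLE_REPLY = json.dumps({
    "return": [
        {"name": "lo", "ip-addresses": [
            {"ip-address-type": "ipv4", "ip-address": "127.0.0.1", "prefix": 8},
            {"ip-address-type": "ipv6", "ip-address": "::1", "prefix": 128},
        ]},
        {"name": "eth0", "ip-addresses": [
            {"ip-address-type": "ipv4", "ip-address": "10.218.100.2", "prefix": 24},
            {"ip-address-type": "ipv6", "ip-address": "fe80::1", "prefix": 64},
        ]},
    ]
})


def first_ipv4(reply_text):
    """Reduce the nested reply to {"return": <first non-loopback IPv4 or None>}."""
    reply = json.loads(reply_text)
    candidates = [
        addr["ip-address"]
        for iface in reply["return"]
        # Unlike the jq filter, tolerate interfaces without an "ip-addresses" key.
        for addr in iface.get("ip-addresses", [])
        if addr["ip-address-type"] == "ipv4"
        and not addr["ip-address"].startswith("127.")
    ]
    # Like jq's [ ... ][0], fall back to null (None) when nothing matches.
    return {"return": candidates[0] if candidates else None}


print(json.dumps(first_ipv4(SAMPLE_REPLY)))  # {"return": "10.218.100.2"}
```

The point is the shape of the result: one flat string, which is the only form I have found OpenNebula to accept.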

This processing is cumbersome and can't provide any kind of complex monitoring, such as getting the list of all IP addresses (demonstrated by the second command in the same pull request, which does not work because its output is JSON).

On a hypervisor host with many VMs, if a single VM is frozen, then monitoring fails for all VMs of the host

Again, unhelpful errors are logged:

Received STATE_VM message from host 1:
Error executing state.rb: unexpected token at ''
Tue Oct 29 15:30:12 2024 [Z0][MDP][W]: Failed to monitor VM state for host 1: Error executing state.rb: unexpected token at ''

Like the previous errors, these too are misleading. What actually happened is that a single VM on the host timed out, like this:

    # virsh qemu-agent-command one-4 --cmd '{"execute":"guest-ping"}' --timeout 5
    error: Guest agent is not responding: Guest agent not available for now

This can happen either because a VM is frozen or because it is still booting.
Once I recover this single VM, monitoring immediately works again for all VMs on the host.
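To illustrate the behaviour I would expect instead, here is a minimal Python sketch of per-VM error isolation. It is not how OpenNebula's state.rb is implemented; the function names virsh_agent_command and probe_all and the shape of the result dictionary are my own assumptions. The only real interface used is the virsh qemu-agent-command invocation shown above. A VM whose guest agent does not respond is recorded as failed, and the remaining VMs are still probed.

```python
import json
import subprocess


def virsh_agent_command(domain, command, timeout=5):
    """Run one guest-agent command through virsh; raises if virsh fails."""
    result = subprocess.run(
        ["virsh", "qemu-agent-command", domain,
         "--cmd", json.dumps(command), "--timeout", str(timeout)],
        capture_output=True, text=True, check=True,
    )
    # Empty or malformed output raises json.JSONDecodeError (a ValueError).
    return json.loads(result.stdout)


def probe_all(domains, command, run=virsh_agent_command):
    """Probe every domain; one unresponsive guest must not abort the loop."""
    results = {}
    for domain in domains:
        try:
            results[domain] = {"ok": True, "reply": run(domain, command)}
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired,
                ValueError, OSError) as exc:
            # Record the failure for this VM only and keep going.
            results[domain] = {"ok": False, "error": str(exc)}
    return results
```

On a hypervisor this could be called as probe_all(["one-3", "one-4"], {"execute": "guest-ping"}); the run parameter exists only so the loop can be exercised without virsh.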

jimis added the Type: Bug label Oct 29, 2024

jimis mentioned this issue Oct 29, 2024

KVM: Monitor VM IP address using qemu-guest-agent #6762

Open

tinova assigned dann1 Nov 18, 2024

tinova added this to the Release 6.10.2 milestone Nov 18, 2024

tinova added Category: Context Status: Accepted Priority: Normal Category: Drivers - Monitor and removed Category: Context labels Nov 18, 2024
