I have been trying to enable monitoring in OpenNebula 6.10.0 based on these instructions:
https://docs.opennebula.io/6.10/open_cluster_deployment/kvm_node/kvm_driver.html#enabling-qemu-guest-agent
I am facing several issues; the feature is practically unusable in its current state. I have done plenty of debugging, so please read my findings below.
The examples in the documentation don't work.
Only the :vm_qemu_ping command works; the :guest_info command mentioned later fails. This happens because the guest agent returns JSON replies and OpenNebula throws syntax errors when parsing them.
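For illustration, here is roughly what the two underlying commands return on my setup (replies abbreviated; one-4 is one of my domains):

# virsh qemu-agent-command one-4 --cmd '{"execute":"guest-ping"}'
{"return":{}}
# virsh qemu-agent-command one-4 --cmd '{"execute":"guest-info"}'
{"return":{"version":"...","supported_commands":[{"enabled":true,"name":"guest-ping","success-response":true},...]}}

The trivial ping reply is harmless, but the nested guest-info reply is what trips up OpenNebula's parser.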
Meaningless errors like the following are visible in monitor.log:
, error: syntax error, unexpected COMMA, expecting $end at line 1095005315, columns 195:197
This happens because OpenNebula expects a single "return" JSON key whose value is in VM_TEMPLATE syntax.
However, what the guest agent actually returns is a JSON object whose "return" key contains a deeply nested JSON hierarchy, depending on the command.
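To make the mismatch concrete: VM_TEMPLATE syntax is flat KEY = "VALUE" attributes, while a typical agent reply such as guest-network-get-interfaces looks like this (abbreviated):

# virsh qemu-agent-command one-4 --cmd '{"execute":"guest-network-get-interfaces"}'
{"return":[{"name":"lo","ip-addresses":[{"ip-address-type":"ipv4","ip-address":"127.0.0.1","prefix":8}],...},{"name":"eth0","ip-addresses":[{"ip-address-type":"ipv4","ip-address":"10.218.100.2","prefix":24}],...}]}

There is no obvious way to flatten such a nested structure into VM_TEMPLATE attributes.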
Even if we pass the guest agent response through jq, the output is still JSON, so it still fails to parse and the same meaningless errors are logged.
The workaround is to change the command to return a single string under the "return" JSON element. See the :vm_ip_address command in this pull request that I opened: #6762
This returns { "return": "10.218.100.2" }, which is accepted by OpenNebula because a single string is both valid JSON and valid VM_TEMPLATE syntax.
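To show what the workaround boils down to, here is a sketch of the kind of pipeline such a probe runs (my reconstruction, not the exact code from the pull request; it assumes virsh and jq are available on the host):

# virsh qemu-agent-command one-4 --cmd '{"execute":"guest-network-get-interfaces"}' \
    | jq -r '[.return[] | select(.name != "lo") | ."ip-addresses"[]? | ."ip-address"] | first'
10.218.100.2

Note that first deliberately discards all but one address, because the probe may only emit a single flat string.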
This processing is cumbersome and can't provide any kind of complex monitoring, such as getting the list of all IP addresses (demonstrated by the second command in the same pull request, which does not work because its output is JSON).
On a hypervisor host with many VMs, if a single VM is frozen, then monitoring fails for all VMs on the host.
Here again, useless errors are logged:
Received STATE_VM message from host 1:
Error executing state.rb: unexpected token at ''
Tue Oct 29 15:30:12 2024 [Z0][MDP][W]: Failed to monitor VM state for host 1: Error executing state.rb: unexpected token at ''
Like the previous errors, these too are misleading. What actually happens is that one single VM on the host times out like this:
# virsh qemu-agent-command one-4 --cmd '{"execute":"guest-ping"}' --timeout 5
error: Guest agent is not responding: Guest agent not available for now
This can happen either because the VM is frozen or because it is still booting.
When I recover this single VM, monitoring suddenly works again for all VMs on the host.
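Until this is fixed, the following loop is what I use to find the culprit VM on a host (a rough helper; it assumes the usual one-<id> libvirt domain names):

for dom in $(virsh list --name | grep '^one-'); do
    virsh qemu-agent-command "$dom" --cmd '{"execute":"guest-ping"}' --timeout 5 \
        >/dev/null 2>&1 || echo "$dom: guest agent not responding"
done

Recovering whichever domain this flags unblocks monitoring for all other VMs on the host.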