Strange behaviour: failed to dial libvirt #1004

Closed
itwars opened this issue Feb 23, 2023 · 17 comments

Comments

@itwars
Contributor

itwars commented Feb 23, 2023

On the host itself, both commands work fine (the local host's IP is 192.168.10.201):

virsh -c qemu:///system list --all
virsh -c qemu+ssh://[email protected]/system list --all

Terraforming with this provider block is OK too:

provider "libvirt" {
  alias = "node1"
  uri   = "qemu:///system"
}

But terraforming with this provider block fails:

provider "libvirt" {
  alias = "node1"
  uri   = "qemu+ssh://[email protected]/system"
}

module.test-nouvelle-alpine.data.template_file.network_config: Reading...
module.test-nouvelle-alpine.data.template_file.user_data[0]: Reading...
module.test-nouvelle-alpine.data.template_file.network_config: Read complete after 0s [id=142f2ebd49dc333da2f0bf71a7bf27e6acf558cd743fd118b3695d41e5368ec8]
module.test-nouvelle-alpine.data.template_file.user_data[0]: Read complete after 0s [id=5b862b06752472a08f3daeb0d8f77bf3bd2755186a477dd2d5bef90a0d414b69]
╷
│ Error: failed to dial libvirt: failed to connect to libvirt on the remote host: ssh: rejected: connect failed (open failed)
│
│   with provider["registry.terraform.io/dmacvicar/libvirt"].node1,
│   on providers.tf line 1, in provider "libvirt":
│    1: provider "libvirt" {
│
  • By the way, the host server runs Alpine Linux 3.17.2
  • terraform-provider-libvirt v0.7.1
  • terraform v1.3.4
@itwars
Contributor Author

itwars commented Feb 23, 2023

Digging a little deeper, I have this in my log file:

Feb 23 19:15:14 node1 auth.info sshd[4401]: Accepted publickey for user from 192.168.10.201 port 41598 ssh2: RSA SHA256:0f______________WdI
Feb 23 19:15:14 node1 auth.info sshd[4403]: Received request to connect to path /var/run/libvirt/libvirt-sock, but the request was denied.

The permissions on the socket file seem to be OK:

srwxrwxrwx 1 root root 0 Feb 23 18:29 /var/run/libvirt/libvirt-sock

@alexandre-janniaux

What is your sshd configuration?

@itwars
Contributor Author

itwars commented Mar 6, 2023

My sshd_config is quite basic:

PermitRootLogin yes
AuthorizedKeysFile      .ssh/authorized_keys
AllowTcpForwarding no
GatewayPorts no
X11Forwarding no
Subsystem       sftp    internal-sftp

There is no additional config in the .ssh directory of user 'user'.

@LavBU

LavBU commented Mar 7, 2023

Hi @itwars

You need to add the user to the libvirt group as follows:
usermod -a -G libvirt <user>

For example, in main.tf:
uri = "qemu+ssh://<user>@<my host>/system?keyfile=/root/.ssh/<ssh private key>&sshauth=privkey&no_verify=1"

Then, on the physical host where you want to deploy your VM, add the user to the libvirt group:
# usermod -a -G libvirt <user>

You can verify that by checking /etc/group:
# grep libvirt /etc/group

I hope it will fix your issue.
Lavi

@itwars
Contributor Author

itwars commented Mar 8, 2023

Hi @LavBU

The user is already a member of the libvirt and qemu groups, and is also a sudoer.

@LavBU

LavBU commented Mar 8, 2023

Hi @itwars

If you are able to run this against the remote host from the machine where terraform runs:
virsh -c qemu+ssh://[email protected]/system list --all

it means you are using an SSH key to access the host.

Therefore, that SSH key should be in the authorized_keys file of the user defined on that host. For example:
# cat ~<user>/.ssh/authorized_keys

Should show you that SSH key.

Lavi

@itwars
Contributor Author

itwars commented Mar 8, 2023

Hi @LavBU

virsh

On the localhost (192.168.10.201) both are ok:

virsh -c qemu:///system list --all
virsh -c qemu+ssh://[email protected]/system list --all

From a remote host (192.168.10.202 to 192.168.10.201) it's ok too:

virsh -c qemu+ssh://[email protected]/system list --all

terraform apply

On the localhost

Terraforming from 192.168.10.201 to 192.168.10.201 failed with:

module.test-nouvelle-alpine.data.template_file.network_config: Reading...
module.test-nouvelle-alpine.data.template_file.user_data[0]: Reading...
module.test-nouvelle-alpine.data.template_file.network_config: Read complete after 0s [id=142f2ebd49dc333da2f0bf71a7bf27e6acf558cd743fd118b3695d41e5368ec8]
module.test-nouvelle-alpine.data.template_file.user_data[0]: Read complete after 0s [id=5b862b06752472a08f3daeb0d8f77bf3bd2755186a477dd2d5bef90a0d414b69]
╷
│ Error: failed to dial libvirt: failed to connect to libvirt on the remote host: ssh: rejected: connect failed (open failed)
│
│   with provider["registry.terraform.io/dmacvicar/libvirt"].node1,
│   on providers.tf line 1, in provider "libvirt":
│    1: provider "libvirt" {
│
Terraform v1.3.4
on linux_amd64
+ provider registry.terraform.io/dmacvicar/libvirt v0.7.1
+ provider registry.terraform.io/hashicorp/template v2.2.0

Your version of Terraform is out of date! The latest version
is 1.3.9. You can update by downloading from https://www.terraform.io/downloads.html

From remote host

It's OK: from 192.168.10.202 to 192.168.10.201, terraform deploys my VM!

I know, I know, it's an old version:

Terraform v1.1.2
on linux_amd64
+ provider registry.terraform.io/dmacvicar/libvirt v0.6.3
+ provider registry.terraform.io/hashicorp/template v2.2.0

Your version of Terraform is out of date! The latest version
is 1.3.9. You can update by downloading from https://www.terraform.io/downloads.html

So, in summary (A is 192.168.10.201, B is 192.168.10.202):

virsh:

  • A to A: OK
  • A to B: OK
  • B to A: OK

terraform:

  • A to A: KO
  • A to B: OK
  • B to A: OK

@LavBU

LavBU commented Mar 8, 2023

Hi @itwars

Try checking the system log when it fails:
# journalctl -f

Lavi

@itwars
Contributor Author

itwars commented Mar 8, 2023

@LavBU

During the 'apply' I got this error:

Mar  8 15:34:24 nodeX1 auth.info sshd[5305]: Accepted publickey for user from 192.168.10.201 port 34474 ssh2: RSA SHA256:xxxxxxxxxxxxx
Mar  8 15:34:24 nodeX1 auth.info sshd[5307]: Received request to connect to path /var/run/libvirt/libvirt-sock, but the request was denied.

Checking my socket, the permissions look good?

srwxrwxrwx 1 root root 0 Mar  5 12:33 /var/run/libvirt/libvirt-sock
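
The two sshd lines above show that authentication succeeds and that the refusal happens afterwards, when the client asks sshd to forward a connection to the libvirt socket. A minimal sketch for pulling such refusals out of a log follows. The function name denied_forward_paths is hypothetical, the message wording is copied from the log lines in this thread, and where sshd logs on a given host is left as an assumption for the reader to fill in.

```shell
#!/bin/sh
# Sketch: list the socket paths that sshd refused to forward to.
# The pattern matches the message text quoted in this thread; other sshd
# versions may word the message differently.
denied_forward_paths() {
    # $1 = log file to scan
    sed -n 's/.*Received request to connect to path \([^,]*\), but the request was denied.*/\1/p' "$1"
}

# Demo on a throwaway file holding the two log lines quoted above.
log=$(mktemp)
cat > "$log" <<'EOF'
Mar  8 15:34:24 nodeX1 auth.info sshd[5305]: Accepted publickey for user from 192.168.10.201 port 34474 ssh2: RSA SHA256:xxxxxxxxxxxxx
Mar  8 15:34:24 nodeX1 auth.info sshd[5307]: Received request to connect to path /var/run/libvirt/libvirt-sock, but the request was denied.
EOF
denied_forward_paths "$log"   # prints: /var/run/libvirt/libvirt-sock
rm -f "$log"
```

A refusal logged by sshd itself points at the sshd configuration rather than at the file permissions of the socket.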

@LavBU
Copy link

LavBU commented Mar 8, 2023 via email

@itwars
Contributor Author

itwars commented Mar 9, 2023

Hello @LavBU, I've tried every answer without luck... Still stuck!

@MattSnow-amd

I recently ran into something similar. I needed an SSH key pair without a passphrase. The public key needed to be in the remote user's ~/.ssh/authorized_keys file. Permissions for .ssh and the files it contains also need to be correct.

provider "libvirt" {
  uri = "qemu+ssh://root@${var.hypervisor_host}/system?keyfile=/home/me/.ssh/id_ed25519-nopw"
}
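
The permission requirement mentioned above can be checked with a small helper. This is only a sketch: check_mode is a hypothetical name, the modes 700 for .ssh and 600 for authorized_keys are the commonly recommended values rather than something stated in this thread, and it assumes a stat that supports `-c %a` (GNU coreutils and BusyBox do).

```shell
#!/bin/sh
# Sketch: report whether a path has the expected octal mode.
check_mode() {
    # $1 = path, $2 = expected octal mode, e.g. 700
    actual=$(stat -c %a "$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "ok   $1 ($actual)"
    else
        echo "BAD  $1 is $actual, expected $2"
        return 1
    fi
}

# Demo on a throwaway directory, so nothing under $HOME is touched.
demo=$(mktemp -d)
mkdir "$demo/.ssh" && chmod 700 "$demo/.ssh"
: > "$demo/.ssh/authorized_keys" && chmod 600 "$demo/.ssh/authorized_keys"
check_mode "$demo/.ssh" 700
check_mode "$demo/.ssh/authorized_keys" 600
rm -rf "$demo"
```

On a real host the same two calls would be pointed at the remote user's ~/.ssh and ~/.ssh/authorized_keys, and at the private key on the machine running terraform.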

@Magnitus-

Magnitus- commented Mar 18, 2023

I can separately confirm that this feature works fine with this provider. We recently upgraded some of our orchestration operating libvirt on remote machines to version 0.7.1 of the provider without any issue (with terraform version 1.2.9).

Like MattSnow-amd mentioned, your issue is most likely an ssh setup issue for the user running terraform.

You need to ensure that whichever user runs terraform has proper passwordless ssh access to the libvirt user on the remote machine, and you need to do this specifically for the environment in which terraform is running (for example, if you run terraform from a container, the container may not have the right ssh keys set up).

@itwars
Contributor Author

itwars commented Mar 27, 2023

Hello,
Same issue even with:

provider "libvirt" {
  uri = "qemu+ssh://root@${var.hypervisor_host}/system?keyfile=/home/me/.ssh/id_ed25519-nopw"
}

I've also rolled out every SSH key across my whole cluster. It works with my 2 Ubuntu hosts and fails with my 4 Alpine hosts.

@Magnitus-: just to refresh the context, the issue doesn't occur from a remote host; it only happens from the local server!

@itwars
Contributor Author

itwars commented Apr 5, 2023

Hello,
It seems I'm not the only one facing this issue: #939?
I'm closing this issue and moving to #939.

@itwars itwars closed this as completed Apr 5, 2023
@Magnitus-

@itwars Yeah, if you are confident about your ssh setup, it might be an obscure Alpine incompatibility. My understanding is that they use a lot of different, lighter dependencies to make everything smaller, which I know can cause some compatibility issues from my superficial usage of it in docker containers.

Unless specific constraints force my hand, I'm happy to stick with Ubuntu/Debian as it just makes my life a lot simpler operationally (there are just so many things to work on and so little time), so I won't be of much help here, but it seems they are well on their way to troubleshooting this in the thread you linked.

Best of luck.

@itwars
Contributor Author

itwars commented Apr 17, 2023

Hooray! After digging deeper, I compared /etc/ssh/sshd_config line by line between Ubuntu and Alpine, and finally found the "guilty" line of configuration.
The fix was changing:
AllowTcpForwarding no to AllowTcpForwarding yes!
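
For anyone landing here with the same error, a quick way to spot this setting is to read the keyword straight out of the config file. The sketch below is only a hint, not a full parser: sshd_directive is a hypothetical helper name, and it ignores Include files and stops at the first Match block. It relies on the documented sshd_config rule that the first value obtained for a keyword is the one used.

```shell
#!/bin/sh
# Sketch: print the first global value of a keyword in an sshd_config-style file.
# Include files and Match blocks are NOT handled, so treat the result as a hint.
sshd_directive() {
    # $1 = config file, $2 = keyword (matched case-insensitively)
    awk -v key="$2" '
        BEGIN { key = tolower(key) }
        $1 ~ /^#/ { next }                    # skip comment lines
        tolower($1) == "match" { exit }       # stop before conditional blocks
        tolower($1) == key { print tolower($2); exit }
    ' "$1"
}

# Demo with the sshd_config posted earlier in this thread.
conf=$(mktemp)
cat > "$conf" <<'EOF'
PermitRootLogin yes
AuthorizedKeysFile      .ssh/authorized_keys
AllowTcpForwarding no
GatewayPorts no
X11Forwarding no
Subsystem       sftp    internal-sftp
EOF
sshd_directive "$conf" AllowTcpForwarding   # prints: no
rm -f "$conf"
```

On a real host, running `sshd -T` as root prints the effective configuration including defaults, which is more reliable than reading the file. After editing sshd_config, sshd has to be restarted or reloaded before the change takes effect.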
