Nomad: waypoint-static-runner - error connecting to server context deadline exceeded #4550
Comments
Hey there @chrisvanmeer - I think the link you made to Can you confirm that when your runner allocation gets started, it's able to resolve your Nomad server allocation via Consul's DNS? It could be that the runner can't resolve the hostname, which is why it fails to connect and times out.
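A rough way to check the resolution question above from a shell, either on the Nomad client or inside the runner allocation. This is only a sketch: the function name `resolves` is made up for illustration, and it assumes `getent` is available in the environment (if it is not, `ping` or `nslookup` against the same name gives a similar signal).

```shell
#!/bin/sh
# Sketch: report whether a hostname resolves through the system resolver.
# `resolves` is a hypothetical helper name, not part of Nomad or Waypoint.
# Assumes `getent` exists in the image; that may not hold for minimal images.
resolves() {
    host="$1"
    if getent hosts "$host" >/dev/null 2>&1; then
        echo "resolves: $host"
        return 0
    fi
    echo "does not resolve: $host"
    return 1
}

# Example (commented out so sourcing this file has no side effects):
#   resolves waypoint-server.service.nl.vanmeer.eu
#
# To rule out the resolver configuration, the same name can be asked of
# Consul's DNS interface directly. The address 127.0.0.1 is an assumption
# (a local Consul agent); 8600 is Consul's default DNS port:
#   dig @127.0.0.1 -p 8600 waypoint-server.service.nl.vanmeer.eu
```

If the name resolves when Consul is queried directly but not through the system resolver, the problem is more likely in how DNS is forwarded to Consul for that allocation than in the runner itself.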
Hey @briancain, sorry, I updated the link. I intended to refer to the release notes of the fix: #4363. Do you have any tips on troubleshooting that? The runner job doesn't output any logs suggesting that it cannot resolve the hostname, and the job keeps restarting so quickly that I cannot exec into it to do any decent troubleshooting.
@chrisvanmeer - I believe this is a Nomad setting, but I would recommend configuring Nomad not to clean up allocations immediately on failure. That should leave them around long enough for you to dive in and troubleshoot what's going on. Hopefully that helps! If you can resolve the Nomad server address and get a response outside of the runner (e.g. with telnet or curl), then it's likely another issue with our runner install. And I see! Thanks for updating the link. Yes - I don't think the PR you linked should be causing the issue. From what I can tell in the logs, it should be connecting to the right port for gRPC:
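For the "get a response outside of the runner" check, a minimal sketch along the same lines as telnet or curl, using `nc`. The helper name `can_connect` is hypothetical, and it assumes an `nc` build that supports `-z` (connect without sending data) and `-w` (timeout in seconds). The port in the usage comment, 9701, is assumed to be the Waypoint server's gRPC port; substitute whatever your install actually listens on.

```shell
#!/bin/sh
# Sketch: test whether a TCP connection to host:port can be opened
# within a timeout. `can_connect` is a hypothetical helper name.
# Assumes `nc` is installed and supports -z and -w.
can_connect() {
    host="$1"
    port="$2"
    timeout="${3:-5}"   # seconds; defaults to 5 when not given
    if nc -z -w "$timeout" "$host" "$port" >/dev/null 2>&1; then
        echo "reachable: $host:$port"
        return 0
    fi
    echo "unreachable: $host:$port"
    return 1
}

# Example (commented out; hostname taken from this thread, port assumed):
#   can_connect waypoint-server.service.nl.vanmeer.eu 9701
```

Note that this reports "unreachable" both when the name does not resolve and when the port is closed, so it is worth checking name resolution separately first.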
With a lot of patience and copy paste commands ready I managed to verify that the
$ nomad alloc exec -task runner c90bf8d8 sh
/ $ ping waypoint-server.service.nl.vanmeer.eu
ping: bad address 'waypoint-server.service.nl.vanmeer.eu'
/ $ ping waypoint-server.service.vanmeer.eu
ping: bad address 'waypoint-server.service.vanmeer.eu'
/ $ %
Even when I manually specify a DNS server in the runner's job spec and restart it, it still does not resolve the server address. Which is funny, since other jobs on the same Nomad cluster can resolve each other fine.
Hi! I have the same error, I'm testing with a single node and since I tried to set an
And when I do that the I also was trying to set the waypoint service's This is the
And this one the
I'm running this version
Edit I found that when I set the Then I returned to the first error the I saw that the
It was in the range of
And now it works.... 🎉
Describe the bug
My first attempt at running Waypoint on Nomad failed with the waypoint-server having the same error stated above. I was pointed to 0.11.0, which has a fix for this behaviour. This problem now persists in the waypoint-static-runner.
Steps to Reproduce
I have since upgraded to the suggested version and tried the install again on a new Nomad cluster (ACL bootstrapped, mTLS and gossip encryption, Consul integration), following the same tutorial. The waypoint-server job now finishes healthy, but the waypoint-static-runner job returns the same error.
stdout from the waypoint install command
stderr logs of the waypoint-static-runner allocation
But the waypoint server is accessible to both desktop and servers
Expected behavior
waypoint-static-runner job healthy on Nomad and installer runs through without issues.
Waypoint Platform Versions
Additional version and platform information to help triage the issue if applicable:
0.11.0
nomad
N/A