Hang when containers set to start automatically #23
Comments
I am seeing exactly the same behavior on both a CentOS 7 install and an Ubuntu 21.04 install, both on amd64 hardware. |
I am also facing this issue on Ubuntu 20.04 armv8 and Ubuntu 18.04 armv7. |
Me too. I'm new to Docker, and it took me a really long time and two reinstalls to realise this was the problem. Edit:
to my rc.local |
I'm also having this problem. At a quick glance, it looks like the plugin tries to access the Docker socket before Docker is up and running, there's no timeout on the call, and there's no way that I can see to defer starting the containers until Docker has fully started. 🤷 The logs below give a rough indication of the order of things: the container loads and hangs connecting to Docker, is interrupted (by me), and then starts providing the routes it relies on 🤦
|
Same problem. I worked around it by starting the container independently, from systemd. Not very stable though. |
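A minimal sketch of the systemd-started workaround mentioned above, under these assumptions: the container's restart policy is `no`, so Docker does not start it itself, and a unit ordered after docker.service runs the script. It waits until the daemon answers and caps each probe with coreutils `timeout`, since a hung plugin can stall the daemon too. `retry_until` and `mycontainer` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to ATTEMPTS times, pausing 1 s between
# failed attempts. Returns 0 on the first success, 1 if every attempt fails.
# Usage: retry_until ATTEMPTS command [args...]
retry_until() {
    attempts=$1
    shift
    i=1
    while ! "$@" >/dev/null 2>&1; do
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep 1
    done
    return 0
}

# Example (assumption: "mycontainer" has restart policy "no"):
# up to 30 probes of `docker info`, each capped at 5 s by coreutils `timeout`.
# retry_until 30 timeout 5 docker info && docker start mycontainer
```

Counting attempts rather than elapsed seconds keeps the bound predictable even when each probe itself runs up to the `timeout` limit.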
I love that this plugin let my containers get an IP over DHCP. Any idea on how to solve it? |
Same problem. I came up with a very hacky, ad hoc way to temporarily work around it.
This file is really tricky. The essence is to run docker start ${environment variable} to start the containers after Docker has started normally. Here my environment variable file is placed under
The trouble is that, because Docker no longer manages the containers started at boot, whenever a container needs to start at boot you have to append its name or ID to the environment variable file so that systemctl can start it automatically after booting. My system environment is |
Nov 08 00:55:14 docker001 docker[9228]: Error response from daemon: No such container: mosquitto portainer esphome frigate rtlamr2mqtt lms watchtower homeassistant
Manual test: seems to work fine.
Edit: Got this working like this:
/usr/bin/dockernetdhcpstart.sh
/usr/bin/dockernetdhcpstop.sh
|
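The error above names a single container called "mosquitto portainer esphome frigate rtlamr2mqtt lms watchtower homeassistant", which suggests the whole list reached `docker start` as one argument. In a systemd `ExecStart=` line, `${VAR}` is replaced by exactly one argument, whereas `$VAR` written as a separate word is split at whitespace. A minimal sketch that splits the list explicitly, using a hypothetical `start_containers` helper (not the commenter's dockernetdhcpstart.sh):

```shell
#!/bin/sh
# Hypothetical helper: start each container in a whitespace-separated list.
# DOCKER_CMD defaults to "docker"; override it (e.g. DOCKER_CMD=echo) for a dry run.
start_containers() {
    status=0
    # $1 is left unquoted on purpose so the list splits on whitespace.
    # Docker container names cannot contain glob characters, so globbing is harmless.
    for name in $1; do
        if ! ${DOCKER_CMD:-docker} start "$name"; then
            echo "failed to start $name" >&2
            status=1
        fi
    done
    # Non-zero if any container failed, but every name is still attempted.
    return "$status"
}

# Example, with CONTAINERS read from an EnvironmentFile or set by hand:
# start_containers "$CONTAINERS"
```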
@devplayer0 are you able to comment on the overall OP issue above? Thank you. |
Getting this issue as well. Time to try some of these solutions. EDIT: Ended up just tweaking your script a tad for docker-compose.
|
@devplayer0 any chance you can compile the above timeout fix for one last build? I've been using this plugin for years and this is the only issue I have had. Would be nice to not have to work around it with startup scripts. |
docker-net-dhcp/pkg/plugin/plugin.go Line 78 in 03694af
docker-net-dhcp/pkg/plugin/plugin.go Line 82 in 1bb0ffe
|
To manually apply PR #43 and compile/install the plugin locally (this fixes this issue and issue #42, the 1.13.1 error with newer versions of Docker). Hope this helps people like me who don't know what they are doing!
git clone https://github.com/devplayer0/docker-net-dhcp.git
cd docker-net-dhcp
git fetch origin pull/43/head:celerway
git checkout celerway
Switched to branch 'celerway'
git branch -a
* celerway
master
remotes/origin/HEAD -> origin/master
remotes/origin/dependabot/go_modules/github.com/containerd/containerd-1.5.18
remotes/origin/dependabot/go_modules/github.com/docker/docker-20.10.24incompatible
remotes/origin/dependabot/go_modules/golang.org/x/net-0.7.0
remotes/origin/dependabot/go_modules/golang.org/x/sys-0.1.0
remotes/origin/master
make create
docker plugin ls
ID NAME DESCRIPTION ENABLED
############ ghcr.io/devplayer0/docker-net-dhcp:golang Docker host bridge DHCP networking false
sudo docker plugin enable ghcr.io/devplayer0/docker-net-dhcp:golang
docker plugin ls
ID NAME DESCRIPTION ENABLED
############ ghcr.io/devplayer0/docker-net-dhcp:golang Docker host bridge DHCP networking true
sudo docker network ls
NETWORK ID NAME DRIVER SCOPE
############ bridge bridge local
############ config_default bridge local
############ dbrv100 ghcr.io/devplayer0/docker-net-dhcp:release-linux-amd64 local
############ dbrv200 ghcr.io/devplayer0/docker-net-dhcp:release-linux-amd64 local
############ dbrv300 ghcr.io/devplayer0/docker-net-dhcp:release-linux-amd64 local
############ dbrv350 ghcr.io/devplayer0/docker-net-dhcp:release-linux-amd64 local
############ dbrv400 ghcr.io/devplayer0/docker-net-dhcp:release-linux-amd64 local
This shows all your old docker-net-dhcp networks; we need to remove them :(
I first tried:
sudo docker network rm dbrv100
Error response from daemon: error while removing network: failed deleting Network: plugin "ghcr.io/devplayer0/docker-net-dhcp:release-linux-amd64" not found
So then I nuked them with
sudo docker network prune
WARNING! This will remove all custom networks not used by at least one container.
Are you sure you want to continue? [y/N] y
sudo docker network ls
NETWORK ID NAME DRIVER SCOPE
############ bridge bridge local
############ host host local
############ none null local
Then I added them all back with the newly compiled driver:
sudo docker network create -d ghcr.io/devplayer0/docker-net-dhcp:golang --ipam-driver null -o bridge=brv100 dbrv100
sudo docker network create -d ghcr.io/devplayer0/docker-net-dhcp:golang --ipam-driver null -o bridge=brv200 dbrv200
sudo docker network create -d ghcr.io/devplayer0/docker-net-dhcp:golang --ipam-driver null -o bridge=brv300 dbrv300
sudo docker network create -d ghcr.io/devplayer0/docker-net-dhcp:golang --ipam-driver null -o bridge=brv350 dbrv350
sudo docker network create -d ghcr.io/devplayer0/docker-net-dhcp:golang --ipam-driver null -o bridge=brv400 dbrv400
sudo docker network ls
NETWORK ID NAME DRIVER SCOPE
############ bridge bridge local
############ dbrv100 ghcr.io/devplayer0/docker-net-dhcp:golang local
############ dbrv200 ghcr.io/devplayer0/docker-net-dhcp:golang local
############ dbrv300 ghcr.io/devplayer0/docker-net-dhcp:golang local
############ dbrv350 ghcr.io/devplayer0/docker-net-dhcp:golang local
############ dbrv400 ghcr.io/devplayer0/docker-net-dhcp:golang local
############ host host local
############ none null local
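The five `network create` commands above differ only in the bridge name. A small loop avoids retyping them; this is a sketch with a hypothetical `recreate_networks` helper, assuming the same `d<bridge>` naming and the locally built `:golang` plugin tag used above.

```shell
#!/bin/sh
# Hypothetical helper: for each host bridge given (e.g. brv100), create a
# DHCP network named d<bridge> (e.g. dbrv100) with the locally built plugin.
# DOCKER_CMD defaults to "docker"; DOCKER_CMD=echo prints the commands instead.
PLUGIN="ghcr.io/devplayer0/docker-net-dhcp:golang"

recreate_networks() {
    for br in "$@"; do
        # Stop at the first failure so a typo does not cascade.
        ${DOCKER_CMD:-docker} network create -d "$PLUGIN" --ipam-driver null \
            -o bridge="$br" "d$br" || return 1
    done
}

# Example (as root): recreate_networks brv100 brv200 brv300 brv350 brv400
```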
Now they are all back in action but all the containers still point to the old bridges 😭
docker container inspect genericcontainer001
You will see all the old IDs of the old networks...
I tried doing
sudo docker network disconnect dbrv100 genericcontainer001
sudo docker network connect dbrv100 genericcontainer001
This changed the network, so I thought: win!
Yet NetworkMode was still stuck with the old network and would fail on startup.
sudo docker container start genericcontainer001
Error response from daemon: could not find a network matching network mode ####...: network ####... not found
Error: failed to start containers: genericcontainer001
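Before recreating containers, it can help to see which ones still refer to a removed network. A sketch using a hypothetical `list_network_modes` helper that prints each container's `HostConfig.NetworkMode` via `docker container inspect --format`: a value that is neither a built-in mode (default, bridge, host, none) nor a name or ID listed by `docker network ls --no-trunc` points at a network that no longer exists.

```shell
#!/bin/sh
# Hypothetical helper: print "<name> <NetworkMode>" for every container,
# running or stopped. DOCKER_CMD defaults to "docker".
list_network_modes() {
    for id in $(${DOCKER_CMD:-docker} ps -aq); do
        ${DOCKER_CMD:-docker} container inspect \
            --format '{{.Name}} {{.HostConfig.NetworkMode}}' "$id"
    done
}

# Example: list_network_modes
# Compare the second column against: docker network ls --no-trunc --format '{{.Name}} {{.ID}}'
```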
So then I ran compose
docker-compose.yml
version: '3.9'
services:
  genericcontainer001:
    container_name: genericcontainer001
    hostname: genericcontainer001
    mac_address: de:ad:be:ef:00:01
    networks:
      - dhcp
networks:
  dhcp:
    #mac_address: de:ad:be:ef:00:01 #(for docker engine version 25)
    name: dbrv100
    external: true
docker compose up -d
It came back up with the right network without an issue, yay! Thanks to encbladexp on the Docker Discord for helping me figure out the compile technique! Now I just need to overcome docker compose / dockerd ignoring the configured mac_address, and then I will be up and running again. I had it working before by using a Moby build of dockerd, but something else must have broken in it as well: it just generates random MAC addresses, which breaks the static IPs that my dnsmasq assigns via DHCP based on MAC address :( Portainer is working on a fix in 2.20; the rest of Docker again seems to have no fix for it. |
I made a container image that includes the fix from PR #43. Everyone can use this image to install the plugin directly.
Based on the process above, a similar deadlock occurs, which prevents container services that use this plugin from starting normally. |
I wrote a bash script and a custom systemd service that start, at boot time, the container services that use the DHCP plugin. The container service list is obtained dynamically from the configuration file, so there is no need to configure the container list manually. Before using this script, make sure the jq command is installed on the server. It is best to use it with the plugin
cat /lib/systemd/system/docker-dhcp-container.service
cat /data/scripts/dhcpcontainer.sh
|
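Instead of maintaining the container list by hand, it can also be derived from Docker itself. This is a sketch, not the commenter's jq-based script: it assumes the DHCP network names are known (e.g. dbrv100) and relies on the `network` filter of `docker ps`, on the further assumption that the filter also matches stopped containers, which is worth checking on your Docker version. `containers_on_networks` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the names of containers (with -a, stopped ones too)
# that Docker reports as attached to any of the given networks, one per line,
# sorted and de-duplicated. DOCKER_CMD defaults to "docker".
containers_on_networks() {
    for net in "$@"; do
        ${DOCKER_CMD:-docker} ps -a --filter "network=$net" --format '{{.Names}}'
    done | sort -u
}

# Example: start every container on two DHCP networks.
# for name in $(containers_on_networks dbrv100 dbrv200); do docker start "$name"; done
```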
Try working with the current free Claude 3.5 Sonnet on claude.ai to code what you need. Its HumanEval score is currently 92.7%, so it may help you figure out how it all fits together where you said you were lacking the skill. I don't like using bash scripting to work around the issue. |
Thanks for this project - it is great. I have noticed a small problem however:
When a container uses this plugin, and its restart policy makes the container start automatically when Docker starts, this plugin appears to hang. This prevents the container from starting, and seems to block the Docker daemon from responding too. If I kill this plugin's process, Docker seems to recover (but the container obviously doesn't come up properly).
If the container is not set to start automatically, and I instead start it manually, everything works fine.
I have narrowed the problem down to this line: it seems the call to NetworkInspect never returns, even after several hours.
I thought the problem might be a race condition, where the network was not fully up before the plugin tried to inspect it. However, inserting a delay before the call does not appear to help.
The logs do not provide any clues.
Because the Docker daemon stops responding, I am unfortunately not able to get a stack trace from it.
Please could you let me know how I might diagnose the problem further? I'm using up to date versions of Docker, Ubuntu and the kernel. The only complicating factor is that it's on an armv7l SBC 🙈
Many thanks