`kubectl-debug` is an out-of-tree solution for troubleshooting running pods: it allows you to run a new container inside a running pod for debugging purposes. The new container joins the `pid`, `network`, `user` and `ipc` namespaces of the target container, so you can use arbitrary troubleshooting tools without pre-installing them in your production container image.
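For example, once attached, you can inspect the target container with tools that exist only in the debug image. An illustrative session (the exact tool set depends on the debug image you use):

```bash
# inside the debug container, tools operate on the target's namespaces:
ps aux           # target container's processes (shared pid namespace)
netstat -tunlp   # target container's listening sockets (shared network namespace)
```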
Compatibility: I've tested `kubectl-debug` with kubectl v1.13.1 and kubernetes v1.9.1. I don't have an environment to test more versions, but I suppose that `kubectl-debug` is compatible with all versions of kubernetes and kubectl 1.12.0+. Please file an issue if you find `kubectl-debug` does not work.
Install the debug agent DaemonSet in your cluster, which is responsible for running the "new container":
```bash
kubectl apply -f https://raw.githubusercontent.com/aylei/kubectl-debug/master/scripts/agent_daemonset.yml
```
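Before debugging, you can check that the agent pods are up on every node; the label selector below is an assumption and may differ from the actual manifest:

```bash
kubectl get pods -o wide -l app=debug-agent
```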
Install the kubectl debug plugin:
```bash
curl
```
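After downloading the binary (the full URL is elided above), make it executable and place it on your `PATH`; kubectl 1.12.0+ discovers any executable named `kubectl-debug` on the `PATH` as a plugin:

```bash
chmod +x ./kubectl-debug
sudo mv ./kubectl-debug /usr/local/bin/
```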
Try it out!
```bash
kubectl debug POD_NAME
# learn more with
kubectl debug -h
```
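Some typical invocations (the `-c` flag is the one used in the workflow description later in this document):

```bash
kubectl debug POD_NAME                    # debug the pod's default container
kubectl debug POD_NAME -c CONTAINER_NAME  # target a specific container in the pod
```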
To build from source, clone this repo and run:
```bash
# build plugin
go build -o kubectl-debug ./cmd/plugin
# install plugin
mv kubectl-debug /usr/local/bin
# build agent
go build -o debug-agent ./cmd/agent
# build agent image
docker build . -t debug-agent
```
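To run a locally built agent in a real cluster, you would typically push the image to a registry the nodes can pull from and point the DaemonSet manifest at it; `REGISTRY` below is a placeholder:

```bash
docker tag debug-agent REGISTRY/debug-agent:dev
docker push REGISTRY/debug-agent:dev
# then update the image field in scripts/agent_daemonset.yml accordingly
```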
`kubectl-debug` uses nicolaka/netshoot as the default image to run the debug container, and `bash` as the default entrypoint. You can override the default image and entrypoint with CLI flags, or, even better, with the config file `~/.kube/debug-config`:
```yaml
agent_port: 10027
image: nicolaka/netshoot:latest
command:
- '/bin/bash'
- '-l'
```
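A one-off override on the command line might look like this; the `--image` flag name is an assumption mirroring the `image` config key above:

```bash
kubectl debug POD_NAME --image nicolaka/netshoot:latest /bin/bash -l
```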
PS: `kubectl-debug` will always override the entrypoint of the container. This is by design, to avoid users accidentally running an unwanted service (of course, you can always run a command explicitly, as shown below).
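Running a command explicitly just means passing it after the pod (and optional container) name, for example:

```bash
kubectl debug POD_NAME /bin/sh
```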
`kubectl-debug` consists of 2 components:

- the kubectl plugin: a CLI client of the node agent, which serves the `kubectl debug` command,
- the node agent: responsible for manipulating the "debug container"; the node agent also acts as a websockets relay for the remote tty.
When a user runs `kubectl debug target-pod -c <container-name> /bin/bash`:

- The plugin gets the pod info from the apiserver and extracts the `hostIP`; if the target container does not exist or is not currently running, an error is raised.
- The plugin sends an HTTP request to the specific node agent running on the `hostIP`, which includes a protocol upgrade from HTTP to SPDY.
- The agent checks whether the target container is actively running; if not, it writes an error back to the client.
- The agent runs a `debug container` with `tty` and `stdin` opened (the `-i` flag); the `debug container` joins the `pid`, `network`, `ipc` and `user` namespaces of the target container (approximated with plain `docker` in the sketch after this list).
- The agent pipes the connection io to the `debug container` using `attach`.
- Debug in the debug container.
- Once the job is done, the user closes the SPDY connection.
- The node agent closes the SPDY connection, then waits for the `debug container` to exit and performs the cleanup.
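As a mental model, the agent's run-and-attach step is roughly what the following plain `docker` commands would do on the node. This is a sketch, not the agent's actual code: the `io.kubernetes.pod.name` label is how kubelet commonly labels docker containers, and the `user` namespace is omitted because `docker run` cannot join another container's user namespace this way:

```bash
# find the target container's ID on the node (pod name is a placeholder)
TARGET=$(docker ps -q --filter "label=io.kubernetes.pod.name=target-pod" | head -n 1)

# run a debug container sharing the target's pid, network and ipc namespaces
docker run -it \
  --pid="container:${TARGET}" \
  --network="container:${TARGET}" \
  --ipc="container:${TARGET}" \
  nicolaka/netshoot /bin/bash -l
```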
Feel free to open issues and pull requests. Any feedback will be highly appreciated!