daemonset csi-rbdplugin pod pids.current value keeps rising #4520
Comments
@yingxin-gh After the test, you deleted all the applications, right? Did you get a chance to see which processes are using those PID values?
Yes. I deleted all the deployed pods and PVCs after the test. There are no other workloads on the test node.
@Rakshith-R can you please take a look at it?
Ceph-CSI supports golang profiling, see #1935 for details. Profiling can help point out what functions/routines are still running. Common causes include open connections to the Ceph cluster, goroutines that were started on first use but never exited once their work was done, and so on.
I enabled profiling and re-tested. The pids.current value had increased to 63 by the time I stopped the test. The goroutine info is as below.
Can you share all of the goroutines? The ones that you listed here are pprof and HTTP service related; these are expected to keep running until a request comes in. Do you have goroutines that include source files with …
How do I get all the goroutines? I got them with the steps below; is that right?
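A minimal sketch of one way to collect the full goroutine dump, assuming profiling is enabled on the csi-rbdplugin pod and the pprof handler is reachable over HTTP (the namespace, pod name, and port below are placeholders, not values taken from this report):

# Forward the pprof port from the plugin pod (namespace/pod/port are assumptions).
kubectl -n ceph-csi port-forward pod/<csi-rbdplugin-pod> 8080:8080 &
# Standard Go net/http/pprof endpoint; debug=2 prints every goroutine with its full stack.
curl -s "http://127.0.0.1:8080/debug/pprof/goroutine?debug=2" -o goroutines.txt
# Frames that reference ceph-csi source files (rather than pprof/http internals) are the interesting ones.
grep -n "ceph-csi" goroutines.txt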
Describe the bug
During the test we repeatedly create and then delete many PVCs and pods (about 180), where every pod mounts one PVC. We kept this process going for about 5 hours and found that the daemonset csi-rbdplugin pod's pids.current value keeps rising: it was 47 before the test and had increased to 99 after the test was stopped.
The ceph-csi version is v3.9.0 and pidlimit=-1 is set.
Environment details
Image/version of Ceph CSI driver: v3.9.0
Kernel version: Linux 5.3.18-57-default
Mounter used for mounting PVC (for cephFS it is fuse or kernel; for rbd it is krbd or rbd-nbd): krbd
Kubernetes cluster version: v1.28.4
Ceph cluster version: v17
Steps to reproduce
Steps to reproduce the behavior:
action=$1
for i in {1..180}
do
cat <<EOF | kubectl $action -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc-$i
spec:
  accessModes:
    - ReadWriteOnce        # access mode was lost in the paste; assumed
  storageClassName: "network-block"
  resources:
    requests:
      storage: 100M
---
apiVersion: v1
kind: Pod
metadata:
  name: test-rbd-$i
spec:
  containers:
    - name: web-server     # container name, port, and volume name were lost in the paste; assumed
      image: nginx:alpine
      ports:
        - containerPort: 80
          name: www
      volumeMounts:
        - name: html
          mountPath: /usr/share/nginx/html
  volumes:
    - name: html
      persistentVolumeClaim:
        claimName: rbd-pvc-$i
  nodeName: node-xxx
EOF
done
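Assuming the loop above is saved as repro.sh (the file name is only for illustration), a test run looks like this:

bash repro.sh create    # create 180 PVCs and pods pinned to node-xxx
# ...repeat the create/delete cycle for several hours...
bash repro.sh delete    # delete all of them again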
before test:
crictl exec -it 608df5831ddae cat /sys/fs/cgroup/pids/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pode1aea94b_2820_418f_8b33_fd51e946a442.slice/cri-containerd-608df5831ddae1b508434f20edee9b18d58fb76771c5fc4c785f99b024dc56c5.scope/pids.current
47
after test:
crictl exec -it 608df5831ddae cat /sys/fs/cgroup/pids/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pode1aea94b_2820_418f_8b33_fd51e946a442.slice/cri-containerd-608df5831ddae1b508434f20edee9b18d58fb76771c5fc4c785f99b024dc56c5.scope/pids.current
99
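pids.current counts every task (processes and their threads) in the container's pids cgroup, so a rising value for a single long-lived plugin binary usually means extra threads rather than extra processes. A hedged sketch, run directly on the node, to see what the counter is actually tracking (the cgroup path is the same one used in the crictl commands above):

# Same cgroup path as in the crictl commands above.
CG=/sys/fs/cgroup/pids/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pode1aea94b_2820_418f_8b33_fd51e946a442.slice/cri-containerd-608df5831ddae1b508434f20edee9b18d58fb76771c5fc4c785f99b024dc56c5.scope
cat $CG/pids.current                                        # current task count
cat $CG/cgroup.procs                                        # processes in the cgroup
for p in $(cat $CG/cgroup.procs); do ps -T -p "$p"; done    # threads of each process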
Actual results
The csi-rbdplugin container's pids.current value rose from 47 to 99 and did not drop back after all test pods and PVCs were deleted.
Expected behavior
Some PIDs should be released after the pods and PVCs are deleted.
Logs
If the issue is in PVC creation, deletion, or cloning, please attach complete logs of the below containers:
- provisioner pod.
If the issue is in PVC resize, please attach complete logs of the below containers:
- provisioner pod.
If the issue is in snapshot creation and deletion, please attach complete logs of the below containers:
- provisioner pod.
If the issue is in PVC mounting, please attach complete logs of the below containers:
- csi-rbdplugin/csi-cephfsplugin and driver-registrar container logs from the plugin pod on the node where the mount is failing.
- if required, attach dmesg logs.
Note: if it is an rbd issue, please provide only rbd related logs; if it is a cephFS issue, please provide cephFS logs.
Additional context
Add any other context about the problem here, for example any existing bug report which describes a similar issue/behavior.