This issue occurs when volcano-vgpu is enabled and a GPU node does not properly report its devices in the node annotations.
vc-scheduler then panics with a nil pointer dereference (NPE). Related logs:
E1225 02:12:26.904890 112 node_info.go:389] "Idle resources turn into negative after allocated" nodeName="vm-node245-vgpu" task="rise-vast-system/yunji-deployment-1-5878fc9b8b-fbvm8" resources=["nvidia.com/gpu"] idle="cpu 39225.00, memory 192033869059.00, ephemeral-storage 1097689833472000.00, pods 105.00, nvidia.com/gpu -1000.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, volcano.sh/vgpu-number 20000.00" req="cpu 0.00, memory 0.00, nvidia.com/gpu 1000.00, pods 1.00"
E1225 02:12:26.906417 112 node_info.go:298] "Node out of sync" name="vm-node245-vgpu" resources=["nvidia.com/gpu"]
E1225 02:12:26.926206 112 node_info.go:389] "Idle resources turn into negative after allocated" nodeName="vm-node245-vgpu" task="rise-vast-system/yunji-deployment-1-5878fc9b8b-fbvm8" resources=["nvidia.com/gpu"] idle="cpu 39350.00, memory 192139269379.00, nvidia.com/gpu -1000.00, hugepages-1Gi 0.00, volcano.sh/vgpu-number 20000.00, ephemeral-storage 1097689833472000.00, pods 107.00, hugepages-2Mi 0.00" req="cpu 0.00, memory 0.00, nvidia.com/gpu 1000.00, pods 1.00"
W1225 02:12:26.926590 112 node_info.go:336] received argument of nil node, no need to set other resources for
W1225 02:12:26.926703 112 node_info.go:231] the argument node is null.
W1225 02:12:26.973399 112 node_info.go:336] received argument of nil node, no need to set other resources for
W1225 02:12:26.973484 112 node_info.go:231] the argument node is null.
W1225 02:12:26.974207 112 node_info.go:336] received argument of nil node, no need to set other resources for
W1225 02:12:26.974251 112 node_info.go:231] the argument node is null.
E1225 02:12:26.974949 112 panic.go:261] "Observed a panic" panic="runtime error: invalid memory address or nil pointer dereference" panicGoValue="\"invalid memory address or nil pointer dereference\"" stacktrace=<
goroutine 521 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x28f6f98, 0x3ef8500}, {0x21f1ea0, 0x3e5f270})
    /root/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:107 +0xbc
k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x28f6f98, 0x3ef8500}, {0x21f1ea0, 0x3e5f270}, {0x3ef8500, 0x0, 0x10000c0003345d0?})
    /root/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:82 +0x5e
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003345d0?})
    /root/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:59 +0x108
panic({0x21f1ea0?, 0x3e5f270?})
    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:770 +0x132
volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/vgpu.NewGPUDevices({0xc000012d10, 0xa}, 0xc001010308)
    /root/volcano/volcano_metrics/volcano/pkg/scheduler/api/devices/nvidia/vgpu/device_info.go:99 +0xe6
volcano.sh/volcano/pkg/scheduler/api.(*NodeInfo).setNodeOthersResource(0xc00094a0c0, 0xc001010308)
    /root/volcano/volcano_metrics/volcano/pkg/scheduler/api/node_info.go:341 +0xcc
volcano.sh/volcano/pkg/scheduler/api.(*NodeInfo).setNode(0xc00094a0c0, 0xc001010308)
    /root/volcano/volcano_metrics/volcano/pkg/scheduler/api/node_info.go:355 +0x93
volcano.sh/volcano/pkg/scheduler/api.(*NodeInfo).SetNode(0xc000e280c0, 0xc001010308)
    /root/volcano/volcano_metrics/volcano/pkg/scheduler/api/node_info.go:320 +0x51
volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).AddOrUpdateNode(0xc000428dc8, 0xc001010308)
    /root/volcano/volcano_metrics/volcano/pkg/scheduler/cache/event_handlers.go:497 +0x10a
volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).SyncNode(0xc000428dc8, {0xc000808410, 0xa})
    /root/volcano/volcano_metrics/volcano/pkg/scheduler/cache/event_handlers.go:618 +0x4cc
volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).processSyncNode(0xc000428dc8)
    /root/volcano/volcano_metrics/volcano/pkg/scheduler/cache/cache.go:1218 +0x1dc
volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).runNodeWorker(...)
    /root/volcano/volcano_metrics/volcano/pkg/scheduler/cache/cache.go:1200
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
    /root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000318ed0, {0x28d50c0, 0xc00100e000}, 0x1, 0xc000163c20)
    /root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000318ed0, 0x0, 0x0, 0x1, 0xc000163c20)
    /root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
    /root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161
created by volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).Run in goroutine 1
    /root/volcano/volcano_metrics/volcano/pkg/scheduler/cache/cache.go:806 +0x8f
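The trace points at `NewGPUDevices` dereferencing something that is nil when the node has not published its device annotation. As a rough illustration of the kind of guard that would avoid the panic, here is a minimal, self-contained sketch. It is not Volcano's code: the `Node` and `GPUDevice` types, the `parseGPUDevices` helper, the annotation key, and the `id,memory:id,memory` encoding are all assumptions made up for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Node is a hypothetical stand-in for the Kubernetes node object;
// only the annotations matter for this sketch.
type Node struct {
	Name        string
	Annotations map[string]string
}

// GPUDevice is a hypothetical, simplified per-card record.
type GPUDevice struct {
	ID     string
	Memory uint
}

// registerAnnotation is an assumed key, not necessarily the one Volcano reads.
const registerAnnotation = "example.com/node-vgpu-register"

// parseGPUDevices decodes devices from an assumed "id,memory:id,memory" format.
// It returns nil instead of panicking when the node is nil, the annotation
// is absent or empty, or any entry is malformed.
func parseGPUDevices(node *Node) []GPUDevice {
	if node == nil {
		return nil
	}
	// Reading from a nil map is safe in Go and yields the zero value.
	raw, ok := node.Annotations[registerAnnotation]
	if !ok || raw == "" {
		return nil
	}
	var devices []GPUDevice
	for _, entry := range strings.Split(raw, ":") {
		if entry == "" {
			continue // tolerate a trailing separator
		}
		fields := strings.Split(entry, ",")
		if len(fields) != 2 {
			return nil
		}
		mem, err := strconv.ParseUint(fields[1], 10, 32)
		if err != nil {
			return nil
		}
		devices = append(devices, GPUDevice{ID: fields[0], Memory: uint(mem)})
	}
	return devices
}

func main() {
	// A node that has not reported any devices yet.
	missing := &Node{Name: "vm-node245-vgpu"}
	fmt.Println(len(parseGPUDevices(nil)), len(parseGPUDevices(missing)))

	// A node with a well-formed annotation.
	ready := &Node{Annotations: map[string]string{
		registerAnnotation: "GPU-0,16384:GPU-1,16384",
	}}
	fmt.Println(len(parseGPUDevices(ready)))
}
```

With a guard of this shape, the caller has to treat a nil result as "no vGPU devices on this node" rather than using the returned value unconditionally.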
Volcano version: latest
Other relevant information: none provided.
Related PR: #3924