
Can't install unlock services #5

Open
mkuznetsov opened this issue Apr 19, 2024 · 4 comments

@mkuznetsov

Proxmox 8.1
Nvidia Tesla P4
Installed the latest version of the 16.x branch, 535.161.08 (because 17.x doesn't support Pascal-based cards).
nvidia-smi runs and returns normal output.
mdevctl types returns an empty string.

The script created:
/etc/systemd/system/nvidia-vgpud.service.d/vgpu_unlock.conf
/etc/systemd/system/nvidia-vgpu-mgr.service.d/vgpu_unlock.conf
both with the same content:
[CODE]
[Service]
Environment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so
[/CODE]
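For reference, those two drop-ins can be reproduced with a few lines of shell. This is a sketch that writes into a scratch root instead of the real /etc, so it is safe to run anywhere; the .so path is the one from the issue:

```shell
# Recreate the vgpu_unlock drop-ins under a scratch root (not the real /etc)
root="$(mktemp -d)"
so=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so

for unit in nvidia-vgpud.service nvidia-vgpu-mgr.service; do
    dir="$root/etc/systemd/system/$unit.d"
    mkdir -p "$dir"
    # Same content as shown above, for both units
    printf '[Service]\nEnvironment=LD_PRELOAD=%s\n' "$so" > "$dir/vgpu_unlock.conf"
done

cat "$root/etc/systemd/system/nvidia-vgpud.service.d/vgpu_unlock.conf"
```

Note that a drop-in in `<unit>.service.d/` only extends a unit systemd already knows about; it cannot create one, which is what the errors below show.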
Enabling the services failed:
[CODE]
root@pve:/opt/vgpu_unlock-rs# systemctl enable nvidia-vgpud.service
Failed to enable unit: Unit file nvidia-vgpud.service does not exist.
root@pve:/opt/vgpu_unlock-rs# systemctl enable nvidia-vgpud-mgr.service
Failed to enable unit: Unit file nvidia-vgpud-mgr.service does not exist.
[/CODE]

These unit files are needed to enable the services; without them the services don't start:
/etc/systemd/system/nvidia-vgpud.service
/etc/systemd/system/nvidia-vgpu-mgr.service
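These units are normally shipped by the NVIDIA vGPU host driver installer rather than written by hand. For comparison, a complete unit needs at least an ExecStart= line; a hand-written one would look roughly like this (the ExecStart path and Type= are assumptions based on typical GRID host installs, not taken from the issue):

```ini
# /etc/systemd/system/nvidia-vgpu-mgr.service (sketch only)
[Unit]
Description=NVIDIA vGPU Manager Daemon

[Service]
Type=forking
# Hypothetical path; the real binary location depends on the driver install
ExecStart=/usr/bin/nvidia-vgpu-mgr

[Install]
WantedBy=multi-user.target
```

After creating or editing unit files or drop-ins, run `systemctl daemon-reload` before enabling or starting them.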

Simply copying the drop-in config doesn't work, because it has no [Install] section.
I added one, ending up with something like this:
[CODE]
[Service]
Environment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so

[Install]
WantedBy=multi-user.target
[/CODE]
After that the services were formally installed, but still not working.

[CODE]
systemctl enable nvidia-vgpud.service
Created symlink /etc/systemd/system/multi-user.target.wants/nvidia-vgpud.service → /etc/systemd/system/nvidia-vgpud.service.
systemctl enable nvidia-vgpu-mgr.service
Created symlink /etc/systemd/system/multi-user.target.wants/nvidia-vgpu-mgr.service → /etc/systemd/system/nvidia-vgpu-mgr.service.
[/CODE]

Getting the status of the services returns an error:
[CODE]
systemctl status nvidia-vgpu-mgr
Warning: The unit file, source configuration file or drop-ins of nvidia-vgpu-mgr.service changed on disk. Run 'systemctl daemon-reload' to reload units.
○ nvidia-vgpu-mgr.service
Loaded: bad-setting (Reason: Unit nvidia-vgpu-mgr.service has a bad unit file setting.)
Drop-In: /etc/systemd/system/nvidia-vgpu-mgr.service.d
└─vgpu_unlock.conf
Active: inactive (dead)

Apr 19 18:25:52 pve systemd[1]: nvidia-vgpu-mgr.service: Service has no ExecStart=, ExecStop=, or SuccessAction=. Refusing.
[/CODE]

@wvthoog
Owner

wvthoog commented Apr 28, 2024

You don't need to install the unlock services, since the Tesla P4 is supported natively. Download the new script and it will work (if I've done my research correctly).

@mkuznetsov
Author

mkuznetsov commented Apr 29, 2024 via email

@wvthoog
Owner

wvthoog commented Apr 30, 2024

That's right, the P4 should be supported by one of the 16.x drivers. I couldn't tell from the Nvidia website which one (so you'd have to try that out yourself).

Incorporating the profile overrides was a consideration, but I opted not to integrate them into version 1.1 of the script, due to time constraints and the huge block of code it would add. But that wouldn't be of use to you anyway, because you're running the native driver without the vgpu_unlock patches. There is one option though: patch the driver anyway and then disable the unlock logic in the TOML config:
echo "unlock = false" > /etc/vgpu_unlock/config.toml

Then you can use the P4 natively and still use custom profiles, I believe.
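Spelled out, the resulting config file would be just (file location as in the command above; treat the exact key names as an assumption to verify against the vgpu_unlock-rs README):

```toml
# /etc/vgpu_unlock/config.toml
# With a natively supported card, keep the unlock logic itself disabled;
# custom profiles can still be defined in /etc/vgpu_unlock/profile_override.toml
unlock = false
```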

Maybe I'll add it in a later version of the script

@mkuznetsov
Author

I get very strange results in LLM runs. I chose a B-class vGPU, made the override,
and added "cuda_enabled = 1" to the profile.
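For context, in the vgpu_unlock-rs profile override that setting would look roughly like this (the profile name nvidia-157 is a placeholder; substitute the mdev type of the chosen B-series profile from `mdevctl types`, and note that TOML uses true/false rather than 1):

```toml
# /etc/vgpu_unlock/profile_override.toml (sketch)
[profile.nvidia-157]
cuda_enabled = true
```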

In GTA 5 on one vGPU I get utilisation of about 16-18%, while on the second vGPU LLM models in Ollama can't push utilisation higher than 3-4%. Can vgpu_unlock somehow affect computational abilities?

Second question: the host driver 535.104 supports CUDA version 16.1, but the vGPU provides only 12.2 to the VM. Can it be patch-related?
