[Feature]: Better headers in --showpids #166
Use both flags:
Sure, that works if the user already knows that there are two flags. My point is that the way
Hi @al42and, sorry for the late response.
The column name can be changed from "GPU(s)" to "# of GPUs" to be more clear. However, note that rocm-smi will be deprecated in the future in favor of the new amd-smi tool, so you may want to transition to using amd-smi in your application. In amd-smi, you can get information about the processes using your GPUs with the
Could you elaborate on what's not clear?
What do you mean by old kernel here? Is it a process that has finished running?
Nice! It's confusing to have multiple tools with different behavior, so deprecating one of the two makes sense. Will
One option says "Show current running KFD PIDs"; the other says "Show GPUs used by specified KFD PIDs (all if no arg given)". I don't see how it should be clear that the first one does not show which GPUs are used but still shows the number of GPUs used. The existence of the second option hints at that, but requiring the user to read that much into nuances is, IMO, not nice.
The Linux kernel: some old version we had on our Cray machine at the time this issue was filed.
I would expect it to be deprecated by that point but I can't say for sure.
If we're being pedantic, the user shouldn't assume --showpids will include anything other than the current running KFD PIDs as mentioned in the help output. The other columns (like # of GPUs or process name) are supplementary output. If a user is looking for which GPUs are being used by a process, the help output also makes it clear that the appropriate option is --showpidgpus.
Is this issue still present in that old kernel version?
Should the users assume that
Things are improving, and that's the main thing. But so far, ROCm has been actively discouraging users from reading too much into the exact wording of its documentation :)
👍
That old kernel version is no longer present on our machine, so I cannot say.
That is a mistake in the documentation. Thanks for pointing it out, I will get that fixed.
I share your concern. We are actively trying to improve our documentation so if you come across any other mistakes or confusing wording in the docs, let me know or create another ticket and we will have it fixed!
Alright, if you come across it again or remember the kernel version and ROCm version the issue was reproduced on, let me know and I can look into the issue further.
Suggestion Description
rocm-smi --showpids
reports the number of GPUs used by each process. However, the presentation makes it easy to assume that it shows which GPUs are used.
Users of our application are confused, thinking that all the processes run on the same GPU:
Compare this with how nvidia-smi reports the similar thing:
It would be better if the
rocm-smi --showpids
output made it clearer that it reports the number of GPUs used, not their indices. The help output is also unclear about the differences between the two options:
With an old kernel, when rocm-smi cannot get the information, it is even more confusing: instead of N/A, it reports 0, which can be interpreted either as GPU #0 or as no GPUs being used. Neither is correct!
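To make the request concrete, here is a minimal Python sketch of the presentation being asked for: a header that says "# of GPUs" rather than "GPU(s)", and "N/A" instead of 0 when the count could not be read. This is not rocm-smi's actual code; the function name format_pid_table and the (pid, name, gpu_count) row layout are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical row layout: (pid, process name, number of GPUs, or None if unknown)
Row = Tuple[int, str, Optional[int]]


def format_pid_table(rows: List[Row]) -> str:
    """Render a process table whose header states it is a GPU *count*,
    and which prints N/A (not 0) when the count is unavailable."""
    header = ("PID", "PROCESS NAME", "# of GPUs")
    body = [
        # None means "could not be determined", which must not look like GPU #0
        (str(pid), name, "N/A" if gpus is None else str(gpus))
        for pid, name, gpus in rows
    ]
    table = [header, *body]
    # Width of each column is the widest cell in that column
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in table
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_pid_table([(1234, "app", 2), (5678, "app", None)]))
```

Representing "unknown" as None rather than 0 keeps the two cases distinct all the way to the output, which is exactly the ambiguity described above.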
Operating System
SLES 15
GPU
MI250X
ROCm Component
rocm_smi_lib