Docker image request #741

Open
IvanMM27 opened this issue Aug 22, 2024 · 0 comments
Labels
feature New feature or request

Comments

@IvanMM27
Contributor

Dear all,

I was wondering whether it would be possible to provide a Dockerfile for building a GraphNeT Docker image and running it inside a container. The idea behind this is that a container gives us full control over an isolated environment: if we run into problems during training/inference, we can stop it without disturbing other processes running on the cluster outside the container.
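As a rough illustration of this workflow, a training run could be launched in a disposable container and stopped from the host along these lines. This is a minimal sketch, assuming a locally built image (the name `graphnet-local:latest`, the helpers `run_training`/`stop_training` and the `/workspace` mount are hypothetical, not part of GraphNeT), Docker on the host, and the NVIDIA Container Toolkit so that `--gpus` works. With `--init`, a small init process runs as PID 1 and forwards signals; once PID 1 exits, every remaining process in the container's PID namespace is killed, so worker processes left behind by an interrupted script should not outlive the container.

```shell
#!/usr/bin/env bash
# Sketch: run a GraphNeT training script inside a disposable container.
# Assumptions: a locally built image (default "graphnet-local:latest" is a
# hypothetical name, not an official GraphNeT image), Docker on the host, and
# the NVIDIA Container Toolkit so that --gpus works.

# Usage: run_training SCRIPT [ARGS...]
# Set DRY_RUN=1 to print the docker command instead of running it.
run_training() {
  local image=${IMAGE:-graphnet-local:latest}
  local name=${NAME:-graphnet-train}
  local -a cmd
  cmd=(
    docker run
    --rm          # delete the container once it exits
    --init        # small init as PID 1: forwards signals, reaps zombies
    --gpus all    # expose all host GPUs to the container
    --name "$name"
    -v "$PWD":/workspace -w /workspace   # mount the current directory
    "$image" python "$@"
  )
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Stop the container: SIGTERM first, SIGKILL after $1 seconds (default 30).
stop_training() {
  docker stop -t "${1:-30}" "${NAME:-graphnet-train}"
}
```

For example, `DRY_RUN=1 run_training train.py` (with `train.py` a placeholder) only prints the command, and `stop_training 30` stops the run from another terminal without touching anything outside the container.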

Recently, I stopped a training run by pressing Ctrl+C in the terminal where it was running, but the GPUs were left frozen by ghost processes after the training script had stopped. I tried to stop them manually with kill commands in the terminal, but afterwards the processes with the given PIDs appeared as N/A in nvtop, and nvidia-smi did not list any processes using the GPUs, even though the GPUs were still in use. Next, I tried to shut down the processes using the GPUs manually with the following two commands:

  • fuser -v /dev/nvidia*
  • kill $(lsof -t /dev/nvidia*)

The first of them did not always work, whereas the second one did. When this did not work, the cluster on which I was running GraphNeT had to be rebooted, leaving it unusable for other co-workers in the meantime.
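The lsof-based command can be made somewhat more cautious. The following is a minimal sketch, assuming a Linux host: it finds the holders of /dev/nvidia* by reading /proc/<pid>/fd instead of calling lsof, sends SIGTERM first, and escalates to SIGKILL only for processes that survive a grace period. The function names `pids_holding` and `terminate_gracefully` are hypothetical. When run as root it matches every user's GPU jobs, so the PID list should be reviewed before anything is killed.

```shell
#!/usr/bin/env bash
# Sketch: terminate the processes that still hold GPU device files, escalating
# from SIGTERM to SIGKILL only for processes that ignore SIGTERM.
# Linux-only (reads /proc); does not need lsof or fuser.

# Print the PIDs (one per line, sorted, deduplicated) that have an open file
# descriptor pointing at a path matching the glob in $1 (default /dev/nvidia*).
pids_holding() {
  local pattern=${1:-/dev/nvidia*}
  local fd target pid
  for fd in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    # $pattern is left unquoted on purpose so [[ == ]] treats it as a glob.
    if [[ $target == $pattern ]]; then
      pid=${fd#/proc/}        # "1234/fd/5"
      echo "${pid%%/*}"       # "1234"
    fi
  done | sort -un
}

# Send SIGTERM to the PIDs in $1, poll once per second for up to $2 seconds
# (default 10), then send SIGKILL to any that are still alive.
terminate_gracefully() {
  local pids=$1 grace=${2:-10} pid i alive=$1
  [[ -n $pids ]] || return 0
  kill -TERM $pids 2>/dev/null || true
  for ((i = 0; i < grace; i++)); do
    alive=""
    for pid in $pids; do
      kill -0 "$pid" 2>/dev/null && alive+="$pid "
    done
    [[ -n $alive ]] || return 0
    sleep 1
  done
  kill -KILL $alive 2>/dev/null || true
}
```

Typical use is to inspect `pids_holding '/dev/nvidia*'` first and only then pass the reviewed PIDs to `terminate_gracefully`. Note that a process blocked in uninterruptible sleep (state D in ps) cannot be killed even with SIGKILL until it leaves that state, so no script of this kind can rule out a reboot in every case.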

I would therefore encourage the GraphNeT developers to reconsider providing a Dockerfile, as was the case in the past. I would be very happy to help with this, but I am not sure I have all the knowledge required to do it on my own.

Thank you very much!

IvanMM27 added the feature (New feature or request) label on Aug 22, 2024