Add Dockerfile for easy deployment on NVIDIA systems #34

Open
erinaldiq opened this issue Sep 28, 2023 · 6 comments

Comments

@erinaldiq
Collaborator

I think this is for @NathanCQC
It would be good to have a Dockerfile that simplifies the deployment on NVIDIA systems, in particular considering parallel MPI environments (multi-node and multi-GPU).

@yapolyak
Contributor

yapolyak commented Sep 28, 2023

But Dockerfiles won't help with MPI environments: MPI libraries depend on the lower-level libraries installed on specific HPC systems, which are configured for the local hardware and firmware.

@erinaldiq
Collaborator Author

You need to install mpi4py and have the right build flags, as far as I remember from Nathan. It wasn't a painless process, I believe.
Happy to know what other options you have for running on multi-node systems.
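
Since the pain point here is getting mpi4py built against the right MPI, a quick check of what mpi4py actually linked against can help when comparing a container with the host. The following is only a minimal sketch, not something from this repository: the helper name `classify_mpi` and the substring rules it uses are assumptions for illustration, and the only mpi4py calls it relies on are `MPI.Get_library_version()`, `MPI.Get_processor_name()` and the `COMM_WORLD` communicator.

```python
def classify_mpi(library_version: str) -> str:
    """Guess the MPI family from the library version string.

    Hypothetical helper: the substring rules below are a rough
    heuristic, not an official way of identifying an MPI build.
    """
    text = library_version.lower()
    if "open mpi" in text:
        return "Open MPI"
    if "mpich" in text:
        # Also matches vendor builds derived from MPICH (e.g. Cray MPICH).
        return "MPICH-family"
    if "intel(r) mpi" in text:
        return "Intel MPI"
    return "unknown"


def main() -> None:
    try:
        # Importing mpi4py.MPI initialises MPI for this process.
        from mpi4py import MPI
    except ImportError:
        print("mpi4py is not importable here; nothing to check")
        return

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Collect every rank's host name on rank 0 to see how many nodes are in use.
    hosts = comm.gather(MPI.Get_processor_name(), root=0)

    if rank == 0:
        version = MPI.Get_library_version()
        print(f"MPI family: {classify_mpi(version)}")
        print(f"{size} rank(s) on {len(set(hosts))} host(s)")


if __name__ == "__main__":
    main()
```

Saved under a name such as `check_mpi.py` (hypothetical) and started through the system's MPI launcher, this prints the detected MPI family and the number of distinct hosts from rank 0. If the host count is lower than expected, or the family differs between the container and the host, the build is probably not picking up the system MPI.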

@yapolyak
Contributor

I don't think this will work in the general case, as you need the MPI libraries built first, and their configuration/installation is hardware-dependent. Hence I don't believe a containerised solution exists, at least for MPI-based (and maybe any distributed) calculations.

However, @NathanCQC did indeed use https://docs.nersc.gov/development/shifter/ on Perlmutter, but I am not sure it will work for an arbitrary multi-node system.

I am happy to be wrong, though.

Collaborator

We could do this via the CI.

There is one already built somewhere, but it is specific to Perlmutter, as they have a special MPICH.

IMO containerisation on HPC is still very system-specific, although I think it is improving, and it is not true to the Docker ideas about portability, etc.

Did you have in mind a specific hardware provider?

@erinaldiq
Collaborator Author

I had in mind Perlmutter at the moment.

@yapolyak
Contributor

@NathanCQC what do you mean by doing this via the CI?


3 participants