Add Dockerfile for easy deployment on NVIDIA systems #34

erinaldiq · 2023-09-28T05:52:59Z

I think this is for @NathanCQC
It would be good to have a Dockerfile that simplifies the deployment on NVIDIA systems, in particular considering parallel MPI environments (multi-node and multi-GPU).

yapolyak · 2023-09-28T07:19:44Z

But Dockerfiles won't help with MPI environments: MPI libraries depend on lower-level libraries installed on each specific HPC system, which are configured for the local hardware and firmware.

erinaldiq · 2023-09-28T07:27:24Z

You need to install mpi4py with the right build flags, as far as I remember from Nathan. I believe it wasn't a painless process.
Happy to know what other options you have for running on multi-node systems.
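
For reference, a minimal sketch of how one might check which MPI library an installed mpi4py was built against, since that is the system-specific part. The helper name `describe_mpi_build` is hypothetical; `mpi4py.get_config()`, `MPI.Get_library_version()` and `MPI.get_vendor()` are existing mpi4py calls. Nothing here is specific to Perlmutter.

```python
import importlib.util


def describe_mpi_build():
    """Report which MPI library mpi4py is linked against (hypothetical helper)."""
    # Avoid a hard failure on machines where mpi4py is not installed.
    if importlib.util.find_spec("mpi4py") is None:
        return {"mpi4py_installed": False}

    import mpi4py
    from mpi4py import MPI  # importing MPI initialises the MPI runtime

    return {
        "mpi4py_installed": True,
        # Build-time configuration recorded by mpi4py (e.g. the compiler wrapper).
        "build_config": mpi4py.get_config(),
        # Version string reported by the underlying MPI library.
        "library_version": MPI.Get_library_version(),
        # (name, version) of the MPI implementation, e.g. MPICH or Open MPI.
        "vendor": MPI.get_vendor(),
    }


if __name__ == "__main__":
    for key, value in describe_mpi_build().items():
        print(f"{key}: {value}")
```

Running this inside and outside a container on the same node would show whether both see the same MPI implementation.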

yapolyak · 2023-09-28T08:12:57Z

I don't think this will work in the general case, as you need the MPI libraries built first, and their configuration and installation are hardware-dependent. Hence I don't believe a containerised solution exists, at least for MPI-based (and maybe for any distributed) calculations.

However, @NathanCQC did indeed use Shifter (https://docs.nersc.gov/development/shifter/) on Perlmutter, but I am not sure it will work for an arbitrary multi-node system.

I am happy to be wrong, though.

NathanCQC · 2023-09-28T08:30:33Z

We could do this via the CI.

There is one already built somewhere, but it is specific to Perlmutter, as they have a special MPICH.

IMO containerisation on HPC is still very system-specific, although I think it is improving, and it is not true to the Docker ideas about portability etc.

Did you have in mind a specific hardware provider?

erinaldiq · 2023-09-28T08:32:18Z

I had in mind Perlmutter at the moment.

yapolyak · 2023-09-29T10:40:38Z

@NathanCQC what do you mean by doing this via the CI?
