Ubuntu 20.04 GPU Image Rewrite, master branch (2024.05.19.) #99

Merged on May 27, 2024 (1 commit)


krasznaa
Member

I'm proposing a fairly big design change here. I'd like to switch to having all of the "GPU images" be based on the "vanilla image", with only the GPU SDKs installed on top of that base.

There are a number of reasons behind all of this.

Note that I only went with a "single layer" of image inheritance in the end. So, the CUDA+oneAPI and ROCm+oneAPI images are built on top of the same "vanilla" Ubuntu 20.04 base image that the "non-hybrid" images are also built on top of. This is for 2 reasons:

  • Whenever we add changes to the Ubuntu 20.04 image, we'll already have to deploy the updated images in 2 steps. (Since we'll need to tag the new base image before we can build and tag the new derived images.) I didn't want to turn that into a 3-step process.
  • This allows the standalone ROCm, CUDA or oneAPI images to use a newer version of their SDK than the "combined images" do. This is already absolutely needed with the current ROCm+oneAPI image.
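The link between inheritance depth and the number of deployment steps can be sketched in a few lines of Python. This is only an illustration: the image names and the `bases` mappings below are hypothetical stand-ins, not this repository's actual build configuration.

```python
def deployment_steps(bases):
    """Group images into build stages.

    `bases` maps each image name to the image it is built FROM (None for an
    image with no in-repository base). Images in one stage only depend on
    images from earlier stages, so every stage is one build-and-tag step.
    """
    depth = {}

    def level(image):
        # An image can only be built one step after its base has been tagged.
        if image not in depth:
            base = bases[image]
            depth[image] = 0 if base is None else level(base) + 1
        return depth[image]

    stages = {}
    for image in bases:
        stages.setdefault(level(image), []).append(image)
    return [sorted(stages[n]) for n in sorted(stages)]


# Hypothetical image names, for illustration only.
single_layer = {
    "ubuntu2004": None,
    "ubuntu2004_cuda": "ubuntu2004",
    "ubuntu2004_rocm": "ubuntu2004",
    "ubuntu2004_cuda_oneapi": "ubuntu2004",
    "ubuntu2004_rocm_oneapi": "ubuntu2004",
}
# Alternative layout: the combined images inherit from the single-SDK images.
chained = dict(
    single_layer,
    ubuntu2004_cuda_oneapi="ubuntu2004_cuda",
    ubuntu2004_rocm_oneapi="ubuntu2004_rocm",
)

print(len(deployment_steps(single_layer)))  # 2 build-and-tag steps
print(len(deployment_steps(chained)))       # 3 build-and-tag steps
```

In the chained layout, the SDK version in a combined image would also be tied to the version in its parent image, which is what the second point above avoids.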

Also note that I went with oneAPI 2024.1 as the oneAPI version in all the images. It is not actually functional with the current traccc main branch just yet. But with acts-project/traccc#591 and acts-project/detray#734 having gone in already, by the time that we get a new tag of this repository, we'll be able to use these images. 😉

Finally: Note that I didn't add the Codeplay plugins to the "combined images". But this doesn't actually prevent us from testing the build of SYCL code with an NVIDIA or AMD backend. It just means that we can't (currently) runtime test the binaries using these images. 😦 I intend to figure out later on how to install these plugins, but for this first PR it didn't seem absolutely necessary. (Unfortunately the download of the pre-built Codeplay binaries cannot be automated at the moment.)

Also pinging @beomki-yeo and @stephenswat for info.

@krasznaa added the enhancement label May 19, 2024
@krasznaa requested a review from paulgessinger May 19, 2024 13:45
@paulgessinger
Member

  • Is it intentional that these images don't have any of the dependencies like G4 anymore?
  • Can you check the total size of these images?

@krasznaa
Member Author

  • Is it intentional that these images don't have any of the dependencies like G4 anymore?

They are all currently built on top of ghcr.io/acts-project/ubuntu2004:v43. ROOT, G4, etc. all come from that. I should've been clearer about what I meant by "vanilla" image. 😉

  • Can you check the total size of these images?

I'll do that before too long, once I reboot into Linux again. 😛 The CI error caused by running out of disk space is indeed worrisome, 😦 even though the "new" images are not meant to be any smaller or larger than the existing ones. 🤔

@paulgessinger
Member

  1. Makes sense
  2. These can easily balloon in size. The GitHub runners max out at 20 or so GB

@krasznaa
Member Author

So, yeah, I should've checked the size of rocm-hip-sdk6.1.1 more carefully. 😦

...
 => exporting to image                                                                                                      51.5s
 => => exporting layers                                                                                                     51.5s
 => => writing image sha256:667b73eed29ece4ceed6fb60816bc9a70f6c71ba708467cb89f3e4dc7874549f                                 0.0s
 => => naming to docker.io/library/ubuntu2004_rocm:test                                                                      0.0s
[bash][Legolas]:machines > docker images
REPOSITORY        TAG       IMAGE ID       CREATED         SIZE
ubuntu2004_rocm   test      667b73eed29e   2 minutes ago   25.3GB
[bash][Legolas]:machines >

This is of course bananas... Let me see what to do. Either I give up on using this very latest version, or (more likely) I get more selective about what I install into the image. 🤔

The GPU images are now all based on the "latest" vanilla Ubuntu 20.04 image,
just adding GPU SDKs on top, from DEB repositories in all cases.
@krasznaa force-pushed the Ubuntu20.04GPUUpdates-master-20240519 branch from 44f2b7a to 0d81bd2 May 19, 2024 16:36
@krasznaa
Member Author

I have now switched to installing rocm-hip-runtime-dev instead of rocm-hip-sdk. This seems to be enough for what we need from these images, and spares us the installation of AMD's BLAS, FFT, etc. libraries, which seem to have grown huge in the latest versions...

@krasznaa
Member Author

I should add: With this setup, the full (uncompressed) size of ubuntu2004_rocm_oneapi is ~10 GB. Of this, ROCm/HIP and oneAPI take up 2-3 GB each.

Not brilliant, but I don't think we can easily make it smaller. 🤔
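
A size check like the one discussed in this thread could be automated before pushing an image. The sketch below is hypothetical: `RUNNER_DISK_GB` is taken from the "20 or so GB" estimate quoted earlier rather than from any documented limit, and the helper names are made up for illustration. It assumes the `docker` CLI is on the `PATH` and that sizes are compared in decimal gigabytes.

```python
import subprocess

RUNNER_DISK_GB = 20.0  # assumption: the "20 or so GB" quoted in this thread


def image_size_gb(image):
    """Return the uncompressed size of a local Docker image in decimal GB."""
    # `docker image inspect` reports the image size in bytes.
    out = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip()) / 1e9


def fits_on_runner(size_gb, limit_gb=RUNNER_DISK_GB):
    """True if an image of `size_gb` is below the assumed runner limit."""
    return size_gb < limit_gb


# Sizes quoted in this conversation.
print(fits_on_runner(25.3))  # False: ubuntu2004_rocm:test built with rocm-hip-sdk
print(fits_on_runner(10.0))  # True: ubuntu2004_rocm_oneapi with rocm-hip-runtime-dev
```

For a locally built image, `fits_on_runner(image_size_gb("ubuntu2004_rocm:test"))` would run the same check. Passing it does not guarantee a successful CI run, since the runner needs disk space for more than the image itself.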

@krasznaa merged commit df7a9c0 into master May 27, 2024
20 checks passed
@krasznaa deleted the Ubuntu20.04GPUUpdates-master-20240519 branch May 27, 2024 12:45