Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add coffea 2024 and coffea 0.7.x images #1

Merged
merged 30 commits into from
Feb 16, 2024

Conversation

oshadura
Copy link
Member

@oshadura oshadura commented Feb 1, 2024

Thanks @nsmith- for starting point: https://github.com/CoffeaTeam/docker-coffea-base/blob/coffea2023

The goal is to merge https://github.com/CoffeaTeam/docker-coffea-base and https://github.com/CoffeaTeam/docker-coffea-dask repositories and have the possibility to maintain both coffea versions separately (CalVer and legacy 0.7.x) as separate images.

I propose to name coffeateam/coffea-basev0 as legacy 0.7.x and coffeateam/coffea-dask the calver version. I added also two versions of each image, depending on linux distro: coffeateam/coffea-basev0-ubuntu, coffeateam/coffea-basev0-alma8 and coffeateam/coffea-dask-ubuntu, coffeateam/coffea-dask-alma8.

The next step would be to slim the image and add ML learning libraries as separate image (?).

@nsmith- nsmith- self-requested a review February 1, 2024 16:10
@oshadura oshadura force-pushed the add-skeleton-images branch from f87d9c3 to 84e13f9 Compare February 1, 2024 16:11
@oshadura oshadura marked this pull request as draft February 1, 2024 16:14
@nsmith-
Copy link
Member

nsmith- commented Feb 1, 2024

Can I suggest a slightly different directory structure?

	renamed:    alma8/Dockerfile -> alma8.dockerfile
	renamed:    alma8/environment.yaml -> environment.yaml
	renamed:    ubuntu/Dockerfile -> ubuntu.dockerfile
	deleted:    ubuntu/environment.yaml

so that we have a common environment.yaml for all OS versions? The CI file will need a tiny change as well

@oshadura
Copy link
Member Author

oshadura commented Feb 1, 2024

@nsmith- sure! I will try tomorrow before you are awake. Any other suggestions? Maybe we need some other flavor? What should we do with ML libraries? Add additional environment.yaml?

@nsmith-
Copy link
Member

nsmith- commented Feb 1, 2024

Regarding image dependencies and naming, personally I'm in favor of something like

  • coffea-base
    • coffea calver
    • dask (not distributed)
  • coffea-dask-v0
    • coffea 0.7.x
    • dask
    • distributed
  • coffea-lpc (FROM coffea-base)
    • distributed
    • lpcjobqueue
  • coffea-eaf (FROM coffea-base)
    • distributed
    • other stuff for FNAL EAF?
  • coffea-casa (FROM coffea-base)
    • distributed
    • dask-gateway
    • any extra casa stuff
  • coffea-casa-v0 (FROM coffea-dask-v0)
    • distributed
    • any extra casa stuff
  • coffea-vine (FROM coffea-base)
    • taskvine
    • whatever else

and potentially other AF variants. @btovar @mapsacosta @mcremone @rpsimeon34 may have opinions as well.
I don't see a strong case for having multiple base OS images. Maybe I am missing something but since our ~whole stack is coming from conda the base OS is somewhat irrelevant as far as I can tell?

@nsmith-
Copy link
Member

nsmith- commented Feb 1, 2024

This does leave up in the air where the ML packages go in the stack. Part of me thinks it should be AF-dependent, so that AFs that support GPUs can ensure they have the correct CUDA versions.

@nsmith-
Copy link
Member

nsmith- commented Feb 1, 2024

This may also be of interest to @kondratyevd

@kondratyevd
Copy link

Thanks @nsmith- - at Purdue AF we provide only a single Docker image for users, and manage environments using conda, so probably at the moment this is not super relevant for us, unless I'm missing some important caveats

@nsmith-
Copy link
Member

nsmith- commented Feb 1, 2024

at Purdue AF we provide only a single Docker image for users

can you describe/link that image? That implies users' full conda environments must be shipped to workers?

@kondratyevd
Copy link

kondratyevd commented Feb 1, 2024

The image is here: https://github.com/PurdueAF/purdue-af/blob/master/jupyterhub/docker/cmsaf-alma8/Dockerfile

To use conda envs with Dask, users must place their environments into a shared storage volume (Purdue's "Depot" storage is standard work area for local Purdue users, it is mounted to all worker nodes; and we are working on a similar solution for external users). Then, the path to conda env must be specified in Dask Gateway setup.

For local Python imports it's similar - user's files should be in shared storage and PYTHONPATH can be modified in Dask Gateway setup to point to the user's analysis framework.

@nsmith-
Copy link
Member

nsmith- commented Feb 1, 2024

Thanks for the input @kondratyevd, it is also in line with some discussion I had with Burt about the fact that the user server (notebook/lab) image often needs extensive AF customization and essentially they are only interested in conda install coffea or using a common environment.yaml file for optional extras (most notably, xrootd). This makes me wonder if we are better off supplying just a set of conda environment files? For workers an image may be useful because it can be cached more effectively.

@kondratyevd
Copy link

Doesn’t this (set of conda files) already mostly provided with just a few optional dependencies? I think our users didn’t have any issues with just conda install coffea==0.7.21 and similarly for v2023/2024. Xrootd is the only exception that I remember.

@oshadura oshadura force-pushed the add-skeleton-images branch 3 times, most recently from c742a37 to 8f46ade Compare February 9, 2024 13:58
@oshadura oshadura force-pushed the add-skeleton-images branch 4 times, most recently from afee57c to 57a8ee1 Compare February 9, 2024 14:56
@oshadura oshadura force-pushed the add-skeleton-images branch from d4f8d4f to 489c723 Compare February 13, 2024 13:49
@oshadura oshadura force-pushed the add-skeleton-images branch from 489c723 to 4e7609e Compare February 13, 2024 15:28
@oshadura oshadura force-pushed the add-skeleton-images branch from e139352 to 7fd5c48 Compare February 14, 2024 15:12
@oshadura
Copy link
Member Author

@nsmith- @lgray Now only one combination is failing: python 3.11 + coffea 0.7.22 with:

2024-02-15 16:26:20,860 - distributed.worker - WARNING - Compute Failed
Key:       MyProcessor-31e87f80bfa2fb4d21e30f667186a569
Function:  MyProcessor
args:      ((WorkItem(dataset='dimuon', filename='https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dimuon.root', treename='Events', entrystart=0, entrystop=40, fileuuid=b'\xa2\x10\xa3\xf86H\x11\xea\xa2\x9f\xf5\xb5\\\x90\xbe\xef', usermeta={}), b'\x04"M\x18h@-\x00\x00\x00\x00\x00\x00\x00v,\x00\x00\x00R\x80\x05\x95"\x00\x01\x00\xf0\x13\x8c\x0btest_dimuon\x94\x8c\x0bMyProcessor\x94\x93\x94)\x81\x94.\x00\x00\x00\x00'))
kwargs:    {}
Exception: 'TypeError("__class__ assignment: \'NanoEventsArray\' object layout differs from \'Array\'")'

https://github.com/CoffeaTeam/af-images/actions/runs/7918852148/job/21618260586

@oshadura
Copy link
Member Author

Otherwise, I can disable it for now, and then if everything else is ok, we can finally push these images.

@oshadura oshadura marked this pull request as ready for review February 16, 2024 09:23
@oshadura
Copy link
Member Author

I am going to fix the failing image in the next pull request as well as add other combinations. The goal is to test the first batch of images on coffee-casa already today.

@oshadura oshadura merged commit 33ad9fd into CoffeaTeam:main Feb 16, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants