Concerns about module stability #2
Figured out where it came from: the Python version of the base image for Docker changed from 3.9 to 3.8. Shouldn't Lmod automatically install the required Python version as a module if it needs it for a specific package?
OK, you figured it out well. Python is an issue I have not been able to overcome, to be honest, in this kind of environment. If you are using a JupyterLab container image, it requires Python from the start, of course. So you are already in a specific Python environment and, depending on how you build the image, maybe even in a virtualenv. So even if you load a Python module with a different Python version, it is not taken into account by the environment (even if you reload the kernel), because you are already inside a specific Python version's environment.
So at this point, all my module builds are based on Python 3.9, and the container images I'm using have to be based on the same 3.9 version.
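To make that coupling visible early, a container entrypoint could warn at startup when the interpreter on PATH does not match the version the module tree was built against. This is only a sketch of the idea, not part of the actual images; the `EXPECTED` value and the check itself are assumptions.

```shell
#!/bin/sh
# Hypothetical startup check: warn if the Python shipped in this image
# differs from the version the module tree was built for (3.9 here).
EXPECTED="3.9"
ACTUAL="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
if [ "$ACTUAL" != "$EXPECTED" ]; then
  echo "WARNING: modules were built for Python $EXPECTED, image ships $ACTUAL" >&2
fi
```

A check like this does not fix the mismatch, but it turns a cryptic import failure deep inside a module into an explicit message at container start.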
Thanks a lot for such a detailed answer @guimou! The challenges are much clearer now. So by running modules built on Fedora in a Debian-based notebook I might also face some compatibility issues? (Probably not on all modules, though, since the environments must be relatively similar, which would explain why I did not notice it before.) If I understand well: to run EasyBuild and Lmod (to build and serve the modules), we need to install packages like GCC, Lua and Python (e.g. 3.9), which will create conflicts if we try to run modules that are using different versions (e.g. Python 3.8).
To avoid this, EESSI installed those GCC and Python packages somewhere the computer doesn't expect to find them (using Gentoo Prefix). So is that what we need to do in a container too? E.g. start from a bare Linux image, then install GCC and Python in a "secret place" so the Linux OS doesn't pick them up, then use this place to set up Lmod/EasyBuild. Did I get it right? For JupyterLab we can easily load the module during the Docker build, I guess, but for GCC and Python that will be another level.

Could we not reuse the Docker image EESSI uses in Singularity? https://github.com/EESSI/compatibility-layer/blob/main/Dockerfile.bootstrap-prefix-centos8 It's CentOS-based (8 is maybe not the most ideal, since CentOS 8 already reaches end of life at the end of this year :p). Otherwise, an option I see would be compiling GCC and Python from source. A more container-friendly solution could be to do a multi-stage build to install Lmod/EasyBuild, then move to a bare Linux image and copy the preinstalled Lmod into the final image (does Lmod need GCC installed to run, or only for building it?). We plan to get in touch with EESSI; maybe they'll have some good advice on this.
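The multi-stage idea mentioned above could look roughly like this. Every path, package name, and the Lmod version tag here is an assumption for illustration; also, Lmod itself is mostly Lua and shell, so the runtime stage should only need a Lua interpreter, not GCC:

```dockerfile
# Stage 1: build Lmod with a full toolchain available.
# (Package names and the Lmod release tag are guesses; adjust to taste.)
FROM fedora:34 AS builder
RUN dnf install -y gcc make curl lua lua-devel lua-posix lua-filesystem tcl \
 && curl -L https://github.com/TACC/Lmod/archive/refs/tags/8.7.tar.gz | tar xz \
 && cd Lmod-8.7 && ./configure --prefix=/opt/apps && make install

# Stage 2: bare image, copy only the installed tree; no compiler needed.
FROM fedora:34
RUN dnf install -y lua lua-posix lua-filesystem tcl
COPY --from=builder /opt/apps /opt/apps
# Lmod's standard prefix layout exposes an init script to source in shells.
RUN ln -s /opt/apps/lmod/lmod/init/profile /etc/profile.d/z00_lmod.sh
```

The final image then carries Lmod without the build toolchain; the modules themselves (and anything EasyBuild compiled) would still need their own runtime dependencies to be present or mounted.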
You almost got everything right.
Nice, indeed the fact that we are in a container brings some basic assumptions on how we will be starting it. We could just start a bare Linux without JL running, and then the user starts it himself... but from our users' point of view that would be a bit weird, and not convenient. We could also use something like…

From a user's point of view, I think the easiest would just be to have a different base image for each major GCC/Python version they need. So they just start the right image, and then they can install all the modules fitting that image (a bit like you do, but multiple times for multiple versions). But it will create a lot of work on the maintainers' side: build probably 3 to 5 images, then as many EasyBuild images with the corresponding modules for each version (from my experience, running eb to install modules can take a lot of time).

In any case, thanks a lot for those explanations and pointers, that was really instructive!
Yes and no, because then you have to figure out how your user will access the container. You can always rsh into it, but then JL will run under a specific session that you'll have to keep open. Another solution would be to start a basic interface, some kind of spawner where you can choose your JL version. But in that case, why not do it from the start and spawn the JL the user wants?
Well, in fact you can have one EasyBuild (the "builder" instance) building both Py38 and Py39 modules. But yes, you need to have two "lines" of modules, or make some choices/prioritization: TF up to version xx is available only in the Py38 line, then in both up to version yy, then only in the Py39 line.
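The two "lines" could be expressed concretely as two parallel module trees, with each base image prepending the tree that matches its interpreter to `MODULEPATH`. The directory names here are invented for the sketch:

```shell
# Hypothetical layout, one tree per Python line, filled by the same builder:
#   /opt/modules/py38/...   modules built against Python 3.8
#   /opt/modules/py39/...   modules built against Python 3.9
# A Py39-based image would prepend the matching tree:
export MODULEPATH="/opt/modules/py39:${MODULEPATH:-}"
```

With this, `module avail` in each image only shows modules that are compatible with the interpreter that image ships, which sidesteps the cross-version loading problem discussed above.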
My pleasure! As this project seems to be getting some interest, maybe I'll create a Slack channel to have other interesting conversations like this one.
Hi again, I have some more questions about Lmod and EasyBuild, and some concerns that have started to arise as I discover this technology (note that I am not a sysadmin originally, more of a dev who doesn't want to rely on sysadmins to get things done, so my point of view and questions might be a…)
The error
Here's how it worked for me: […] the `easybuild-data` volume is still here. The problem also is that here the error is quite unreadable (related to the path-loading arcana of Lmod), so it can't just be read and solved with regular computer knowledge. Usually, when I get issues with any type of language/package using a regular package manager in a Docker image, I can always find my way to solving them in a few minutes, just because I know the basics of bash and the Unix filesystem.
The concerns
From a developer's point of view, the major advantage of using (Docker) containers to handle dependencies has always been: it's stable, there are no surprises (people hate surprises!). And it just requires some basic Unix/Linux knowledge (which is required elsewhere anyway). You pull, you run, it works. No surprises, no additional work.

But Lmod does not seem to provide this level of stability. It seems to require ad hoc fixes for various installed packages. And each time something fails, it can't be fixed easily: you need to know the whole Lmod/EasyBuild mechanism perfectly and go through a complex system of path-loading dependencies.
For example, I faced an issue where RStudio was complaining about permissions in `/var/run/rstudio-server`, and I noticed that you defined the env variable `USER=rstudio-server` in the JupyterLab Dockerfile: https://github.com/guimou/s2i-lmod-notebook/blob/main/f34/Dockerfile#L51. Was it for this reason?

The questions
Any idea why Lmod modules would fail like this? (I am looking into what I might have added to the JupyterLab image that could have caused this, but maybe someone has an idea already :) )
What does it honestly take to make an Lmod/EasyBuild environment stable enough that it can be trusted by scientists doing research? Is it completely automatic, magic and stable once you have found the right setup? Or does it require someone to regularly get their hands into the system, help the researchers fix modules that are not loading properly, or fix the EasyConfig for a given package?
Here is the error output for TensorFlow failing to import (`import tensorflow as tf`):