Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Read only Singularity cache #1879

Closed
ewels opened this issue Feb 3, 2021 · 16 comments · Fixed by #5498
Closed

Feature request: Read only Singularity cache #1879

ewels opened this issue Feb 3, 2021 · 16 comments · Fixed by #5498

Comments

@ewels
Copy link
Member

ewels commented Feb 3, 2021

New feature

Suggestion is to add a new Nextflow environment variable which works alongside $NXF_SINGULARITY_CACHEDIR with one difference: it is assumed to be read-only. It could be called $NXF_SINGULARITY_LIBRARY or something.

Usage scenario

With DSL2 we are relying on $NXF_SINGULARITY_CACHEDIR over at @nf-core much more, as users can no longer run pipelines using -with-singularity. Generally it works great.

On large shared multi-user HPC setups it makes sense to have a shared singularity cachedir for all users, to avoid duplicating many GBs of identical containers (it also makes it easier and faster for users to run pipelines). It's not a good idea to have this world writeable in such situations though. Since #1706 was closed, read-only cache dirs do work and the vision of a shared cachedir is possible 🤩 ('cc @pontus).

The problem is when a user tries to run a pipeline that hasn't already been cached. Then Nextflow will try to pull the required containers and save them to $NXF_SINGULARITY_CACHEDIR and fail if it's read-only. My suggestion is to add a new env var that allows the user to keep their own personal $NXF_SINGULARITY_CACHEDIR which they can write to and also make use of a centralised shared read-only singularity cache.

Suggest implementation

I guess the code doesn't need to be super complicated, just a case of looping through more than one localpath here:

Path downloadSingularityImage(String imageUrl) {
final localPath = localImagePath(imageUrl)
if( localPath.exists() ) {
log.debug "Singularity found local store for image=$imageUrl; path=$localPath"
return localPath
}

Looking at the code we could probably avoid needing to add additional env variables if you prefer if all three of these were checked for existing images instead of just the single top priority one:

* If tries these setting in the following order:
* 1) {@code singularity.cacheDir} setting in the nextflow config file;
* 2) the {@code NXF_SINGULARITY_CACHEDIR} environment variable
* 3) the {@code $workDir/singularity} path

Open to whatever solution you suggest.

@pditommaso
Copy link
Member

It sounds double, only concern is trying to avoid the proliferation of magic env variables. What about adding a option in the config file eg. singularity.cacheReadOnly. ?

@ewels
Copy link
Member Author

ewels commented Feb 3, 2021

Understandable 👍🏻 I'd prefer an env variable if possible though.. In our use case I'm thinking of having @nf-core pipelines set up as environment modules on the cluster, so that people can do module load nf-core-rnaseq on our secure offline and have everything that they need. An env variable is easy to do with this, a nextflow config option is maybe doable but would require workarounds and hacks. Different users will be doing this so can't have it in the ~/.nextflow/config file and would rather avoid having to edit the pipeline source code if possible.

@pditommaso
Copy link
Member

Then we can have a compromise and have both 😄

It would be enough adding a similar method getLibraryDir

Path getCacheDir() {
if( config.pullTimeout )
pullTimeout = config.pullTimeout as Duration
def str = config.cacheDir as String
if( str )
return checkDir(str)
str = env.get('NXF_SINGULARITY_CACHEDIR')
if( str )
return checkDir(str)
str = env.get('SINGULARITY_PULLFOLDER')
if( str )
return checkDir(str)
def workDir = Global.session.workDir
if( workDir.fileSystem != FileSystems.default ) {
throw new IOException("Cannot store Singularity image to a remote work directory -- Use a POSIX compatible work directory or specify an alternative path with the `NXF_SINGULARITY_CACHEDIR` env variable")
}
missingCacheDir = true
def result = workDir.resolve('singularity')
result.mkdirs()
return result
}

@stale
Copy link

stale bot commented Jul 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 18, 2021
@ewels
Copy link
Member Author

ewels commented Jul 18, 2021

Still planning to do this, one day (and if not new then would still be great if someone did it!)

@stale stale bot removed the stale label Jul 18, 2021
@pditommaso
Copy link
Member

Was reading this again, I think the real point is having a flag (read env var) that allows enabling/disabling the pull of singularity containers. Agree on that?

@ewels
Copy link
Member Author

ewels commented Jul 19, 2021

I'd prefer it as an additional location: check if the container I need is in the read-only location, fetch it to the write-location if not.. So then needs to be a little more than just a flag.

@pditommaso
Copy link
Member

pditommaso commented Jul 19, 2021

Therefore I may not fully understand your use. You are planning to have two locations for singularity images:

  1. NXF_SINGULARITY_LIBRARY: NF checks this path , if the image exists take, otherwise point 2:
  2. NXF_SINGULARITY_CACHEDIR: NF check if the image exists, yes => run it, no => pull, store in cachedir, run it.

Make sense?

@ewels
Copy link
Member Author

ewels commented Jul 19, 2021

Yes, exactly 👍🏻

So NXF_SINGULARITY_CACHEDIR is unchanged from its current behaviour. NXF_SINGULARITY_LIBRARY is a new additional read-only second location that is checked first, as you say.

@ewels
Copy link
Member Author

ewels commented Aug 10, 2021

ps. Whilst we're here, could even do something similar for $NXF_ASSETS and have a $NXF_ASSETS_LIBRARY. Though in our use case we would like to include the pipeline tag in the directory structure so that multiple versions can coexist, which doesn't currently work with the default assets file path structure.

@pditommaso pditommaso added this to the v21.10.0 milestone Oct 12, 2021
@pditommaso
Copy link
Member

This was implemented by #2381. Regarding the assets directory, not sure it should follow the same pattern. The logic behind is much more complex, we should check the impact of that before.

@ewels
Copy link
Member Author

ewels commented Oct 12, 2021

Sounds good 👍🏻 The assets thing was more of a spur of the moment thought. I'm happy for that to be forgotten for now.

@stevekm
Copy link
Contributor

stevekm commented Nov 7, 2024

hi @ewels @pditommaso I see this is marked as completed and also in the associated ticket #1706

However it sounds as if this is only implemented currently via an environment variable NXF_SINGULARITY_LIBRARYDIR ; is that the case?

This is really problematic for us because its far more difficult (and far less ideal) to police users' environment variables inside the HPC to try to make sure their Nextflow runs are configured correctly to run on our systems. It would be much preferred if this same configuration was possible to be set via a nextflow.config setting. That would allow us to simply provide our users with a supplemental nextflow.config file which they could use with nextflow -c our-settings.config run main.nf ... in order to easily use our systems. Currently I do not see any mention of NXF_SINGULARITY_LIBRARYDIR in the configs for Signularity here;

https://nextflow.io/docs/latest/reference/config.html#singularity

is this the case or would we need a separate Issue to request graduating this to a config option instead of just an env var?

@pontus
Copy link

pontus commented Nov 7, 2024

This seems to be a documentation bug (which is odd, I believe I've seen it there earlier). singularity.libraryDir can be used instead of the environment variable. E.g. https://nextflow.io/docs/stable/container.html also mentions it.

@ewels
Copy link
Member Author

ewels commented Nov 7, 2024

Yup, I'm also pretty sure that the config option exists. @christopher-hakkaart would you mind taking a look at this if you get a second please?

Having some kind of automated checks (or auto-generated docs?) to make sure that every config option is documented would be a great thing to look into once we have a new config parser @bentsherman

@christopher-hakkaart
Copy link
Contributor

Thanks all; I'll add it to my sprint starting next week. If it was available earlier it might have been accidentally dropped when we did the docs restructure. Either way, it should be quick to add it back in. I'll update here with the PR once it's ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants