can we get `champagne run` to work without the need for `init`? #178

kelly-sovacool · 2024-02-19T18:56:32Z

`nextflow run nf-core/chipseq -profile test,docker --outdir output` just works for nf-core pipelines, without first downloading anything at all. In theory we should be able to get the same behavior for our Nextflow pipelines, so that `champagne run -profile test ...` just works without initializing anything.


kelly-sovacool · 2024-02-19T18:57:19Z

`init` could still be useful as an optional helper that copies an example samplesheet or `params.yml` file for users to customize, but it would no longer be required.
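One way to picture "optional `init`" is a fallback lookup at run time: use a file from the working directory if the user has one, otherwise use the copy shipped with the pipeline. The sketch below is only an illustration under assumptions. The function name `resolve_run_files`, the `bundled_dir` argument, and the choice of file names are hypothetical and not taken from the champagne code base.

```python
from pathlib import Path


def resolve_run_files(workdir: Path, bundled_dir: Path,
                      names=("nextflow.config", "params.yml")) -> dict:
    """Map each file name to the path that a run should use.

    A file in `workdir` (for example one placed there by `init`) wins;
    otherwise the default shipped in `bundled_dir` is used. All names
    here are hypothetical and only illustrate the fallback idea.
    """
    resolved = {}
    for name in names:
        local = workdir / name
        # Prefer the user's copy when it exists, else fall back to the default.
        resolved[name] = local if local.is_file() else bundled_dir / name
    return resolved
```

With a lookup like this, a fresh directory would run on the bundled defaults, while a directory prepared with `init` would keep using its own customized files.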

slsevilla · 2024-02-22T17:48:49Z

offering my thoughts on this: i think (biased, see below) `init` is crucial for reproducibility and tracking. while it's nice to skip it in dev, i don't think it should be optional for production.

from a historical standpoint, `init` has evolved from just copying some configs and random files to the output dir into how it is set up on most pipelines today. i've pushed for and added (hence the bias) a lot of the `init`-related code in our pipelines because of experiences i've had with my own work and with assisting others running our pipelines. a few thoughts on why it should remain a requirement for production:

- (for all users) it saves all configuration values and input params with the output directory. this ensures that if we go back to the project months down the line, we don't have to dig through log files to know which params were used (and it's clearly labeled, so PIs can look it up without asking us). when i write manuscripts i always go to the tool's json/config file to pull all versions (or containers) to pop into the methods section. no searching through logs.
- (power users) related: running more than one project on the same pipeline at the same time makes for dangerous outcomes if you haven't initialized a single source for configs etc. per project.
- (for all users) similarly, it provides predefined manifests for users to update, rather than having to recreate them on their own. this is the number one source of headaches for new users with new pipelines: the manifest file is formatted incorrectly, even with the best documentation.
- (power users) it copies the scripts that were run to create the output directly into the output directory. this lets users make quick, project-specific updates to the code while saving those changes in the output dir for reproducibility.

all that aside, for dev, running without `init` is a super helpful option, especially if the pipeline has test profiles. in that case you're just running on what is already set up and defined and don't really care about history. i think it's a perfect use case for that, but not as much for production.

kelly-sovacool · 2024-02-22T19:55:17Z

Currently the `init` implemented here is substantially smaller and has fewer features than the `init` of the Snakemake workflows. We can definitely satisfy all of the goals you listed; it's just a matter of deciding how that should be implemented in our Nextflow pipelines.

I think some of the reproducibility goals may be better handled at run time rather than during initialization. We plan to eventually have all processes output their software versions, just like the nf-core pipelines do (see #27). In those pipelines that is handled at run time because the version can change depending on which container is used for a given process.

Config options set at the CLI can also change between runs or reruns in the same directory, so we'll want to make sure we capture those at run time too. I think that may already be handled by Nextflow's built-in execution report, but we should double-check, and if not, make sure we output a time-stamped file with an exhaustive list of the params used.
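If it turns out a separate file is needed, the time-stamped params record could look roughly like the sketch below. It assumes the CLI wrapper already holds the fully merged params as a plain dict; `write_params_snapshot` and the `params_<timestamp>.json` naming are hypothetical choices for illustration, not existing champagne or Nextflow behavior.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_params_snapshot(params: dict, outdir: Path,
                          now: Optional[datetime] = None) -> Path:
    """Write the params used for one run to a time-stamped JSON file.

    Hypothetical helper: each run gets its own file, so reruns in the
    same directory never overwrite an earlier record (as long as they
    start at least one second apart).
    """
    # Normalize to UTC so the "Z" suffix in the file name is accurate.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    outdir.mkdir(parents=True, exist_ok=True)
    path = outdir / f"params_{stamp}.json"
    # Sorted keys keep snapshots easy to diff between runs.
    path.write_text(json.dumps(params, indent=2, sort_keys=True) + "\n")
    return path
```

Writing one file per run, rather than updating a single file, keeps the history of CLI overrides visible when the same directory is rerun with different options.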

I do think copying boilerplate config/params files, example sample sheets, etc. are good examples of tasks best handled by `init`.
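As a rough sketch of that narrower role for `init`, the helper below copies template files into a project directory and leaves anything the user has already customized untouched. The name `copy_templates` and the idea of a flat `template_dir` are assumptions made for this example, not a description of the current implementation.

```python
import shutil
from pathlib import Path


def copy_templates(template_dir: Path, outdir: Path) -> list:
    """Copy boilerplate files into `outdir` without overwriting.

    Hypothetical helper: returns the list of files actually copied,
    so the caller can tell the user what was created.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(template_dir.iterdir()):
        dest = outdir / src.name
        # Skip subdirectories and any file the user already has.
        if src.is_file() and not dest.exists():
            shutil.copy2(src, dest)
            copied.append(dest)
    return copied
```

Skipping existing files means running `init` a second time in the same directory is safe and does not discard edits to a samplesheet or `params.yml`.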

kelly-sovacool added the `cli` label (Related to the Command Line Interface) · Feb 19, 2024
