Automatic output path generation #19
Comments
The question that ensues is somewhat related to #18 (rather, a generalization of it). Is there an easy way to access the output of all tasks constituting the workflow?

    import six

    def getAllOutputs(workflow):
        # Map each task's instance name to the path of its output target.
        outputs = {}
        for instance_name, task in six.iteritems(workflow._tasks):
            outputs[instance_name] = task.out_put().path
        return outputs

    workflow = MyWorkflow()
    outputs = getAllOutputs(workflow)

But it looks like, at this point, tasks constituting the workflow ( What does |
(Hi, sorry, have been a bit busy, will look at this now!) |
This is hard to say without a concrete example. We have had cases where we often have multiple outputs, so it has been central for us to give each output a unique name and thus "identity". In cases where we had a single output, we have kept with the same pattern and tried to give a descriptive name, such as
Yeah. Without knowing for sure in this case (I haven't tested it), I have often run into problems with the fact that Luigi separates scheduling and workflow execution into two phases, so tasks are not fully instantiated until the scheduling phase is finished and execution has started. Our biggest problem with this is that it makes it hard, for example, to initiate a new task with parameter values calculated by a previous task, since parameter values need to be provided at scheduling time, and scheduling time is over once execution starts. As a side note, this is one reason why we are experimenting with a fully dataflow-based approach in scipipe, where scheduling and execution can happen interchangeably (but it's not production ready yet).
Will have to test a little before getting back on this, and the other remaining questions. Will get back to you shortly! |
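To make the two-phase point above concrete, here is a toy model in plain Python. It is only a sketch: `ToyTask`, `schedule` and `execute` are hypothetical stand-ins invented for illustration, not Luigi or SciLuigi APIs, and the real scheduler is considerably more involved.

```python
# Toy model only: these names are hypothetical, not Luigi's API.

class ToyTask:
    def __init__(self, name, params, upstream=()):
        self.name = name
        self.params = dict(params)     # fixed when the task is created
        self.upstream = list(upstream)
        self.result = None             # only known after run()

    def run(self):
        self.result = "{} ran with {}".format(self.name, self.params)


def schedule(target):
    """Phase 1: walk the dependency graph and return tasks in run order."""
    order, seen = [], set()

    def visit(task):
        if id(task) in seen:
            return
        seen.add(id(task))
        for dep in task.upstream:
            visit(dep)
        order.append(task)

    visit(target)
    return order


def execute(order):
    """Phase 2: run every task in the order fixed during scheduling."""
    for task in order:
        task.run()


count = ToyTask("count_lines", {"path": "data.txt"})
# We would like "chunks" to depend on count.result, but count has not run
# yet, so only a value known up front can be passed here.
split = ToyTask("split", {"chunks": 4}, upstream=[count])

plan = schedule(split)
result_during_scheduling = count.result   # still None: nothing has run
execute(plan)
```

In this model, the parameters of `split` are already frozen by the time `count` produces its result, which is the situation described above.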
FYI, I ended up saving all the automagically generated output paths in an attribute of the parent workflow: |
I had difficulty finding a good naming convention for all my `out_xxxx` paths when my workflow became complicated (e.g. with one task taking three other tasks as input: how should I name its output?).

Therefore, I have created a `sciluigi.Task` mixin called `AutoOutput` that automatically adds an `out_put` method to a task (see below). Maybe it can be useful for others...

All you have to do to use it is the following:

- Add a `workdir` `luigi.Parameter` to the `WorkflowTask`
- Add the `AutoOutput` mixin to the task you are adding to the workflow

It does have a few limitations, the main one being that it does not support tasks with structured inputs.
This will work:
This will not work:
Here is the code of the `AutoOutput` mixin:
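The sketch below is not that mixin. It is a minimal, standalone illustration of the general idea in plain Python, with no dependency on sciluigi or luigi. The naming scheme (task name plus a short hash of the input paths) is an assumption made for this example, and `FakeTarget` and `auto_output_path` are hypothetical names.

```python
import hashlib
import os


class FakeTarget:
    """Hypothetical stand-in for a file target; it only carries a path."""

    def __init__(self, path):
        self.path = path


def auto_output_path(workdir, task_name, input_targets, ext="out"):
    """Derive a deterministic path from the task name and its input paths.

    Only a flat list of targets is handled here; nested (structured)
    inputs are rejected, similar in spirit to the limitation noted above.
    """
    for target in input_targets:
        if not hasattr(target, "path"):
            raise TypeError("structured inputs are not supported in this sketch")
    joined = "\n".join(t.path for t in input_targets)
    digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()[:10]
    return os.path.join(workdir, "{}_{}.{}".format(task_name, digest, ext))


raw_a = FakeTarget("work/raw_a.txt")
raw_b = FakeTarget("work/raw_b.txt")

# A task taking several inputs gets a name without any manual convention.
merged = auto_output_path("work", "merge", [raw_a, raw_b])
```

Because the path depends only on the task name and the input paths, calling the function again with the same arguments yields the same path, while a different set or order of inputs yields a different one.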