Accessing workflow final output #18

hbredin · 2016-03-02T13:19:23Z

I plan to use the hyperopt hyper-parameter optimization toolkit in combination with sciluigi workflows.

Instead of doing an exhaustive grid search, hyperopt (smartly) proposes a set of parameters, params, that I use like this:

task = MyWorkflow(**params)
luigi.build([task], local_scheduler=True)

In my current setup, the final task of the workflow writes the value of the objective to a file.
Therefore, I have to read this file back to tell hyperopt the value of the objective for the current set of parameters. This means that I have to know exactly where the workflow will save this file.

I expected task.output() to return the output of the final task but instead it returns a dict with audit and log values. Is there an easy way to access the output of the final task of the workflow?

Even better, is there a way to make the workflow return the value of the objective directly?
objective = f(task)
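A minimal sketch of what such a function could look like with the current file-based setup. Here `make_objective`, `run_workflow` and `objective_path` are hypothetical names, not part of luigi or sciluigi, and the file is assumed to contain a single number:

```python
from pathlib import Path
from typing import Callable, Mapping


def make_objective(run_workflow: Callable[[Mapping], None],
                   objective_path: Callable[[Mapping], str]
                   ) -> Callable[[Mapping], float]:
    """Build a function that maps a parameter set to an objective value.

    run_workflow(params) is assumed to run the workflow to completion.
    objective_path(params) is assumed to return the file written by the
    final task for that parameter set.
    """
    def objective(params: Mapping) -> float:
        run_workflow(params)
        text = Path(objective_path(params)).read_text()
        # Assumption: the file holds one number, possibly followed by a newline.
        return float(text.strip())

    return objective
```

In this sketch, `run_workflow` would wrap the two lines shown earlier (`MyWorkflow(**params)` followed by `luigi.build(...)`). It still requires knowing where the file ends up, which is exactly the limitation described above.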


samuell · 2016-03-02T17:04:46Z

Hi @hbredin! Interesting use case, it sounds very similar to what we are doing with machine learning in drug discovery / cheminformatics!

It sounds like you are basically looking to use the workflow task as a "subworkflow", as part of a larger workflow. Is that correct?

We have plans (see this issue) to implement sub-workflow support in SciLuigi. It should not be a big development effort, but we simply haven't gotten to it yet, as we have managed without it so far. Now that there is more need for it, we should probably look into implementing it.

hbredin · 2016-03-05T14:01:54Z

Indeed, it would be great to have sciluigi.Workflow implement the standard luigi.Task interface:

.requires(self)
.run(self)
.output(self)

But, really, what I am looking for right now is a standard .output() method that would return the same thing as luigi.Tasks do -- so that I can use workflow.output().path.

hbredin · 2016-03-17T13:07:06Z

FYI, I ended up adding a dummy Hyperopt task at the end of my workflow that writes the path of the final task's output to a temporary file, whose path (temp) is provided as a parameter:

class Hyperopt(sciluigi.Task):

    temp = luigi.Parameter()
    in_final = None

    def out_put(self):
        return sciluigi.TargetInfo(self, self.temp)

    def run(self):
        with self.out_put().open('w') as fp:
            fp.write(self.in_final().path)

class MyWorkflow(sciluigi.WorkflowTask):

    hyperopt = luigi.Parameter()

    def workflow(self):
        ...
        final_task = self.new_task(...)
        # return final_task

        hyperopt = self.new_task('hyperopt', Hyperopt, temp=self.hyperopt)
        hyperopt.in_final = final_task.out_put
        return hyperopt

Then, to get the path of the output of the final task, here is what I have to do:

from tempfile import mkdtemp

# create path to temporary file and add it to the set of parameters
directory = mkdtemp()
params['hyperopt'] = directory + '/hyperopt'

# actually run the workflow with this extended set of parameters
task = MyWorkflow(**params)
luigi.build([task], local_scheduler=True)

# obtain the path of final task output from the temporary hyperopt file
# (the file path was stored in params above, so read it back from there)
with open(params['hyperopt'], 'r') as fp:
    path = fp.read()

# do something with path...
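The steps above can be folded into one helper. This is only a sketch: `final_output_path` and `run_workflow` are hypothetical names, and the workflow is assumed to write the final output path to the file named by `params[key]`, as the dummy Hyperopt task does.

```python
import os
import shutil
import tempfile
from typing import Callable, Dict


def final_output_path(params: Dict[str, object],
                      run_workflow: Callable[[Dict[str, object]], None],
                      key: str = "hyperopt") -> str:
    """Run a workflow and return the path recorded by its dummy final task.

    run_workflow(params) is a hypothetical callable that runs the workflow.
    """
    directory = tempfile.mkdtemp()
    try:
        # Copy so the caller's parameter dict is left untouched.
        extended = dict(params)
        extended[key] = os.path.join(directory, key)
        run_workflow(extended)
        with open(extended[key], "r") as fp:
            return fp.read().strip()
    finally:
        # The temporary file is only a side channel, so remove it afterwards.
        shutil.rmtree(directory, ignore_errors=True)
```

With the classes above, `run_workflow` could be something like `lambda p: luigi.build([MyWorkflow(**p)], local_scheduler=True)`.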

I will now try to write a decorator for any sciluigi.WorkflowTask that makes this change automatically.

hbredin mentioned this issue Mar 22, 2016

Automatic output path generation #19

Open

ddd-bbb mentioned this issue Nov 16, 2017

Add sub-workflow support #7

Open
