Hi,
I am using sciluigi in classification experiments where I would like to have one task per model trainer. The number of model trainers is determined by a list of categories/labels for which models need to be trained, and that list is defined in a dataset descriptor file in YAML. I would like to be able to either pass a subset of categories as a parameter (easy) or, if no categories are given, load the dataset from YAML and extract the full list of categories from that descriptor. Because loading is a lengthy process due to some verification, DatasetProvider is a task in itself, which validates the descriptor and stores a pickled version.
I.e. in my workflow() routine I have something like:
    class MyWorkflow(sl.WorkflowTask):
        dataset_path = luigi.Parameter(description="path to the dataset descriptor file")
        categories = TupleParameter(default=(), description="tuple with all category labels for which models files should be trained")

        def workflow(self):
            if not self.categories or not len(self.categories):
                # FIXME: load categories from dataset_path (using a DatasetProvider task) and set self.categories accordingly ...
                ...
            for c in self.categories:
                ...
                model_trainer = self.new_task('model_trainer_' + c,
                                              ModelTrainer,
                                              trainer_params=...
                                              )
Any idea on how to solve this?
Many thanks in advance!
I don't know how I have managed to miss your issue 😕 ...
Did you solve this?
There is an inherent problem in Luigi: scheduling and running the workflow happen separately, and (as far as I know) you can't really access parameter values during the scheduling phase of the workflow, only during the running phase.
Thus, you can't easily set up the workflow differently based on parameter values, but have to rely on information that can be read in by normal Python code during scheduling (in your workflow() method).
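As a rough illustration of "normal Python code during scheduling", here is a minimal sketch of a helper that workflow() could call before creating its tasks. The name `resolve_categories`, the injected `load_descriptor` callable, and the `"categories"` key are all assumptions made for this example; they are not part of sciluigi or Luigi, and the real descriptor layout may differ.

```python
from typing import Callable, Mapping, Sequence, Tuple


def resolve_categories(
    requested: Sequence[str],
    load_descriptor: Callable[[], Mapping],
) -> Tuple[str, ...]:
    """Return the category labels to create trainer tasks for.

    A non-empty `requested` is used as-is. Otherwise the full list is read
    from the dataset descriptor, which is only loaded when actually needed.
    """
    if requested:
        return tuple(requested)
    descriptor = load_descriptor()
    # Assumption: the descriptor exposes its labels under a "categories" key.
    return tuple(descriptor["categories"])


def fake_loader() -> Mapping:
    # Stand-in for parsing the YAML descriptor; the content is made up.
    return {"categories": ["cat", "dog", "bird"]}


all_labels = resolve_categories((), fake_loader)
subset = resolve_categories(("dog",), fake_loader)
```

In a real workflow, `load_descriptor` would parse the file at `dataset_path` (or read the pickled version if one already exists), and the loop that calls `self.new_task(...)` would iterate over the returned tuple. Note that this runs the loading step at scheduling time rather than as a separate task, so it trades the DatasetProvider task's caching for simplicity.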
Luigi has had functionality for dynamic dependencies for some time now, but it can only specify upstream tasks dynamically, not downstream tasks, which is what I think is most often needed.
This constraint of Luigi's scheduling model is what made us start experimenting with a workflow engine based on the dataflow paradigm instead, SciPipe, where scheduling and execution happen concurrently all the time, which makes these kinds of things possible.
It is still a bit crude and not yet used in production, but it has quite a few tests and example workflows, and it is the tool we plan to use for our upcoming computational projects in the near future.