Using sweep in the train pipeline errors out #115
The error message you've received indicates a problem with how the model_output output of your train_model job is bound. The CLI reports that it cannot find the source for the input binding $parent.jobs.train_model.outputs.model_output, which seems incorrect because a job cannot reference its own outputs in this manner. Outputs generated by a job are normally consumed as inputs by subsequent jobs, so I strongly suspect this error is caused by a circular reference between your steps.

To resolve this, make sure that model_output is correctly defined in the outputs section of the train_model job and that it is correctly referenced in the jobs that use it as an input. From the snippet you've posted, model_output seems to be correctly defined in the outputs section of train_model:

outputs:
  model_output: ${{parent.outputs.trained_model}}

Next, check that every subsequent job that uses model_output as an input references it correctly. For instance, if it is used in a job called evaluate_job, it should be referenced as:

inputs:
  model_input: ${{parent.jobs.train_model.outputs.model_output}}

Check every place where you use model_output as an input.

On a different note, regarding your hyperparameters: pass each of them to train.py as a command-line argument, for example:

command: >-
  python train.py
  --train_data ${{inputs.train_data}}
  --model_output ${{outputs.model_output}}
  --regressor__n_estimators ${{search_space.regressor__n_estimators}}
  --regressor__bootstrap ${{search_space.regressor__bootstrap}}
  --regressor__max_depth ${{search_space.regressor__max_depth}}

Then define these hyperparameters in your search_space:
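regressor__n_estimators:
  type: choice
  values: [100, 200]
regressor__bootstrap:
  type: choice
  values: [true, false]
regressor__max_depth:
  type: choice
  values: [10, 20, 30, None]
regressor__max_features:
  type: choice
  values: ["auto", "sqrt", "log2", None]
regressor__min_samples_leaf:
  type: choice
  values: [1, 2, 4]
regressor__min_samples_split:
  type: choice
  values: [2, 5, 10]

Note that the command above only passes n_estimators, bootstrap and max_depth. max_features, min_samples_leaf and min_samples_split are defined in the search space, but they will not reach train.py until matching --regressor__* arguments are added to the command.

I also recommend using argparse in train.py to read these arguments:

import argparse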
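
def str2bool(value):
    # type=bool would treat any non-empty string, including "False", as True
    return str(value).strip().lower() in ("true", "1", "yes")

def none_or(cast):
    # Search-space values written as None likely arrive as the string "None"
    return lambda value: None if value in ("None", "null") else cast(value)

parser = argparse.ArgumentParser()
parser.add_argument('--regressor__n_estimators', type=int, default=100)
parser.add_argument('--regressor__bootstrap', type=str2bool, default=True)
parser.add_argument('--regressor__max_depth', type=none_or(int), default=None)
parser.add_argument('--regressor__max_features', type=none_or(str), default='auto')
parser.add_argument('--regressor__min_samples_leaf', type=int, default=1)
parser.add_argument('--regressor__min_samples_split', type=int, default=2)
# Add other arguments here...
args = parser.parse_args()

Hope this helps!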
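The regressor__ prefix on these arguments matches scikit-learn's "<step>__<parameter>" naming for Pipeline parameters. That suggests, as an assumption since train.py is not shown in this issue, that the script wraps the model in a Pipeline step named regressor. Below is a minimal, stdlib-only sketch of collecting the parsed values so they can be handed to Pipeline.set_params. The helper name regressor_params and the sample command line are hypothetical.

```python
import argparse


def regressor_params(args: argparse.Namespace, prefix: str = "regressor__") -> dict:
    """Return the parsed hyperparameters that belong to the (hypothetical) "regressor" step.

    Keys keep the "<step>__<param>" form that scikit-learn's Pipeline.set_params expects.
    """
    return {name: value for name, value in vars(args).items() if name.startswith(prefix)}


# Illustrative trial command line, as a sweep might render it (values are made up).
parser = argparse.ArgumentParser()
parser.add_argument("--train_data")
parser.add_argument("--regressor__n_estimators", type=int, default=100)
parser.add_argument("--regressor__max_depth", type=int, default=None)
args = parser.parse_args(
    ["--train_data", "data.csv", "--regressor__n_estimators", "200", "--regressor__max_depth", "20"]
)

params = regressor_params(args)
print(params)  # {'regressor__n_estimators': 200, 'regressor__max_depth': 20}
# In train.py this would typically be followed by: pipeline.set_params(**params)
```

Keeping the prefix in the keys is what lets set_params route each value to the right Pipeline step. If train.py instead builds the estimator directly (for example a RandomForestRegressor), strip the prefix before passing the values to the constructor.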
Describe the bug or the issue that you are facing
I'm trying to implement hyperparameter tuning in the default train pipeline by setting up a sweep job. It errors out during the run-model-training-pipeline / run-pipeline step after running the deploy-model-training-pipeline workflow.
Steps/Code to Reproduce
Run run_id=$(az ml job create --file /home/runner/work/Azure_mlops_v2_demo/Azure_mlops_v2_demo/mlops/azureml/train/pipeline.yml --resource-group rg-mlopsv2-0040dev --workspace-name mlw-mlopsv2-0040dev --query name -o tsv)
Class WorkspaceHubOperations: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class AutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class AutoDeleteConditionSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseAutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class IntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ProtectionLevelSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseIntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
ERROR: Failed to find referenced source for input binding $parent.jobs.train_model.outputs.model_output
Error: Process completed with exit code 1.
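The error says the CLI could not find the source referenced by an input binding to train_model's model_output. A rough pre-submit check can scan the jobs section of pipeline.yml for ${{parent.jobs.<job>.outputs.<name>}} bindings that name a missing job, a missing output, or the job itself. The sketch below is a hypothetical helper, not anything the Azure ML CLI provides. It works on an already-loaded Python dict (loading the YAML is left out), and every job name other than train_model is made up. It only checks the literal text of bindings and knows nothing about node types such as sweep, so a clean result does not guarantee the CLI will accept the pipeline.

```python
import re

# Matches bindings such as ${{parent.jobs.train_model.outputs.model_output}}
BINDING = re.compile(r"\$\{\{\s*parent\.jobs\.([\w-]+)\.outputs\.([\w-]+)\s*\}\}")


def find_bad_bindings(jobs: dict) -> list:
    """Return (job, input, problem) tuples for parent.jobs output bindings that cannot resolve."""
    problems = []
    for job_name, job in jobs.items():
        for input_name, value in (job.get("inputs") or {}).items():
            match = BINDING.fullmatch(str(value).strip())
            if not match:
                continue  # literal values and ${{parent.inputs.*}} bindings are not checked
            source_job, output_name = match.groups()
            if source_job == job_name:
                problems.append((job_name, input_name, "references its own output"))
            elif source_job not in jobs:
                problems.append((job_name, input_name, f"unknown job '{source_job}'"))
            elif output_name not in (jobs[source_job].get("outputs") or {}):
                problems.append((job_name, input_name, f"'{source_job}' has no output '{output_name}'"))
    return problems


# Hypothetical jobs section; only train_model and model_output come from the issue.
jobs = {
    "train_model": {
        "inputs": {"train_data": "${{parent.inputs.train_data}}"},
        "outputs": {"model_output": "${{parent.outputs.trained_model}}"},
    },
    "evaluate_model": {
        "inputs": {"model_input": "${{parent.jobs.train_model.outputs.model_output}}"},
    },
    "register_model": {
        "inputs": {"model_path": "${{parent.jobs.train_model.outputs.model}}"},
    },
}

problems = find_bad_bindings(jobs)
print(problems)  # [('register_model', 'model_path', "'train_model' has no output 'model'")]
```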
Expected Output
Execute .github/workflows/deploy-model-training-pipeline-classical.yml workflow with no errors
Versions
I'm using GitHub Actions and created my own repository following your guide and created a new dev branch.
Terraform
Azure ML CLI v2
Pre built examples from Tabular
Classic
Which platform are you using for deploying your infrastructure?
GitHub Actions (GitHub)
If you selected Others, please specify which platform you are using.
No response
What are you using for deploying your infrastructure?
Terraform
Are you using Azure ML CLI v2 or Azure ML Python SDK v2?
Azure ML CLI v2
Describe the example that you are trying to run?
Pre built examples from Tabular