Add CLI for train.py #1337
This seems like a breaking change. Do we have a deprecation plan for existing mcli YAMLs? I think a lot of people call composer scripts/train/train.py right now.
Deleting scripts/train/train.py is a breaking change.
Ah, thanks for pointing that out. I'll give a more detailed review later.
Will update to match scripts/train/train.py merges after first pass.
Manual test runs updated.
Thanks!
This PR allows users to call
composer llm-foundry train {YAML_PATH} {ARGS}
while maintaining correctness with
composer llm-foundry/train.py {PATH} {ARGS}
The motivation is DLE, where we want to make the CLI much more intuitive in the Docker images.

Testing:
test-cli-cSn2Rb runs:
composer -c -n 8 llmfoundry train /mnt/config/parameters.yaml || (echo "Command failed - killing python" && pkill python && exit 1)
test-cli-qsRHEI runs:
composer -c llmfoundry train /mnt/config/parameters.yaml || (echo "Command failed - killing python" && pkill python && exit 1)
test-cli-vGpXcw runs:
composer train/train.py /mnt/config/parameters.yaml || (echo "Command failed - killing python" && pkill python && exit 1)
Here is the MLflow experiment folder showing that all three runs behave the same:
https://dbc-04ac0685-8857.staging.cloud.databricks.com/ml/experiments/3707544126254710?o=3360802220363900&searchFilter=&orderByKey=attributes.start_time&orderByAsc=false&startTime=ALL&lifecycleFilter=Active&modelVersionFilter=All+Runs&datasetsFilter=W10%3D
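
For readers unfamiliar with the pattern, the `train {YAML_PATH} {ARGS}` subcommand described above can be pictured roughly as below. This is only a minimal sketch using the standard-library argparse as a stand-in; it is not the implementation in this PR. The names build_parser, run_train, and main, and the `max_duration=10ba` override in the usage note, are hypothetical and chosen for illustration.

```python
import argparse
from typing import Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Top-level parser with a single "train" subcommand (illustrative only).
    parser = argparse.ArgumentParser(prog="llmfoundry")
    subparsers = parser.add_subparsers(dest="command", required=True)

    train = subparsers.add_parser("train", help="Train from a YAML config")
    train.add_argument("yaml_path", help="Path to the training YAML")
    # Any remaining positionals are treated as free-form overrides.
    train.add_argument("args", nargs="*", help="Optional extra arguments")
    return parser


def run_train(yaml_path: str, extra_args: List[str]) -> Dict[str, object]:
    # Hypothetical placeholder: a real entry point would load the YAML and
    # start training. Here we only echo back what was parsed.
    return {"yaml_path": yaml_path, "args": list(extra_args)}


def main(argv: Optional[List[str]] = None) -> Dict[str, object]:
    # argv=None makes argparse fall back to sys.argv[1:].
    ns = build_parser().parse_args(argv)
    if ns.command == "train":
        return run_train(ns.yaml_path, ns.args)
    raise SystemExit(f"unknown command: {ns.command}")
```

Calling `main(["train", "/mnt/config/parameters.yaml", "max_duration=10ba"])` dispatches to the placeholder with the YAML path and the one extra argument. Keeping the old script as a thin wrapper around the same function is one way both invocation styles could share a single code path.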