Add CLI for train.py #1337

Merged: 65 commits, Jul 10, 2024

KuuCi (Contributor) commented Jul 3, 2024

This PR lets users run composer llm-foundry train {YAML_PATH} {ARGS} while keeping composer llm-foundry/train.py {PATH} {ARGS} working as before. The motivation is DLE, where we want the CLI in the Docker images to be much more intuitive.

Testing:
test-cli-cSn2Rb runs:
composer -c -n 8 llmfoundry train /mnt/config/parameters.yaml || (echo "Command failed - killing python" && pkill python && exit 1)

test-cli-qsRHEI runs:
composer -c llmfoundry train /mnt/config/parameters.yaml || (echo "Command failed - killing python" && pkill python && exit 1)

test-cli-vGpXcw runs:
composer train/train.py /mnt/config/parameters.yaml || (echo "Command failed - killing python" && pkill python && exit 1)

Here is the MLflow experiment folder showing that all three runs behave the same:
https://dbc-04ac0685-8857.staging.cloud.databricks.com/ml/experiments/3707544126254710?o=3360802220363900&searchFilter=&orderByKey=attributes.start_time&orderByAsc=false&startTime=ALL&lifecycleFilter=Active&modelVersionFilter=All+Runs&datasetsFilter=W10%3D
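
For readers unfamiliar with the pattern, the sketch below shows one way a `train` subcommand taking a YAML path plus free-form arguments could be wired up. It is only an illustration under stated assumptions: it uses the standard-library argparse rather than whatever CLI library the PR actually uses, and `train_from_yaml`, `build_parser`, and `main` are hypothetical names that stand in for the real training entry point.

```python
import argparse
from typing import Dict, List, Optional


def train_from_yaml(yaml_path: str,
                    args_list: Optional[List[str]] = None) -> Dict[str, object]:
    """Hypothetical stand-in for the real training function.

    It only records what it was called with, splitting key=value overrides.
    """
    overrides = dict(arg.split("=", 1) for arg in (args_list or []))
    return {"yaml_path": yaml_path, "overrides": overrides}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single `train` subcommand (assumed layout)."""
    parser = argparse.ArgumentParser(prog="llmfoundry")
    subparsers = parser.add_subparsers(dest="command", required=True)
    train = subparsers.add_parser("train", help="Train from a YAML config.")
    train.add_argument("yaml_path", help="Path to the training YAML.")
    train.add_argument("args_list", nargs="*",
                       help="Optional key=value overrides.")
    return parser


def main(argv: Optional[List[str]] = None) -> Dict[str, object]:
    """Parse argv and dispatch to the training function."""
    ns = build_parser().parse_args(argv)
    if ns.command == "train":
        return train_from_yaml(ns.yaml_path, ns.args_list)
    raise SystemExit(f"unknown command: {ns.command}")
```

Calling `main(["train", "/mnt/config/parameters.yaml", "max_duration=10ba"])` forwards the path and the override to the stand-in function; a real entry point would start training instead of returning a dictionary.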

@KuuCi KuuCi marked this pull request as ready for review July 8, 2024 23:26
@KuuCi KuuCi requested a review from a team as a code owner July 8, 2024 23:26
b-chu (Contributor) commented Jul 9, 2024

This seems like a breaking change. Do we have a deprecation plan for existing mcli yamls? I think a lot of people call composer scripts/train/train.py right now.

mvpatel2000 (Collaborator) left a comment

Deleting scripts/train/train.py is a breaking change.

KuuCi (Contributor Author) commented Jul 9, 2024

We aren't deleting scripts/train/train.py; it now just calls train/train.py. Here is a run showing that the existing workflow still works:
test-cli-ZzkqPt runs:
composer train/train.py /mnt/config/parameters.yaml || (echo "Command failed - killing python" && pkill python && exit 1)
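
The backward-compatibility argument above can be sketched as follows: the legacy script keeps its path and command-line shape but delegates to the same library function that the new CLI calls. This is a minimal illustration, not the PR's actual code; `train_from_yaml`, `script_entry`, and `cli_entry` are hypothetical names.

```python
from typing import Dict, List, Optional


def train_from_yaml(yaml_path: str,
                    args_list: Optional[List[str]] = None) -> Dict[str, object]:
    """Hypothetical stand-in for the shared library training function."""
    return {"yaml_path": yaml_path, "args_list": list(args_list or [])}


def script_entry(argv: List[str]) -> Dict[str, object]:
    """What a thin legacy script could do with sys.argv.

    argv[0] is the script name, argv[1] the YAML path, the rest are overrides.
    """
    if len(argv) < 2:
        raise SystemExit("usage: train.py YAML_PATH [ARGS...]")
    return train_from_yaml(argv[1], argv[2:])


def cli_entry(yaml_path: str,
              args_list: Optional[List[str]] = None) -> Dict[str, object]:
    """What a CLI subcommand handler could do: call the same function."""
    return train_from_yaml(yaml_path, args_list)
```

Because both entry points funnel into one function, existing YAMLs that invoke the script path keep working while the new command is added alongside it.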

b-chu (Contributor) commented Jul 9, 2024

Ah, thanks for pointing that out. I'll give a more detailed review later

KuuCi (Contributor Author) commented Jul 9, 2024

will update to match scripts/train/train.py merges after first pass

KuuCi (Contributor Author) commented Jul 9, 2024

manual test runs updated

Review thread on llmfoundry/train/train.py (outdated, resolved)
Review thread on scripts/train/train.py (outdated, resolved)
Review thread on scripts/train/train.py (outdated, resolved)
Review thread on llmfoundry/train/__init__.py (resolved)
@KuuCi KuuCi requested a review from irenedea July 10, 2024 01:26
@mvpatel2000 mvpatel2000 dismissed their stale review July 10, 2024 01:38

Removing old, irene can approve

Review thread on llmfoundry/cli/cli.py (outdated, resolved)
Review thread on llmfoundry/cli/cli.py (resolved)
Review thread on llmfoundry/cli/cli.py (resolved)
Review thread on tests/a_scripts/train/test_train.py (resolved)
@KuuCi KuuCi requested a review from b-chu July 10, 2024 19:28
b-chu (Contributor) left a comment

Thanks!

@KuuCi KuuCi merged commit 129bb56 into main Jul 10, 2024
9 checks passed
4 participants