Add MLflow log_model option #1544
Merged
+282 −121
…cleans up the code a little and prevents us from having forked logic in Composer to fetch by run_id
dakinggg reviewed Sep 24, 2024
What testing have you done? We need to make sure everything e2e shows up properly
dakinggg reviewed Sep 24, 2024
irenedea reviewed Oct 2, 2024
irenedea reviewed Oct 2, 2024
nancyhung changed the title from "Add MLflow log_model option" to "[WIP] Add MLflow log_model option" Oct 4, 2024
dakinggg reviewed Oct 4, 2024
dakinggg reviewed Nov 1, 2024
dakinggg reviewed Nov 1, 2024
dakinggg reviewed Nov 1, 2024
dakinggg reviewed Nov 1, 2024
nancyhung commented Nov 1, 2024
nancyhung commented Nov 1, 2024
dakinggg approved these changes Nov 1, 2024
dakinggg added a commit that referenced this pull request Nov 1, 2024
Co-authored-by: Daniel King <[email protected]>
Context
In order to support customers with sensitive storage network configurations, we have to use the `log_model` API. This causes duplicate artifact uploads, which is inefficient, so we will roll it out only to customers who require it.
This PR contains the first of 2 changes:
1. Use `log_model` instead of uploading to MLflow artifacts. Rather than calling `save_model` and `register_model` and uploading to UC directly via the remote uploader downloader object, this change simplifies the control logic with the `mlflow.log_model` function. This function is also critical to support secure training requirements, such as customer firewalls or private endpoints. Logging a model to MLflow will call the necessary steps to save and register a model for deployment.
2. Call `log_model` but do not register the model. That way, a user can still manually register their intermediate checkpoints for evaluation.
Testing
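As a reference for the test notes that follow, here is a minimal sketch of the control-flow change described under Context. It is not the code in this PR: `RecordingStore`, `previous_flow`, `log_model_flow`, and `upload_to_uc` are hypothetical names, the store only records calls instead of talking to MLflow or UC, and the order of the three steps in the previous flow is an assumption. The `registered_model_name` argument mirrors the parameter of that name accepted by MLflow's flavor-level `log_model` functions, where leaving it unset logs the model without registering it.

```python
from typing import List, Optional


class RecordingStore:
    """Hypothetical stand-in for the MLflow/UC calls; it only records what was called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def save_model(self, path: str) -> None:
        self.calls.append(f"save_model({path})")

    def register_model(self, path: str, name: str) -> None:
        self.calls.append(f"register_model({path}, {name})")

    def upload_to_uc(self, path: str) -> None:
        self.calls.append(f"upload_to_uc({path})")

    def log_model(self, path: str, registered_model_name: Optional[str] = None) -> None:
        self.calls.append(
            f"log_model({path}, registered_model_name={registered_model_name})"
        )


def previous_flow(store: RecordingStore, path: str, name: str) -> None:
    """Three separate steps (order assumed): save, register, upload to UC."""
    store.save_model(path)
    store.register_model(path, name)
    store.upload_to_uc(path)


def log_model_flow(store: RecordingStore, path: str, name: str, register: bool) -> None:
    """One log_model call; the model is registered only when `register` is true."""
    store.log_model(path, registered_model_name=name if register else None)
```

With `register=False`, the sketch corresponds to logging an intermediate checkpoint that a user can register manually later; with `register=True`, it corresponds to the final checkpoint being saved and registered in one call.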
When incorporating this in MAPI, we should enable `final_register_only` so that the model is uploaded only through the `log_model` logic instead of also uploading a duplicate copy to MLflow artifacts. All tests were done in AWS staging.
Works for older models
[Databricks staging] Llama3 8b
Run: `llama3-log-model-xusOti`
Llama3 8b was successfully deployed here: https://e2-dogfood.staging.cloud.databricks.com/ml/endpoints/test-log-model?o=6051921418418893.
Works for newest models with extra security
[MCT] Llama3.2 1b
Run: `llama3-log-model-4eJUKo`
Experiment: https://dbc-559ffd80-2bfc.cloud.databricks.com/ml/experiments/2854093459220376?viewStateShareKey=55a332dc80d7200b6a6301d8f0163155ce9aac54d21436c9d292f0745e0bff05
Endpoint: https://dbc-559ffd80-2bfc.cloud.databricks.com/ml/endpoints/testfinetuning?o=7395834863327820
[MCT] Llama3.1 405b
Run: `405b-register-1-xB3dOx`
Tested that mlflow.log_model registers model in private link workspace
Registered model: https://adb-1622130341351604.4.azuredatabricks.net/explore/data/models/rkg-ft/default/llamatest?o=1622130341351604
Model stuck in pending example:
Log model also worked