-
Notifications
You must be signed in to change notification settings - Fork 892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MLflow autologging issue # 1618 #2092
base: master
Are you sure you want to change the base?
Conversation
MLflow integration example making pytorch calls and activating tensorboard collection on Darts' s pytortch models to do autologging with MLflow and activate MLflow UI.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2092 +/- ##
==========================================
+ Coverage 93.75% 93.97% +0.21%
==========================================
Files 138 136 -2
Lines 14352 13647 -705
==========================================
- Hits 13456 12825 -631
+ Misses 896 822 -74 ☔ View full report in Codecov by Sentry. |
Hello @dennisbader, As a follow up, is there anything else that I need to do? Cheers! |
Hi @cargecla1. No, all good from your side 🚀 |
Hello @dennisbader, Thanks for letting me know. Merry Xmas! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this changes, there are as passed on.
|
||
with mlflow.start_run(nested=True) as run: | ||
|
||
dataset: PandasDataset = mlflow.data.from_pandas(series, source="AirPassengersDataset") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO should call series.pd_dataframe()
because in my tests it worked only on DataFrames not Series:
dataset: PandasDataset = mlflow.data.from_pandas(series.pd_dataframe(), source="AirPassengersDataset")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added via commit c15eb1e
Thank you!
|
||
# Registering model | ||
model_uri = f"runs:/{run.info.run_id}/darts-NBEATS" | ||
mlflow.register_model(model_uri=model_uri, name=model_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO runs currently do not save models as artifacts becaus there is no call to log_model()
. Therefore model_uri
is not pointing to a valid model. Please correct me if I'm wrong, however, it did not work in my tests. If it works, can you pleas add an example how to load with loaded_model = mlflow.<flavor>.load_model(model_uri=model_uri)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to have a look at this, my understanding is that it saves it both using mlflow.<model_flavor>.log_model() or mlflow.register_model(). Refer to https://mlflow.org/docs/latest/model-registry.html.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As i understand it you can only (kind of) promote a already saved model to a registered model..
Model
An MLflow Model is created from an experiment or run that is logged with one of the model flavor’s mlflow.<model_flavor>.log_model() methods. Once logged, this model can then be registered with the Model Registry.Registered Model
An MLflow Model can be registered with the Model Registry. A registered model has a unique name, contains versions, aliases, tags, and other metadata.
https://mlflow.org/docs/latest/model-registry.html#concepts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As i understand it you can only (kind of) promote a already saved model to a registered model..
Model
An MLflow Model is created from an experiment or run that is logged with one of the model flavor’s mlflow.<model_flavor>.log_model() methods. Once logged, this model can then be registered with the Model Registry.
Registered Model
An MLflow Model can be registered with the Model Registry. A registered model has a unique name, contains versions, aliases, tags, and other metadata.
https://mlflow.org/docs/latest/model-registry.html#concepts
Hello @turbotimon ,
My plans was to share how to do the first part of issue # 1618. As it allows you to track the experiments (so one don't get lost if you are doing more of a few iterations), select the best one, see metrics progress, etc.
I will have a look at what you are suggesting. However, the normal mlflow.python.loadmodel or mlflow.pytorch.loadmodel doesn't seems to work because the way Darts is wrapping torch.nn modules. All test I have done so far didn't work to save and load the model. Only to record hyperparameters, other torchmetrics related information, dataset hash, etc. Thus, I propose to share this, and later we can add saving and loading models (but my last weekend research it is pointing out that the model.py files and common models may need to be rewritten to be compatible with MLFlow (I might be wrong).
Cheers!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cargecla1 Yes, that's a good idea to split these two things
If using the mlflow model registry for darts completely fails, you could also mentioning the workaround I proposed in the issue. Which was manually saving/loading the model as an artifact. Something like:
dartsmodel.save("mymodel.pickle")
mlflow.log_artifact("mymodel.pickle")
# later, load artifact from mlflow and do e.g. a dartsmodel=RNNModel.load("mymodel.pickle")
Let me know if i can help anything!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @turbotimon
After trying to save the Darts model using dartsmodel.save("mymodel.pickle") as suggested, I am getting the following error:
File ~.conda\envs\venv_darts\lib\site-packages\torch\serialization.py:653, in _save(obj, zip_file, pickle_module, pickle_protocol) 651 pickler = pickle_module.Pickler(data_buf, protocol=pickle_protocol) 652 pickler.persistent_id = persistent_id --> 653 pickler.dump(obj) 654 data_value = data_buf.getvalue() 655 zip_file.write_record('data.pkl', data_value, len(data_value))
AttributeError: Can't pickle local object 'LayerSummary._register_hook..hook'
Would you know why this is happening?
Cheers
Hi @cla-ra3426, sorry i missed your question. No idea.. but must have to do somehting with darts itself and not mlflow i suppose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cargecla1 Yes, that's a good idea to split these two things
If using the mlflow model registry for darts completely fails, you could also mentioning the workaround I proposed in the issue. Which was manually saving/loading the model as an artifact. Something like:dartsmodel.save("mymodel.pickle") mlflow.log_artifact("mymodel.pickle") # later, load artifact from mlflow and do e.g. a dartsmodel=RNNModel.load("mymodel.pickle")Let me know if i can help anything!
Hello @turbotimon , @madtoinou , @dennisbader,
Is it possible to split this issue as discussed above? so we can share how to use, train, track, monitor and save the models using MLFlow? Just leaving loading the model for a later release?
Cheers
Hello gents (@turbotimon , @madtoinou , @dennisbader,)
Have you had time to consider my proposal above? "Is it possible to split this issue as discussed above? so we can share how to use, train, track, monitor and save the models using MLFlow? Just leaving loading the model for a later release?"
The only solution I have right now it is to train the model with MLFlow to track and monitor the model and retrain it with pure Darts to be able to save it and load to predict, Darts saving method "fails" when run inside a MLFlow run, and MLFlow log and save methods don't want to work with Darts either.
Thank you in advice for your consideration!
Cheers!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @cargecla1,
Sorry for the delay, loading model is kind-of part of linked issue so it would probably be better to also include it in this PR. But if it too troublesome, we can treat it separately.
The error you're getting when trying to pickle the model is probably due to the (pytorch-lightning) callbacks. Can you try removing them prior to exporting the model?
I will test this example more thoroughly when I have more time, try to see if I can come up with a solution for the loading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @madtoinou
I will try this and see how it goes.
Thanks for this suggestion and for considering splitting the problem into two if above doesn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @madtoinou ,
I took your suggestion on board as per above, but that didn't fixed the issue with loading and predicting steps.
Refer to commit f798244
Adding first two suggestions raised in MLflow autologging issue # 1618 unit8co#2092 via comments.
Adding code to save pip requirements and conda environment required to run the model via MLFlow.
Adding a way to track and save model before registering it using MLFlow.
Expanding comment to create a new cell to log the artifact and register the model after a model run id has been created.
mlflow.log_param("model_type", "Darts_Pytorch_model") | ||
mlflow.log_param("input_chunk_length", 24) | ||
mlflow.log_param("output_chunk_length", 12) | ||
mlflow.log_param("n_epochs", 500) | ||
mlflow.log_param("model_name", 'NBEATS_MLflow') | ||
mlflow.log_param("log_tensorboard", True) | ||
mlflow.log_param("torch_metrics", "torchmetrics.regression.MeanAbsolutePercentageError()") | ||
mlflow.log_param("nr_epochs_val_period", 1) | ||
mlflow.log_param("pl_trainer_kwargs", "{callbacks: [loss_logger]}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this a bit more concise, with a list of parameters to log and access them in the model.model_params
, model.pl_module_params
or model.trainer_params
attributes (potentially after fusing these dictionaries into one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @madtoinou
I made the code more concise as suggested. Refer to commit f798244
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @madtoinou , @dennisbader ,
I guess you may have been busy with the latest Darts release. Did you have time to look at my latest changes? Does this work to merge the first part of the MLFlow integration as previously discussed?
Cheers!
By the way, nicely done! I need to test the latest release speeds. :)
Updating table of content at the top of the file/page.
Making the log parameters method more concise as suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making the correction, I still have some minor comments.
@@ -25,6 +25,7 @@ We assume that you already know about covariates in Darts. If you're new to the | |||
- [Callbacks](#callbacks) | |||
- [Early Stopping](#example-with-early-stopping) | |||
- [Custom Callback](#example-of-custom-callback-to-store-losses) | |||
- [MLFlow: train, track and monitor](#example-with-mlflow-autologging) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the indentation is off, can you shift it to the left once so that it's at the same level as "Callbacks"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @madtoinou ,
I fixed this via commit b839c67
Fixed indentation that was off.
Adding suggestion, thanks! Co-authored-by: madtoinou <[email protected]>
Resolving torchmetrics issue.
Updating the MLFlow saving method including tracking uri.
Hello @madtoinou , I just updated torchmetrics and saving model method to capture your latest suggestions with the **params. Hopefully this is work done now. :) Cheers! |
Hello @madtoinou, @dennisbader , Any updates, comments from my latest updates/commits? Cheers! |
Hello @madtoinou , @dennisbader , Did you have chance to have a look a my latest changes to this PR?, any feedback will be appreciated. Cheers! |
Addresses #1618.
Summary
Draft PR for MLflow integration example making pytorch calls and activating tensorboard collection on Darts' s pytortch models to do autologging with MLflow and activate MLflow UI.