Differing behavior for handling run tags between copy_run and import_run in Databricks #29

smurching · 2021-12-09T15:42:53Z

In Databricks, I noticed there's a behavior difference when calling the RunExporter.export_run + RunImporter.import_run APIs vs just calling the RunCopier.copy_run APIs, to push runs between workspaces. The former approach seems to skip importing internal MLflow tags (e.g. original job ID/job run ID) but the latter just directly copies all tags over (see logic), which results in broken links to e.g. Databricks notebooks or jobs in the run UI. Filing this issue to facilitate discussion

The text was updated successfully, but these errors were encountered:

smurching · 2021-12-09T16:44:36Z

One alternative could be to copy over tags known to be absolute/workspace agnostic, e.g. tags like mlflow.source.git.commit should just be copied over since they should be the same regardless of what workspace the run is in.

amesar · 2021-12-13T05:44:38Z

I've added to the README files that the copy mode is deprecated.
Since the MLflowClient constructor doesn't allow you to pass in credentials, you can't have two (or more) clients active in the same program. This is needed to make the copy mode reliable and maintable. Currently the credentials are obtained by singleton global magic variables (environment variables). Sort of violates dependency injection principles. ;)
This results in squirrelly code to overcome this limitation.
It makes the copy mode very difficult to code, test and maintain.
Besides, under the hood copy mode doesn't provide much advantage since the artifacts will still have to be exported into an intermediate location just like in the export/import mode.
Copy mode is basically syntactic sugar over the underlying code export/import mode.
So I'd recommend sticking with the export/import mode which has been "battle tested".
Do you have a specific use case to prefer copy mode over export/import mode?

smurching · 2021-12-14T14:40:18Z

Since the MLflowClient constructor doesn't allow you to pass in credentials, you can't have two (or more) clients active in the same program. This is needed to make the copy mode reliable and maintable. Currently the credentials are obtained by singleton global magic variables (environment variables). Sort of violates dependency injection principles. ;)

This is something we can make clearer in the MLflow docs :) but it actually is possible to have multiple clients active in the same program - basically what I was doing was supplying the src_client and dst_client parameters of RunCopier and relying on Databricks secrets to configure them to talk to different workspace as described here

Do you have a specific use case to prefer copy mode over export/import mode?

Good question, mostly just to avoid writing code to export to a tmpfile and reimport it into my new workspace, i.e. the code with copy is more declarative + closer to my underlying intention. But it's definitely not a must-have :). The bigger issue I ran into was actually #27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Differing behavior for handling run tags between copy_run and import_run in Databricks #29

Differing behavior for handling run tags between copy_run and import_run in Databricks #29

smurching commented Dec 9, 2021 •

edited

Loading

smurching commented Dec 9, 2021

amesar commented Dec 13, 2021

smurching commented Dec 14, 2021 •

edited

Loading

Differing behavior for handling run tags between copy_run and import_run in Databricks #29

Differing behavior for handling run tags between copy_run and import_run in Databricks #29

Comments

smurching commented Dec 9, 2021 • edited Loading

smurching commented Dec 9, 2021

amesar commented Dec 13, 2021

smurching commented Dec 14, 2021 • edited Loading

smurching commented Dec 9, 2021 •

edited

Loading

smurching commented Dec 14, 2021 •

edited

Loading