
Experiment Management using MLflow

tangy5 edited this page Jan 10, 2023 · 5 revisions

Introduction

MLflow is an open source platform for managing machine learning workflows. It tracks experiments, recording and monitoring parameters, metrics, and results. It also manages the deployment of models from a variety of machine learning libraries to a variety of serving and inference platforms. In particular, MLflow Tracking supports collaborative management of machine learning experiments, including parameters, metrics, custom graphs, tags, and artifacts.

The latest MONAI Label release integrates MLflow Tracking, so users and developers can create model training and fine-tuning experiments with tracking enabled. MLflow serves as both an API and a UI for logging parameters, code versions, metrics, and output files during deep learning training, and for visualizing the results. This end-to-end workflow makes experiments easier to manage and helps developers improve their machine learning models.

In MONAI Label, MLflow Tracking is organized around the concept of runs: each run is one execution of a training loop triggered during annotation. In short, MONAI Label with MLflow records the following for each run:

  • Time: Start time, end time, and duration of the run.
  • Source: Name of the file that launched the run, or the project name.
  • Parameters: Key-value input parameters of your choice. Both keys and values are strings.
  • Metrics: Key-value metrics with numeric values. Each metric can be updated throughout the run, and MLflow records the metric’s full history so that it can be visualized.
  • Artifacts: Output files in any format. Users can record images as artifacts (if any).
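
To make the shape of such a record concrete, here is a minimal stand-alone sketch in plain Python. It is not MLflow's or MONAI Label's implementation: the `RunRecord` class, its field names, and the example values (`train.py`, `max_epochs`, `val_mean_dice`, the artifact path) are all hypothetical and only mirror the five items listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RunRecord:
    """Hypothetical, simplified model of what one tracked run stores."""

    source: str                       # file or project that launched the run
    start_time: float                 # seconds since the epoch
    end_time: Optional[float] = None  # unset while the run is still active
    params: Dict[str, str] = field(default_factory=dict)
    # Each metric keeps its whole history as (step, value) pairs.
    metrics: Dict[str, List[Tuple[int, float]]] = field(default_factory=dict)
    artifacts: List[str] = field(default_factory=list)

    def log_param(self, key: str, value: object) -> None:
        # Parameters are stored as strings, whatever type was passed in.
        self.params[str(key)] = str(value)

    def log_metric(self, key: str, value: float, step: int) -> None:
        # Appending (rather than overwriting) preserves the metric history.
        self.metrics.setdefault(key, []).append((step, float(value)))

    def duration(self) -> Optional[float]:
        if self.end_time is None:
            return None
        return self.end_time - self.start_time


# Illustrative values only.
run = RunRecord(source="train.py", start_time=100.0)
run.log_param("max_epochs", 50)
for step, dice in enumerate([0.61, 0.74, 0.82]):
    run.log_metric("val_mean_dice", dice, step)
run.artifacts.append("outputs/prediction_slice.png")
run.end_time = 160.0
```

Because every metric update is appended, the full curve of a metric over the run stays available for plotting, which is the behavior the Metrics item describes.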


How to Use

1. Set MLflow Tracking Options before Training

This walkthrough uses the sample Radiology app and 3D Slicer as a demonstration. The MONAI Label plugin provides a UI for setting training parameters.

Suppose one annotation loop is finished and the new labels are ready for training. The user can then open the Options panel in the MONAI Label plugin.

[Figure: the Options panel in the MONAI Label plugin]

  • Select train in the Section dropdown, then select the model to train (e.g., segmentation_spleen).
  • In the Options, choose mlflow to enable experiment management, or choose None to disable it.
  • Users can set the path where MLflow records the experiment run. The path must be given as a URI, such as file://<path to mlflow file>

Note: If no URI is provided, the MONAI Label server sets the MLflow tracking URI automatically. With the Radiology app, for example, the MLflow tracking files are saved to:

<path to radiology app>/radiology/model/segmentation_spleen/train_01/mlruns

Users can find the tracking URI in the MONAI Label server logs.
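
Since the tracking path has to be supplied as a URI, a small helper can convert a local directory into that form. The sketch below uses only Python's standard library; the function name `to_tracking_uri` and the example directory are hypothetical and are not part of MONAI Label.

```python
from pathlib import Path


def to_tracking_uri(path: str) -> str:
    """Convert a local directory path into a file:// URI.

    The path is expanded and made absolute first, because
    Path.as_uri() only accepts absolute paths.
    """
    return Path(path).expanduser().resolve().as_uri()


# Hypothetical location for the tracking files.
print(to_tracking_uri("/workspace/experiments/mlruns"))
```

Characters that are not valid in a URI, such as spaces, are percent-encoded by `as_uri()`, so the result can be pasted into the Options panel as is.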

  • tracking_experiment_name: Sets the name of the tracking run. If left blank, MONAI Label generates a default name from the model and the number of iterations run.

2. Start MLflow Server with the Recorded Tracking File URI

Once training has started, MLflow writes its tracking files to the mlruns directory on the filesystem. Users can then start the MLflow server to visualize and manage all training parameters and metrics:

mlflow server --backend-store-uri <MLflow Tracking URI>
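
When the server is started from a script rather than by hand, the same command can be assembled in Python. This is only a sketch: the helper name `build_mlflow_server_command` and the example URI are hypothetical, and the `--host` and `--port` values are illustrative choices that can be changed or dropped.

```python
import shlex
from typing import List


def build_mlflow_server_command(
    tracking_uri: str, host: str = "127.0.0.1", port: int = 5000
) -> List[str]:
    """Return the argument list for `mlflow server` with a given backend store."""
    return [
        "mlflow", "server",
        "--backend-store-uri", tracking_uri,
        "--host", host,
        "--port", str(port),
    ]


# Hypothetical tracking URI; use the one reported in the MONAI Label server logs.
command = build_mlflow_server_command("file:///workspace/experiments/mlruns")
print(shlex.join(command))

# To actually launch the server (this call blocks until the server exits):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting problems when the tracking path contains spaces.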

  • Basic parameters and metrics recorded for MONAI Label training runs.

[Figure: parameters and metrics of MONAI Label training runs]

3. Interactive Training and Experiments

MONAI Label offers an active learning and interactive labeling experience in which users annotate datasets and improve models iteratively. MONAI Label with MLflow tracks multiple experiment runs, and users can find each of them in the MLflow server UI.

Experiment management using MLflow across three training iterations:

[Figure: three training iterations tracked as experiment runs in MLflow]
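
For a quick comparison of iterations without opening the UI, the tracking files can also be read directly. The sketch below rests on an assumption about MLflow's local file store: that each run lives under `<experiment id>/<run id>/metrics/<metric name>` inside mlruns, and that each line of a metric file holds `timestamp value step`. This layout is an internal detail that may differ between MLflow versions, so the MLflow UI or the MLflow Python API remains the supported route. The function names and the metric name `val_mean_dice` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def read_metric_history(metric_file: Path) -> List[Tuple[int, float]]:
    """Parse one metric file into (step, value) pairs sorted by step.

    Assumption: every line is "timestamp value step", space-separated.
    """
    history = []
    for line in metric_file.read_text().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        value = float(fields[1])
        step = int(fields[2]) if len(fields) > 2 else 0
        history.append((step, value))
    return sorted(history)


def summarize_runs(mlruns_dir: Path, metric: str) -> Dict[str, float]:
    """Map each run directory name to the last recorded value of `metric`."""
    summary: Dict[str, float] = {}
    if not mlruns_dir.is_dir():
        return summary
    # Assumed layout: <mlruns>/<experiment id>/<run id>/metrics/<metric>
    for metric_file in sorted(mlruns_dir.glob(f"*/*/metrics/{metric}")):
        history = read_metric_history(metric_file)
        if history:
            run_id = metric_file.parent.parent.name
            summary[run_id] = history[-1][1]
    return summary


# Hypothetical usage; prints an empty dict if the directory does not exist.
print(summarize_runs(Path("mlruns"), "val_mean_dice"))
```

Each entry in the result corresponds to one training iteration, so the values can be compared side by side to see whether additional annotation rounds improved the model.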