Website: Prefect Surrogate Models
Demonstrating the use of Prefect to orchestrate the creation of machine learning surrogate models as applied to mechanistic crop models.
This project provides a simple demonstration of how to construct a Prefect pipeline, with MLOps integration, that orchestrates the creation of machine learning surrogate models for mechanistic crop models.
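We use the surrogate model (a Gaussian process) to support various downstream model calibration tasks. In the example here, we perform global optimisation. Note that the demonstrated approach differs from Bayesian optimisation: the Gaussian process is trained once on a fixed set of simulations and then optimised, rather than being refit sequentially to choose new simulator runs.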
For building Gaussian processes, we use GPyTorch.
For performing optimisation, we use Optuna.
Prefect has been included to orchestrate the surrogate modelling pipeline.
The pipeline is composed of the following steps:
- Use Latin Hypercube sampling to draw parameter sets from the parameter space and construct a design matrix.
- Run the WOFOST model n times, once for each sampled parameter set.
- Train a variational Gaussian process to map the parameter sets against the WOFOST simulation outputs.
- Perform parameter optimisation using the Tree-Structured Parzen Estimator algorithm. Rather than directly executing WOFOST during the optimisation procedure, we instead perform optimisation on the Gaussian process.
Execute prefect.sh to run the full pipeline.
The results of the pipeline can be accessed from the output directory.
Python dependencies are specified in the requirements.txt file.
These dependencies are installed during the build process for the following Docker image: ghcr.io/jbris/prefect-surrogate-models:1.0.0
Execute the following command to pull the image: docker pull ghcr.io/jbris/prefect-surrogate-models:1.0.0
- A Docker Compose file has been provided to launch an MLOps stack.
- See the .env file for Docker environment variables.
- The docker_up.sh script can be executed to launch the Docker services.
- DVC is included for data version control.
- MLflow is available for experiment tracking.
- MinIO is available for storing experiment artifacts.