Observability #15

Merged 21 commits on Jul 23, 2024
Commits
fea1be0
feat(observability): add infrastructure through mlflow system metrics
fmind Jun 28, 2024
c47892e
feat(observability): add alerting with plyer notifications
fmind Jun 29, 2024
d6ca5a3
feat(notification): add service and alerts with plyer
fmind Jul 6, 2024
28b56ef
feat(data): add train, test, and sample data
fmind Jul 6, 2024
a2b8058
fix(data): add parquet data
fmind Jul 6, 2024
9adca52
feat(explanations): add explainability features and tooling
fmind Jul 10, 2024
28321ff
feat(lineage): add lineage features through mlflow data api
fmind Jul 12, 2024
32d77a6
fix(data): fix models explanations name
fmind Jul 12, 2024
82744bb
fix(paths): fix path for explanation job
fmind Jul 13, 2024
05be890
fix(mlflow): remove input examples following the addition of lineage
fmind Jul 13, 2024
25b7b1c
Revert "fix(mlflow): remove input examples following the addition of …
fmind Jul 13, 2024
8a7a8f1
fix(warnings): improve styles and remove warnings
fmind Jul 13, 2024
7ec2d29
fix(loading): use version or alias for loading models
fmind Jul 16, 2024
f72aa93
feat(monitoring): add mlflow.evaluate API
fmind Jul 20, 2024
50ee720
fix(evaluation): add evaluation files
fmind Jul 20, 2024
f4c5aa8
feat(mlproject): add mlflow project and tasks
fmind Jul 20, 2024
f071dd1
fix(projects): change naming convention
fmind Jul 20, 2024
2d182ab
feat(kpi): add key performance indicators
fmind Jul 21, 2024
ee17d0a
fix(kpi): add key performance indicators
fmind Jul 21, 2024
b300bb0
style(style): improve docs, styles, and readme
fmind Jul 21, 2024
5c6b5ee
bump: version 1.0.1 → 1.1.0
fmind Jul 21, 2024
26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,29 @@
## v1.1.0 (2024-07-21)

### Feat

- **kpi**: add key performance indicators
- **mlproject**: add mlflow project and tasks
- **monitoring**: add mlflow.evaluate API
- **lineage**: add lineage features through mlflow data api
- **explanations**: add explainability features and tooling
- **data**: add train, test, and sample data
- **notification**: add service and alerts with plyer
- **observability**: add alerting with plyer notifications
- **observability**: add infrastructure through mlflow system metrics

### Fix

- **kpi**: add key performance indicators
- **projects**: change naming convention
- **evaluation**: add evaluation files
- **loading**: use version or alias for loading models
- **warnings**: improve styles and remove warnings
- **mlflow**: remove input examples following the addition of lineage
- **paths**: fix path for explanation job
- **data**: fix models explanations name
- **data**: add parquet data

## v1.0.1 (2024-06-28)

### Fix
9 changes: 9 additions & 0 deletions MLproject
@@ -0,0 +1,9 @@
# https://mlflow.org/docs/latest/projects.html

name: bikes
python_env: python_env.yaml
entry_points:
  main:
    parameters:
      conf_file: path
    command: "PYTHONPATH=src python -m bikes {conf_file}"
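
As a rough illustration of how the `command` template in this MLproject file becomes a shell command, the sketch below substitutes the `conf_file` parameter using plain Python string formatting. MLflow performs its own substitution and path resolution internally, so this is not its implementation; the `render_command` helper and the `confs/training.yaml` value are assumptions made for the example.

```python
import shlex

# Command template copied from the MLproject entry point.
COMMAND = "PYTHONPATH=src python -m bikes {conf_file}"


def render_command(template: str, parameters: dict) -> str:
    """Fill a command template, shell-quoting each value (hypothetical helper)."""
    quoted = {name: shlex.quote(str(value)) for name, value in parameters.items()}
    return template.format(**quoted)


# Hypothetical parameter value: any job config file could be passed here.
command = render_command(COMMAND, {"conf_file": "confs/training.yaml"})
print(command)
```

With MLflow installed, the equivalent invocation from the project root would be along the lines of `mlflow run . -P conf_file=confs/training.yaml`.
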
104 changes: 97 additions & 7 deletions README.md
@@ -72,6 +72,13 @@ You can use this package as part of your MLOps toolkit or platform (e.g., Model
- [Programming](#programming)
- [Language: Python](#language-python)
- [Version: Pyenv](#version-pyenv)
- [Observability](#observability)
- [Reproducibility: Mlflow Project](#reproducibility-mlflow-project)
- [Monitoring : Mlflow Evaluate](#monitoring--mlflow-evaluate)
- [Alerting: Plyer](#alerting-plyer)
- [Lineage: Mlflow Dataset](#lineage-mlflow-dataset)
- [Explainability: SHAP](#explainability-shap)
- [Infrastructure: Mlflow System Metrics](#infrastructure-mlflow-system-metrics)
- [Tips](#tips)
- [AI/ML Practices](#aiml-practices)
- [Data Catalog](#data-catalog)
@@ -150,10 +157,10 @@ job:
   KIND: TrainingJob
   inputs:
     KIND: ParquetReader
-    path: data/inputs.parquet
+    path: data/inputs_train.parquet
   targets:
     KIND: ParquetReader
-    path: data/targets.parquet
+    path: data/targets_train.parquet
```

This config file instructs the program to start a `TrainingJob` with 2 parameters:
@@ -173,6 +180,8 @@ $ poetry run [package] confs/tuning.yaml
$ poetry run [package] confs/training.yaml
$ poetry run [package] confs/promotion.yaml
$ poetry run [package] confs/inference.yaml
$ poetry run [package] confs/evaluations.yaml
$ poetry run [package] confs/explanations.yaml
```

In production, you can build, ship, and run the project as a Python package:
@@ -210,7 +219,7 @@ You can invoke the actions from the [command-line](https://www.pyinvoke.org/) or

```bash
# execute the project DAG
-$ inv dags
+$ inv projects
# create a code archive
$ inv packages
# list other actions
@@ -231,13 +240,16 @@ $ inv --list
- **cleans.coverage** - Clean the coverage tool.
- **cleans.dist** - Clean the dist folder.
- **cleans.docs** - Clean the docs folder.
- **cleans.environment** - Clean the project environment file.
- **cleans.folders** - Run all folders tasks.
- **cleans.mlruns** - Clean the mlruns folder.
- **cleans.mypy** - Clean the mypy tool.
- **cleans.outputs** - Clean the outputs folder.
- **cleans.poetry** - Clean poetry lock file.
- **cleans.pytest** - Clean the pytest tool.
- **cleans.projects** - Run all projects tasks.
- **cleans.python** - Clean python caches and bytecodes.
- **cleans.requirements** - Clean the project requirements file.
- **cleans.reset** - Run all tools, folders, and sources tasks.
- **cleans.ruff** - Clean the ruff tool.
- **cleans.sources** - Run all sources tasks.
@@ -251,8 +263,6 @@ $ inv --list
- **containers.build** - Build the container image with the given tag.
- **containers.compose** - Start up docker compose.
- **containers.run** - Run the container image with the given tag.
-- **dags.all (dags)** - Run all DAG tasks.
-- **dags.job** - Run the project for the given job name.
- **docs.all (docs)** - Run all docs tasks.
- **docs.api** - Document the API with pdoc using the given format and output directory.
- **docs.serve** - Serve the API docs with pdoc using the given format and computer port.
@@ -267,6 +277,10 @@ $ inv --list
- **mlflow.serve** - Start mlflow server with the given host, port, and backend uri.
- **packages.all (packages)** - Run all package tasks.
- **packages.build** - Build a python package with the given format.
- **projects.all (projects)** - Run all project tasks.
- **projects.environment** - Export the project environment file.
- **projects.requirements** - Export the project requirements file.
- **projects.run** - Run an mlflow project from MLproject file.

## Workflows

@@ -719,6 +733,82 @@ Select your programming environment.
- **Alternatives**:
- Manual installation: time consuming

## Observability

### Reproducibility: [Mlflow Project](https://mlflow.org/docs/latest/projects.html)

- **Motivations**:
  - Share common project formats.
  - Ensure the project can be reused.
  - Avoid randomness in project execution.
- **Limitations**:
  - Mlflow Project is best suited for small projects.
- **Alternatives**:
  - [DVC](https://dvc.org/): both data and models.
  - [Metaflow](https://metaflow.org/): focus on machine learning.
  - **[Apache Airflow](https://airflow.apache.org/)**: for large scale projects.

### Monitoring : [Mlflow Evaluate](https://mlflow.org/docs/latest/model-evaluation/index.html)

- **Motivations**:
  - Compute the model metrics.
  - Validate model with thresholds.
  - Perform post-training evaluations.
- **Limitations**:
  - Mlflow Evaluate is less feature-rich than the alternatives.
- **Alternatives**:
  - **[Giskard](https://www.giskard.ai/)**: open-core and super complete.
  - **[Evidently](https://www.evidentlyai.com/)**: open-source with more metrics.
  - [Arize AI](https://arize.com/): more feature-rich but less flexible.
  - [Grafana](https://grafana.com/): you must do everything yourself.
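
The idea of validating a model with thresholds can be sketched in plain Python as follows. This is only an illustration of the concept and does not use MLflow's own evaluation or validation API; the `validate_metrics` helper, the metric names, and the numeric values are all hypothetical.

```python
def validate_metrics(metrics: dict, thresholds: dict) -> list:
    """Return one message per metric that is missing or below its minimum threshold."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric not computed")
        elif value < minimum:
            failures.append(f"{name}: {value} is below the threshold {minimum}")
    return failures


# Hypothetical metrics for a regression model, where higher values are better.
metrics = {"r2_score": 0.62, "explained_variance": 0.70}
thresholds = {"r2_score": 0.50, "explained_variance": 0.75}
failures = validate_metrics(metrics, thresholds)
print(failures)
```

An evaluation job could fail, or raise an alert, whenever the returned list is not empty.
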

### Alerting: [Plyer](https://github.com/kivy/plyer)

- **Motivations**:
  - Simple solution.
  - Send notifications on the local system.
  - Cross-system: Mac, Linux, Windows.
- **Limitations**:
  - Should not be used for large scale projects.
- **Alternatives**:
  - [Slack](https://slack.com/): for chat-oriented solutions.
  - [Datadog](https://www.datadoghq.com/): for infrastructure-oriented solutions.
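
A minimal sketch of a desktop alert built on Plyer's `notification.notify` function is shown below. The `send_alert` wrapper, the `bikes` application name, and the timeout are assumptions for illustration and are not necessarily how the package's notification service is written. The sender is injectable so the sketch can be exercised without a desktop session.

```python
from typing import Callable, Optional


def send_alert(title: str, message: str, notify: Optional[Callable[..., None]] = None) -> None:
    """Send a desktop notification, defaulting to Plyer when no sender is given."""
    if notify is None:
        # Imported lazily so the sketch can run without Plyer installed.
        from plyer import notification

        notify = notification.notify
    notify(title=title, message=message, app_name="bikes", timeout=5)


# Usage with a stand-in sender that records the call instead of showing a popup.
sent = []
send_alert("Training Job", "Model training finished.", notify=lambda **kwargs: sent.append(kwargs))
print(sent)
```

Calling `send_alert("Training Job", "Model training finished.")` without the `notify` argument would go through Plyer and display a native notification where the platform supports it.
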

### Lineage: [Mlflow Dataset](https://mlflow.org/docs/latest/tracking/data-api.html)

- **Motivations**:
  - Store information in Mlflow.
  - Track metadata about run datasets.
  - Keep URI of the dataset source (e.g., website).
- **Limitations**:
  - Not as feature-rich as alternative solutions.
- **Alternatives**:
  - [Databricks Lineage](https://docs.databricks.com/en/admin/system-tables/lineage.html): limited to Databricks.
  - [OpenLineage and Marquez](https://marquezproject.github.io/): open-source and flexible.
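
To make the kind of metadata involved more concrete, the sketch below records a dataset name, its source URI, and a short content digest using only the standard library. It does not call the MLflow data API, and MLflow computes its own digests differently; the `DatasetRecord` structure and the `describe_dataset` helper are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Minimal lineage metadata for one dataset used by a run (hypothetical structure)."""

    name: str
    source: str  # URI describing where the data came from
    digest: str  # short fingerprint of the content


def describe_dataset(name: str, source: str, content: bytes) -> DatasetRecord:
    """Fingerprint the raw bytes so that a change in the data changes the digest."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return DatasetRecord(name=name, source=source, digest=digest)


# Stand-in bytes; in practice these would be read from the dataset file.
record = describe_dataset("inputs_train", "data/inputs_train.parquet", b"example bytes")
print(record)
```

Storing such a record alongside each run makes it possible to tell later which version of the data a model was trained on.
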

### Explainability: [SHAP](https://shap.readthedocs.io/en/latest/)

- **Motivations**:
  - Most popular toolkit.
  - Supports various models (linear, tree, ...).
  - Integration with Mlflow through the [SHAP module](https://mlflow.org/docs/latest/python_api/mlflow.shap.html).
- **Limitations**:
  - Very slow on large datasets.
  - Mlflow SHAP module is not mature enough.
- **Alternatives**:
  - [LIME](https://github.com/marcotcr/lime): not maintained anymore.
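
SHAP is built on Shapley values, which can be computed exactly for a tiny model by averaging each feature's marginal contribution over every feature ordering. The sketch below does that with the standard library only; it is not the SHAP library's implementation, and the toy linear model, sample, and baseline are hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, sample, baseline):
    """Exact Shapley values: average marginal contributions over all feature orderings."""
    n = len(sample)
    values = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)  # start from the baseline input
        previous = predict(current)
        for index in order:
            current[index] = sample[index]  # reveal one feature at a time
            updated = predict(current)
            values[index] += updated - previous
            previous = updated
    return [value / factorial(n) for value in values]


# Hypothetical linear model: each attribution should equal weight * (x - baseline).
weights = [2.0, -1.0, 0.5]


def predict(x):
    return sum(w * v for w, v in zip(weights, x))


attributions = shapley_values(predict, sample=[1.0, 3.0, 4.0], baseline=[0.0, 1.0, 2.0])
print(attributions)
```

The number of orderings grows factorially with the number of features, which is one reason practical toolkits rely on model-specific shortcuts or sampling instead of this exhaustive loop.
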

### Infrastructure: [Mlflow System Metrics](https://mlflow.org/docs/latest/system-metrics/index.html)

- **Motivations**:
  - Track infrastructure information (RAM, CPU, ...).
  - Integrated with Mlflow tracking.
  - Provide hardware insights.
- **Limitations**:
  - Not as mature as alternative solutions.
- **Alternatives**:
  - [Datadog](https://www.datadoghq.com/): popular and mature solution.

# Tips

This section gives some tips and tricks to enrich the development experience.
@@ -736,10 +826,10 @@ This tag can then be associated to a reader/writer implementation in a configura
```yaml
 inputs:
   KIND: ParquetReader
-  path: data/inputs.parquet
+  path: data/inputs_train.parquet
 targets:
   KIND: ParquetReader
-  path: data/targets.parquet
+  path: data/targets_train.parquet
```

In this package, the implementations are described in `src/[package]/io/datasets.py` and selected by `KIND`.
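
One way to picture selection by `KIND` is a small registry that maps the tag to a class and passes the remaining keys as constructor arguments. The sketch below is an assumption about the general pattern, not the package's actual code, which may rely on a validation library instead; the `register` decorator, `build_reader` helper, and `CSVReader` class are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Registry mapping each KIND tag to its implementation class.
READERS = {}


def register(cls):
    """Class decorator: make the class selectable by its own name."""
    READERS[cls.__name__] = cls
    return cls


@register
@dataclass
class ParquetReader:
    path: str
    limit: Optional[int] = None


@register
@dataclass
class CSVReader:  # hypothetical second implementation
    path: str


def build_reader(config: dict):
    """Instantiate the reader named by the KIND key, using the remaining keys as options."""
    options = dict(config)  # copy so the caller's dict is left untouched
    kind = options.pop("KIND")
    if kind not in READERS:
        raise ValueError(f"Unknown KIND: {kind!r}")
    return READERS[kind](**options)


reader = build_reader({"KIND": "ParquetReader", "path": "data/inputs_train.parquet"})
print(reader)
```

Swapping the data source then only requires changing the `KIND` value and its options in the configuration file, not the job code.
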
8 changes: 8 additions & 0 deletions confs/evaluations.yaml
@@ -0,0 +1,8 @@
job:
  KIND: EvaluationsJob
  inputs:
    KIND: ParquetReader
    path: data/inputs_train.parquet
  targets:
    KIND: ParquetReader
    path: data/targets_train.parquet
12 changes: 12 additions & 0 deletions confs/explanations.yaml
@@ -0,0 +1,12 @@
job:
  KIND: ExplanationsJob
  inputs_samples:
    KIND: ParquetReader
    path: data/inputs_test.parquet
    limit: 100
  models_explanations:
    KIND: ParquetWriter
    path: outputs/models_explanations.parquet
  samples_explanations:
    KIND: ParquetWriter
    path: outputs/samples_explanations.parquet
4 changes: 2 additions & 2 deletions confs/inference.yaml
@@ -2,7 +2,7 @@ job:
   KIND: InferenceJob
   inputs:
     KIND: ParquetReader
-    path: data/inputs.parquet
+    path: data/inputs_test.parquet
   outputs:
     KIND: ParquetWriter
-    path: outputs/predictions.parquet
+    path: outputs/predictions_test.parquet
4 changes: 2 additions & 2 deletions confs/training.yaml
@@ -2,7 +2,7 @@ job:
   KIND: TrainingJob
   inputs:
     KIND: ParquetReader
-    path: data/inputs.parquet
+    path: data/inputs_train.parquet
   targets:
     KIND: ParquetReader
-    path: data/targets.parquet
+    path: data/targets_train.parquet
4 changes: 2 additions & 2 deletions confs/tuning.yaml
@@ -2,7 +2,7 @@ job:
   KIND: TuningJob
   inputs:
     KIND: ParquetReader
-    path: data/inputs.parquet
+    path: data/inputs_train.parquet
   targets:
     KIND: ParquetReader
-    path: data/targets.parquet
+    path: data/targets_train.parquet
Binary file added data/inputs_test.parquet
Binary file not shown.
Binary file renamed data/inputs.parquet → data/inputs_train.parquet
Binary file not shown.
Binary file added data/targets_test.parquet
Binary file not shown.
Binary file renamed data/targets.parquet → data/targets_train.parquet
Binary file not shown.