generated from jmeisele/ml-ops

Kafka variant of the MLOps Level 1 stack

jmeisele/ml-ops-kafka

MLOps

Cloud-agnostic tech stack for starting an MLOps platform (Level 1)

"We'll build a pipeline - after we deploy the model."

Model drift will hit when it's least convenient for you

To run: make sure Docker is running and Docker Compose is installed.

  1. Clone the project

    git clone https://github.com/jmeisele/ml-ops-kafka.git
  2. Change directories into the repo

    cd ml-ops-kafka
  3. Run database migrations and create the first Airflow user account.

    docker-compose up airflow-init
  4. Pull and build our images, then launch with Docker Compose

    docker-compose pull && docker-compose up
  5. Open a browser and log in to MinIO

    user: minioadmin

    password: minioadmin

    Create a bucket called mlflow

  6. Open a browser and log in to Grafana

    user: admin

    password: admin

    Both Prometheus and InfluxDB data sources have already been provisioned, along with an MLOps Demo Dashboard and a Notification Channel.

  7. Add the alarm channel to some panels

  8. Start the send_data.py script, which sends a POST request every 0.1 seconds

  9. Open a browser and turn on the Airflow DAG used to retrain our ML model

    user: airflow

    password: airflow

  10. Lower the alarm threshold to see the Airflow DAG pipeline get triggered

  11. Check MLflow after the Airflow DAG has run to see the model artifacts stored using MinIO as the object storage layer.

  12. (Optional) Send a POST request to our model service API endpoint

    curl -v -H "Content-Type: application/json" -X POST -d \
    '{
        "median_income_in_block": 8.3252,
        "median_house_age_in_block": 41,
        "average_rooms": 6,
        "average_bedrooms": 1,
        "population_per_block": 322,
        "average_house_occupancy": 2.55,
        "block_latitude": 37.88,
        "block_longitude": -122.23
    }' \
    http://localhost/model/predict
  13. (Optional) If you are so bold, you can also simulate production traffic using Locust. Keep in mind that you already have a lot of services running on your local machine; you would never deploy a production ML API on your local machine to handle production traffic.
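
The send_data.py step and the optional curl step both come down to posting the same JSON payload to the model endpoint. The following is a minimal sketch of such a sender, not the repository's actual send_data.py. The helper names `build_request` and `send_repeatedly` are hypothetical; the URL, the feature names, and the 0.1 second interval are taken from the steps above. It uses only the Python standard library.

```python
import json
import time
import urllib.request

# Endpoint and feature names are copied from the curl example above.
PREDICT_URL = "http://localhost/model/predict"

SAMPLE_PAYLOAD = {
    "median_income_in_block": 8.3252,
    "median_house_age_in_block": 41,
    "average_rooms": 6,
    "average_bedrooms": 1,
    "population_per_block": 322,
    "average_house_occupancy": 2.55,
    "block_latitude": 37.88,
    "block_longitude": -122.23,
}


def build_request(payload, url=PREDICT_URL):
    """Build a JSON POST request (hypothetical helper; no network access)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_repeatedly(payload, count, interval=0.1, opener=urllib.request.urlopen):
    """Send `count` POST requests, sleeping `interval` seconds between them.

    Returns the list of HTTP status codes received.
    """
    statuses = []
    for i in range(count):
        with opener(build_request(payload), timeout=5) as response:
            statuses.append(response.status)
        if i < count - 1:
            time.sleep(interval)
    return statuses
```

With the stack running, calling `send_repeatedly(SAMPLE_PAYLOAD, count=10)` would send ten requests roughly 0.1 seconds apart. The `opener` parameter exists only so the function can be exercised without a live service.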

Level 1 Workflow & Platform Architecture

[Figure: MLOps Level 1 workflow and platform architecture diagram]

Model Serving Architecture

[Figure: API worker architecture diagram]

Services

  • nginx: Load balancer
  • python-model-service1: FastAPI machine learning API 1
  • python-model-service2: FastAPI machine learning API 2
  • postgresql: RDBMS
  • kafka: Event streaming platform
  • locust: Load testing and production traffic simulation
  • prometheus: Metrics scraping
  • minio: Object storage
  • mlflow: Machine learning experiment management
  • influxdb: Time series database
  • chronograf: Admin and web UI for InfluxDB
  • grafana: Performance monitoring
  • redis: Cache
  • airflow: Workflow orchestrator
  • bridge server: Receives webhooks from Grafana and translates them into Airflow REST API calls
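
To make the bridge server's role more concrete, here is a rough sketch of the translation it performs, written as a pure function so it can be read without the rest of the stack. This is not the repository's implementation. It assumes Grafana's webhook body carries `state`, `ruleName`, and `message` fields, and that Airflow exposes the Airflow 2 stable REST API endpoint for triggering DAG runs. The DAG id `retrain_model` and the base URL `http://localhost:8080` are placeholders.

```python
import json

# Assumed placeholders: neither value is taken from the repository.
AIRFLOW_BASE_URL = "http://localhost:8080"
DAG_ID = "retrain_model"


def grafana_alert_to_dag_run(alert, dag_id=DAG_ID, base_url=AIRFLOW_BASE_URL):
    """Map a Grafana webhook body to an Airflow "trigger DAG run" request.

    Returns a (url, json_body) tuple when the alert is firing, else None.
    """
    # Only a firing alert should trigger retraining; "ok" and other
    # states are ignored.
    if alert.get("state") != "alerting":
        return None

    # Airflow 2 stable REST API: POST /api/v1/dags/{dag_id}/dagRuns
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"

    # Pass alert details to the DAG through the run's "conf" field.
    body = {
        "conf": {
            "rule_name": alert.get("ruleName"),
            "message": alert.get("message"),
        }
    }
    return url, json.dumps(body)
```

A real bridge would wrap this in an HTTP handler and send the resulting request to Airflow with credentials. Keeping the mapping separate from the HTTP layer makes it easy to test which alerts actually trigger a run.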

Gotchas:

Postgres:

Warning: scripts in /docker-entrypoint-initdb.d are only run if you start the container with a data directory that is empty; any pre-existing database will be left untouched on container startup.
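
If the Postgres data directory is bind-mounted from the host, a quick check like the one below can tell you whether the init scripts will run on the next start. This is a sketch that simply applies the rule quoted in the warning above; the function name `init_scripts_will_run` is hypothetical, and it does not apply to Docker-managed named volumes, which are not directly visible as a host path.

```python
from pathlib import Path


def init_scripts_will_run(data_dir):
    """Return True if `data_dir` is missing or empty.

    Mirrors the rule in the warning above: init scripts run only when
    the container starts with an empty data directory.
    """
    path = Path(data_dir)
    if not path.exists():
        # A missing bind-mount source is created empty by Docker.
        return True
    # any() stops at the first entry, so large directories are cheap to check.
    return not any(path.iterdir())
```

If this returns False and you need the scripts to run again, the existing data has to be removed or moved aside first, which discards the current database.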