manta

A lightweight P2P-based cache system for model distributions on Kubernetes.


Name Story: the name Manta is inspired by the Dota 2 item Manta Style, which creates two images of your hero, much like peers in a P2P network.

Architecture

architecture

Note: llmaz is just one example integration; Manta can also be deployed and used independently.

Features Overview

  • Model Hub Support: Models can be downloaded directly from model hubs (e.g. Huggingface) or object storage, with no extra effort.
  • Model Preheat: Models can be preloaded to the cluster, or to specified nodes, to accelerate model serving.
  • Model Cache: Models are cached as chunks after download for faster model loading.
  • Model Lifecycle Management: Model lifecycles are managed automatically with different strategies, such as Retain or Delete.
  • Plugin Framework: Filter and Score plugins can be extended to pick the best candidate nodes.
  • Memory Management (WIP): Manage the memory reserved for caching, together with an LRU algorithm for GC.

You Should Know Before

  • Manta is not an all-in-one solution for model management; instead, it offers a lightweight way to utilize idle bandwidth and cost-effective disk, helping you save money.
  • It requires no additional components such as databases or storage systems, simplifying setup and reducing effort.
  • All models are stored under the host path /mnt/models/.
  • After all, it's just a cache system.
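
Because cached models live under /mnt/models/ on each node, a serving Pod can mount that path directly. A minimal sketch, assuming a hypothetical container name and image (not part of Manta itself):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-server           # hypothetical name
spec:
  containers:
    - name: server
      image: your-inference-image:latest   # hypothetical image
      volumeMounts:
        - name: models
          mountPath: /mnt/models
          readOnly: true
  volumes:
    - name: models
      hostPath:
        path: /mnt/models      # where Manta caches model weights
        type: Directory
```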

Quick Start

Installation

Read the Installation for guidance.

Preheat Models

A sample Torrent that preloads the Qwen/Qwen2.5-0.5B-Instruct model:

```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
```

If you want to preload the model to specific nodes, use the nodeSelector:

```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  nodeSelector:
    zone: zone-a
```

Delete Models

If you want the model weights removed once the Torrent is deleted, set reclaimPolicy to Delete (the default is Retain):

```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  reclaimPolicy: Delete
```

For more details, refer to the APIs.

Roadmap

In the long term, we hope to make Manta a unified cache system for MLOps.

  • Preloading datasets from model hubs
  • RDMA support for faster model loading
  • More integrations with MLOps systems, including training and serving

Community

Join us for more discussions.

Contributions

All kinds of contributions are welcome! Please follow CONTRIBUTING.md.