FMS Model Optimizer is a framework for developing reduced-precision neural network models. It supports quantization techniques such as quantization-aware training (QAT) and post-training quantization (PTQ), along with several other optimization techniques, on popular deep learning workloads.
- Python API to enable model quantization: With the addition of a few lines of code, module-level and/or function-level operations are replaced with their quantized counterparts.
- Robust: Verified for INT8 and INT4 quantization on important vision, speech, NLP, object detection, and LLM workloads.
- Flexible: Options to analyze the network using PyTorch Dynamo and to apply best practices during quantization, such as clip_val initialization, layer-level precision setting, and optimizer param group setting.
- State-of-the-art INT and FP quantization techniques for weights and activations, such as SmoothQuant, SAWB+ and PACT+.
- Supports key compute-intensive operations like Conv2d, Linear, LSTM, MM and BMM
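To make the `clip_val` idea from the list above concrete, here is a minimal sketch of a PACT-style activation quantizer in plain Python. It is not the FMS Model Optimizer API: the helper names `init_clip_val` and `pact_quantize` are hypothetical, and initializing `clip_val` from a high percentile of calibration data is an assumed heuristic chosen for illustration.

```python
import random


def init_clip_val(samples, percentile=99.9):
    """Pick clip_val as a high percentile of the calibration activations.

    Assumption: a percentile-based start value; other initializations exist.
    """
    ordered = sorted(samples)
    # Nearest-rank style index, clamped to the valid range.
    idx = min(len(ordered) - 1, int(len(ordered) * percentile / 100.0))
    return ordered[idx]


def pact_quantize(x, clip_val, num_bits=4):
    """Clip x to [0, clip_val], then snap it to one of 2**num_bits levels."""
    levels = 2 ** num_bits - 1        # number of steps between 0 and clip_val
    scale = clip_val / levels         # width of one quantization step
    clipped = min(max(x, 0.0), clip_val)
    return round(clipped / scale) * scale


random.seed(0)
# Stand-in for post-ReLU activations collected during calibration.
calib = [max(0.0, random.gauss(0.0, 1.0)) for _ in range(10_000)]

clip_val = init_clip_val(calib)
quantized = [pact_quantize(v, clip_val, num_bits=4) for v in calib]
distinct_levels = len(set(quantized))
print(f"clip_val={clip_val:.4f}, distinct levels used={distinct_levels}")
```

A smaller `clip_val` gives finer steps inside the clipping range but saturates more outliers, which is why its starting value matters before any training or calibration refines it.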
Model | GPTQ | FP8 | PTQ | QAT |
---|---|---|---|---|
Granite | ✅ | ✅ | ✅ | 🔲 |
Llama | ✅ | ✅ | ✅ | 🔲 |
Mixtral | ✅ | ✅ | ✅ | 🔲 |
BERT/Roberta | ✅ | ✅ | ✅ | ✅ |
Note: Direct QAT on LLMs is not recommended.
- 🐧 Linux system with Nvidia GPU (V100/A100/H100)
- Python 3.10 or Python 3.11 (📋 Python 3.12 is currently not supported due to a PyTorch Dynamo constraint)
- CUDA >=12
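The requirements above can be checked before installation with a short script. This is a sketch, not a tool shipped with the project: the function names are hypothetical, and the CUDA check assumes PyTorch is already installed (it returns `None` otherwise, or for CPU-only builds).

```python
import sys

SUPPORTED_MINORS = {10, 11}  # Python 3.10 and 3.11, per the list above


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.10 or 3.11."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor in SUPPORTED_MINORS


def cuda_major_version():
    """Return the CUDA major version PyTorch was built with, or None."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch not installed, nothing to report
    cuda = torch.version.cuda  # e.g. "12.1", or None for CPU-only builds
    return int(cuda.split(".")[0]) if cuda else None


print("Python supported:", python_is_supported())
cuda_major = cuda_major_version()
if cuda_major is None:
    print("CUDA: not detected (PyTorch missing or CPU-only build)")
else:
    print("CUDA >= 12:", cuda_major >= 12)
```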
Optional packages based on optimization functionalities required:
- GPTQ is a popular compression method for LLMs:
- If you want to experiment with INT8 deployment in QAT and PTQ examples:
- FP8 is a reduced precision format like INT8:
- Nvidia H100 family or higher
- llm-compressor
- To enable the compute graph plotting function (mostly for troubleshooting purposes):
Note
The PyTorch version should be < 2.4 if you would like to experiment with deployment using an external INT8 kernel.
We recommend using a Python virtual environment with Python 3.10+. Here is how to set up a virtual environment using Python venv:
python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate
Tip
If you use pyenv, Conda Miniforge or other such tools for Python version management, create the virtual environment with that tool instead of venv. Otherwise, you may have issues with installed packages not being found as they are linked to your Python version management tool and not venv.
To install the fms_mo package from source:
python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate
git clone https://github.com/foundation-model-stack/fms-model-optimizer
cd fms-model-optimizer
pip install -e .
To help you get up and running as quickly as possible with the FMS Model Optimizer framework, check out the following resources which demonstrate how to use the framework with different quantization techniques:
- Jupyter notebook tutorials (It is recommended to begin here):
  - Quantization tutorial:
    - Visualizes a random Gaussian tensor step-by-step along the quantization process
    - Builds a quantizer and quantized convolution module based on this process
- Python script examples
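For a rough idea of what the quantization tutorial walks through, the sketch below quantizes a random Gaussian tensor with symmetric uniform quantization and compares the reconstruction error at INT8 and INT4. It is not taken from the notebook: it uses plain Python lists instead of PyTorch tensors, and the function names are hypothetical.

```python
import random


def quantize_symmetric(values, num_bits=8):
    """Return (integer codes, scale) for symmetric uniform quantization."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for INT8, 7 for INT4
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to floating-point values."""
    return [c * scale for c in codes]


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
tensor = [random.gauss(0.0, 1.0) for _ in range(4096)]

errors = {}
for bits in (8, 4):
    codes, scale = quantize_symmetric(tensor, bits)
    errors[bits] = mean_squared_error(tensor, dequantize(codes, scale))
    print(f"INT{bits}: scale={scale:.5f}, mse={errors[bits]:.6f}")
```

Because the scale is set by the largest absolute value, a few outliers in the Gaussian tail widen every step, which is the effect that clipping-based methods aim to reduce.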
Dive into the design document to get a better understanding of the framework motivation and concepts.
Check out our contributing guide to learn how to contribute.