Commit 403a27f

Update overview.md

Qubitium authored Dec 24, 2024
1 parent 049ed54 · commit 403a27f

Showing 1 changed file with 14 additions and 9 deletions: docs/source/en/quantization/overview.md

Use the table below to help you decide which quantization method to use.

| Quantization method | On the fly quantization | CPU | CUDA GPU | ROCm GPU (AMD) | Metal (Apple Silicon) | Intel GPU | torch.compile() | Number of bits | Supports fine-tuning (through PEFT) | Serializable with 🤗 transformers | 🤗 transformers support | Link to library |
|--------------------------------------------|-------------------------|-----------------|----------|-----------------|------------------------------------|-----------------|-------------------------|----------------|-------------------------------------|--------------|------------------------|---------------------------------------------|
| [AQLM](./aqlm.md) | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🟢 | 1 / 2 | 🟢 | 🟢 | 🟢 | https://github.com/Vahe1994/AQLM |
| [AWQ](./awq.md) | 🔴 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | ? | 4 | 🟢 | 🟢 | 🟢 | https://github.com/casper-hansen/AutoAWQ |
| [bitsandbytes](./bitsandbytes.md) | 🟢 | 🟡 <sup>1</sup> | 🟢 | 🟡 <sup>1</sup> | 🔴 <sup>2</sup> | 🟡 <sup>1</sup> | 🔴 <sup>1</sup> | 4 / 8 | 🟢 | 🟢 | 🟢 | https://github.com/bitsandbytes-foundation/bitsandbytes |
| [compressed-tensors](./compressed_tensors.md) | 🔴 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 1 / 8 | 🟢 | 🟢 | 🟢 | https://github.com/neuralmagic/compressed-tensors |
| [EETQ](./eetq.md) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | ? | 8 | 🟢 | 🟢 | 🟢 | https://github.com/NetEase-FuXi/EETQ |
| GGUF / GGML (llama.cpp) | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 1 / 8 | 🔴 | [See GGUF section](../gguf) | [See GGUF section](../gguf) | https://github.com/ggerganov/llama.cpp |
| [GPTQModel](./gptq.md) | 🔴 | 🟢 <sup>3</sup> | 🟢 | 🟢 | 🟢 | 🟢 <sup>4</sup> | 🔴 | 2 / 3 / 4 / 8 | 🟢 | 🟢 | 🟢 | https://github.com/ModelCloud/GPTQModel |
| [AutoGPTQ](./gptq.md) | 🔴 | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 2 / 3 / 4 / 8 | 🟢 | 🟢 | 🟢 | https://github.com/AutoGPTQ/AutoGPTQ |
| [HIGGS](./higgs.md) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🟢 | 2 / 4 | 🔴 | 🟢 | 🟢 | https://github.com/HanGuo97/flute |
| [HQQ](./hqq.md) | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🟢 | 1 / 8 | 🟢 | 🔴 | 🟢 | https://github.com/mobiusml/hqq/ |
| [optimum-quanto](./quanto.md) | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🔴 | 🟢 | 2 / 4 / 8 | 🔴 | 🔴 | 🟢 | https://github.com/huggingface/optimum-quanto |
| [FBGEMM_FP8](./fbgemm_fp8.md) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 8 | 🔴 | 🟢 | 🟢 | https://github.com/pytorch/FBGEMM |
| [torchao](./torchao.md) | 🟢 | | 🟢 | 🔴 | 🟡 <sup>5</sup> | 🔴 | | 4 / 8 | | 🟢🔴 | 🟢 | https://github.com/pytorch/ao |
| [VPTQ](./vptq.md) | 🔴 | 🔴 | 🟢 | 🟡 | 🔴 | 🔴 | 🟢 | 1 / 8 | 🔴 | 🟢 | 🟢 | https://github.com/microsoft/VPTQ |

<Tip>

**<sup>1</sup>** bitsandbytes is being refactored to support multiple backends beyond CUDA. Currently, ROCm (AMD GPU) and Intel CPU implementations are mature, with Intel XPU in progress and Apple Silicon support expected by Q4 2024/Q1 2025. For installation instructions and the latest backend updates, visit [this link](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend).
</Tip>
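
To ground the table's "On the fly quantization" column, here is a minimal sketch of 4-bit bitsandbytes quantization through 🤗 transformers; the checkpoint id is illustrative, and any causal LM checkpoint works.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantization happens on the fly at load time; no pre-quantized
# checkpoint or calibration dataset is needed.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # illustrative checkpoint; substitute your own
    quantization_config=quantization_config,
    device_map="auto",
)
```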

<Tip>

**<sup>2</sup>** bitsandbytes is seeking contributors to help develop and lead the Apple Silicon backend. Interested? Contact them directly via their repo. Stipends may be available through sponsorships.

</Tip>

<Tip>

**<sup>3</sup>** On CPU, GPTQModel supports the full bit range via Torch, and 4-bit via IPEX.

</Tip>

<Tip>

**<sup>4</sup>** On Intel GPU (Data Center GPU Max and Arc), GPTQModel supports only 4-bit via IPEX.

</Tip>
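
Because GPTQ requires calibration data, the table marks it 🔴 for on-the-fly quantization; the common workflow is to load a checkpoint that was already quantized. A minimal sketch, assuming GPTQModel (or AutoGPTQ) and optimum are installed, with an illustrative repo id:

```python
from transformers import AutoModelForCausalLM

# The quantization_config stored in the checkpoint's config.json tells
# transformers to dispatch to the GPTQ backend automatically.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # illustrative pre-quantized checkpoint
    device_map="auto",
)
```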

<Tip>

**<sup>5</sup>** torchao only supports int4 weight on Metal (Apple Silicon).

</Tip>
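
For torchao, a minimal sketch of int4 weight-only quantization through 🤗 transformers; the quantization-type string and `group_size` shown here reflect the torchao integration at the time of writing, so check the torchao docs for current options, and the checkpoint id is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig

# int4 weight-only quantization applied at load time via torchao
quantization_config = TorchAoConfig("int4_weight_only", group_size=128)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # illustrative checkpoint
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
    device_map="auto",
)
```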
