From d6325311c35b040f67300c34840d359905143d6a Mon Sep 17 00:00:00 2001
From: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Date: Sun, 4 Feb 2024 11:08:16 -0800
Subject: [PATCH] fill out integrations section

---
 docs/source/integrations.mdx | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/docs/source/integrations.mdx b/docs/source/integrations.mdx
index a2acc2680..7857abf4c 100644
--- a/docs/source/integrations.mdx
+++ b/docs/source/integrations.mdx
@@ -1,23 +1,41 @@
 # Transformers
 
-... TODO: to be filled out ...
+With Transformers it's very easy to load any model in 4-bit or 8-bit precision, quantizing it on the fly with bitsandbytes primitives.
+
+Please review the [bitsandbytes section in the Transformers docs](https://huggingface.co/docs/transformers/v4.37.2/en/quantization#bitsandbytes).
+
+Details about the `BitsAndBytesConfig` can be found [here](https://huggingface.co/docs/transformers/v4.37.2/en/main_classes/quantization#transformers.BitsAndBytesConfig).
+
+## Beware: `bf16` is the optimal compute dtype
+
+If your hardware supports it, `bf16` is the optimal compute dtype. The default is `float32` for backward compatibility and numerical stability, but `float16` often leads to numerical instabilities. `bfloat16` provides the benefits of both worlds: numerical stability and a significant computation speedup. Therefore, be sure to check if your hardware supports `bf16` and, if it does, configure it using the `bnb_4bit_compute_dtype` parameter in `BitsAndBytesConfig`:
+
+```py
+import torch
+from transformers import BitsAndBytesConfig
+
+quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
+```
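+
+To put the config to use, pass it to `from_pretrained`. A minimal sketch (the checkpoint name is only an example; any Transformers model works the same way):
+
+```py
+from transformers import AutoModelForCausalLM
+
+# bitsandbytes quantizes the weights on the fly while the checkpoint loads
+model = AutoModelForCausalLM.from_pretrained(
+    "facebook/opt-350m",  # example checkpoint
+    quantization_config=quantization_config,
+    device_map="auto",
+)
+```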
 
 # PEFT
 
-... TODO: to be filled out ...
+With `PEFT`, you can use QLoRA out of the box with `LoraConfig` and a 4-bit base model.
+
+Please review the [bitsandbytes section in the PEFT docs](https://huggingface.co/docs/peft/developer_guides/quantization#quantize-a-model).
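+
+As a rough sketch of that recipe (the model name and LoRA hyperparameters below are illustrative, not recommendations):
+
+```py
+import torch
+from peft import LoraConfig, get_peft_model
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+# QLoRA: a frozen 4-bit base model plus trainable low-rank adapters
+bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
+base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", quantization_config=bnb_config)
+
+lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
+model = get_peft_model(base_model, lora_config)
+model.print_trainable_parameters()
+```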
+
+# Accelerate
+
+Bitsandbytes is also easily usable from within Accelerate.
+
+Please review the [bitsandbytes section in the Accelerate docs](https://huggingface.co/docs/accelerate/en/usage_guides/quantization).
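+
+A minimal sketch in the spirit of that guide, assuming Accelerate's `BnbQuantizationConfig` and `load_and_quantize_model` utilities (the model and weights path below are placeholders):
+
+```py
+import torch.nn as nn
+from accelerate import init_empty_weights
+from accelerate.utils import BnbQuantizationConfig, load_and_quantize_model
+
+# Instantiate the architecture without allocating memory for the weights
+with init_empty_weights():
+    empty_model = nn.Sequential(nn.Linear(1024, 1024), nn.Linear(1024, 1024))  # placeholder model
+
+# Quantize to 8-bit while loading the real weights from disk
+bnb_quantization_config = BnbQuantizationConfig(load_in_8bit=True)
+quantized_model = load_and_quantize_model(
+    empty_model,
+    weights_location="path/to/weights",  # placeholder checkpoint location
+    bnb_quantization_config=bnb_quantization_config,
+    device_map="auto",
+)
+```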
 
 # Trainer for the optimizers
 
-... TODO: to be filled out ...
+You can use any of the 8-bit and/or paged optimizers by simply passing them to the `transformers.Trainer` class on initialization. All bnb optimizers are supported by passing the correct string to the `optim` attribute of `TrainingArguments`, e.g. `paged_adamw_32bit`.
+
+See the [official API docs for reference](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Trainer).
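+
+For example (a minimal sketch; `model` and `train_dataset` are assumed to be defined elsewhere):
+
+```py
+from transformers import Trainer, TrainingArguments
+
+training_args = TrainingArguments(
+    output_dir="out",
+    optim="paged_adamw_32bit",  # any bnb optimizer string works, e.g. "adamw_bnb_8bit"
+)
+trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
+trainer.train()
+```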
 
-Few references:
-
-- [transformers documentation]( https://huggingface.co/docs/transformers/quantization#bitsandbytes)
-- [PEFT documentation](https://huggingface.co/docs/peft/developer_guides/quantization)
-
 # Blog posts
 
 - [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes)