From d6325311c35b040f67300c34840d359905143d6a Mon Sep 17 00:00:00 2001
From: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Date: Sun, 4 Feb 2024 11:08:16 -0800
Subject: [PATCH] fill out integrations section

---
 docs/source/integrations.mdx | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/docs/source/integrations.mdx b/docs/source/integrations.mdx
index a2acc2680..7857abf4c 100644
--- a/docs/source/integrations.mdx
+++ b/docs/source/integrations.mdx
@@ -1,23 +1,41 @@
 # Transformers
 
-... TODO: to be filled out ...
+With Transformers it's very easy to load any model in 4-bit or 8-bit precision, quantizing it on the fly with bitsandbytes primitives.
+
+Please review the [bitsandbytes section in the Transformers docs](https://huggingface.co/docs/transformers/v4.37.2/en/quantization#bitsandbytes).
+
+Details about the `BitsAndBytesConfig` can be found [here](https://huggingface.co/docs/transformers/v4.37.2/en/main_classes/quantization#transformers.BitsAndBytesConfig).
+
+## Beware: `bf16` is the optimal compute dtype
+
+If your hardware supports it, `bf16` is the optimal compute dtype. The default is `float32` for backward compatibility and numerical stability, but `float16` often leads to numerical instabilities. `bfloat16` provides the benefits of both worlds: numerical stability and a significant computation speedup. Therefore, be sure to check if your hardware supports `bf16` and, if it does, configure it using the `bnb_4bit_compute_dtype` parameter in `BitsAndBytesConfig`:
+
+```py
+import torch
+from transformers import BitsAndBytesConfig
+
+quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
+```
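+
+To put the config to use, pass it to `from_pretrained`. A minimal sketch (the checkpoint name is only an example; any Transformers model works the same way):
+
+```py
+from transformers import AutoModelForCausalLM
+
+# bitsandbytes quantizes the weights on the fly while the checkpoint loads
+model = AutoModelForCausalLM.from_pretrained(
+    "facebook/opt-350m",  # example checkpoint
+    quantization_config=quantization_config,
+    device_map="auto",
+)
+```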
 
 # PEFT
 
-... TODO: to be filled out ...
+With `PEFT`, you can use QLoRA out of the box with `LoraConfig` and a 4-bit base model.
+
+Please review the [bitsandbytes section in the PEFT docs](https://huggingface.co/docs/peft/developer_guides/quantization#quantize-a-model).
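+
+As a rough sketch of that recipe (the model name and LoRA hyperparameters below are illustrative, not recommendations):
+
+```py
+import torch
+from peft import LoraConfig, get_peft_model
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+# QLoRA: a frozen 4-bit base model plus trainable low-rank adapters
+bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
+base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", quantization_config=bnb_config)
+
+lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
+model = get_peft_model(base_model, lora_config)
+model.print_trainable_parameters()
+```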
+
+# Accelerate
+
+Bitsandbytes is also easily usable from within Accelerate.
+
+Please review the [bitsandbytes section in the Accelerate docs](https://huggingface.co/docs/accelerate/en/usage_guides/quantization).
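+
+A minimal sketch in the spirit of that guide, assuming Accelerate's `BnbQuantizationConfig` and `load_and_quantize_model` utilities (the model and weights path below are placeholders):
+
+```py
+import torch.nn as nn
+from accelerate import init_empty_weights
+from accelerate.utils import BnbQuantizationConfig, load_and_quantize_model
+
+# Instantiate the architecture without allocating memory for the weights
+with init_empty_weights():
+    empty_model = nn.Sequential(nn.Linear(1024, 1024), nn.Linear(1024, 1024))  # placeholder model
+
+# Quantize to 8-bit while loading the real weights from disk
+bnb_quantization_config = BnbQuantizationConfig(load_in_8bit=True)
+quantized_model = load_and_quantize_model(
+    empty_model,
+    weights_location="path/to/weights",  # placeholder checkpoint location
+    bnb_quantization_config=bnb_quantization_config,
+    device_map="auto",
+)
+```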
 
 # Trainer for the optimizers
 
-... TODO: to be filled out ...
+You can use any of the 8-bit and/or paged optimizers by simply passing them to the `transformers.Trainer` class on initialization. All bnb optimizers are supported by passing the correct string to the `optim` attribute of `TrainingArguments`, e.g. `paged_adamw_32bit`.
+
+See the [official API docs for reference](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Trainer).
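+
+For example (a minimal sketch; `model` and `train_dataset` are assumed to be defined elsewhere):
+
+```py
+from transformers import Trainer, TrainingArguments
+
+training_args = TrainingArguments(
+    output_dir="out",
+    optim="paged_adamw_32bit",  # any bnb optimizer string works, e.g. "adamw_bnb_8bit"
+)
+trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
+trainer.train()
+```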
 
-Few references:
-
-- [transformers documentation]( https://huggingface.co/docs/transformers/quantization#bitsandbytes)
-- [PEFT documentation](https://huggingface.co/docs/peft/developer_guides/quantization)
-
 # Blog posts
 
 - [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes)