diff --git a/README.md b/README.md
index 4dc264f107..ad5fa379a0 100644
--- a/README.md
+++ b/README.md
@@ -389,7 +389,7 @@ See [examples](examples) for quick start. It is recommended to duplicate and mod

See [these docs](docs/config.qmd) for all config options.
Understanding batch size and gradient accumulation steps
Gradient accumulation means accumulating gradients over several mini-batches and updating the model weights afterward. When the samples in each batch are diverse, this technique doesn't significantly impact learning.
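To make the mechanics concrete, below is a minimal gradient accumulation loop in plain PyTorch. This is an illustrative sketch, not axolotl's implementation; the variable names `micro_batch_size` and `accumulation_steps` are chosen for the example.

```python
# Minimal gradient accumulation sketch in plain PyTorch (illustrative only,
# not axolotl's implementation). The optimizer updates the weights once per
# `accumulation_steps` mini-batches, so the effective batch size it sees is
# micro_batch_size * accumulation_steps (times any data-parallel workers).
import torch
import torch.nn as nn

model = nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

micro_batch_size = 4
accumulation_steps = 8  # weights update once every 8 mini-batches

optimizer.zero_grad()
for step in range(32):
    # Dummy mini-batch standing in for real training data.
    x = torch.randn(micro_batch_size, 16)
    y = torch.randint(0, 2, (micro_batch_size,))
    loss = loss_fn(model(x), y)
    # Scale the loss so the accumulated gradient equals the average over the
    # full effective batch rather than a sum of per-mini-batch averages.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Because gradients from each mini-batch are accumulated before a single `optimizer.step()`, the update is equivalent (up to the loss scaling shown) to one step over the larger effective batch.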