This repository contains the reference implementations of two related papers:
- "Efficient and Approximate Per-Example Gradient Norms for Gradient Noise Scale" (NeurIPS WANT Workshop 2023)
  - The code is available in the approx directory.
- "Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers" (NeurIPS 2024) (arXiv)
  - The code is available in the exact directory.