🤖
Resources that helped us understand and implement the Transformer model:
- Attention Is All You Need
- Coding a Transformer from scratch on PyTorch, with full explanation, training and inference by Umar Jamil
- Visualizing Attention, a Transformer's Heart by 3Blue1Brown
- Understanding and Coding the Self-Attention Mechanism of Large Language Models From Scratch by Sebastian Raschka
- Self Attention in Transformer Neural Networks (with Code!) by CodeEmporium
- Visualizing attention matrices using BertViz
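The resources above walk through scaled dot-product self-attention in detail. As a minimal sketch of the core computation they cover (plain NumPy rather than PyTorch, with illustrative weight matrices, not our actual model code):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq_len, seq_len) attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # each output is a weighted sum of values

# toy example: 3 tokens, model dimension 4
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (3, 4)
```

This mirrors the single-head attention derivation in the linked tutorials; multi-head attention splits `d_model` across several such heads and concatenates the results.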