Code for my Medium blog post: Transformers from Scratch in PyTorch
Note: This Transformer code does not include masked attention. That was intentional, because it led to a much cleaner implementation. This repository is intended for educational purposes only. I believe that everything here is correct, but make no guarantees if for some reason you decide to use it in your own project.