This tutorial implements the Transformer model architecture in one page of PyTorch, as simply and clearly as possible.
You should read the Transformer paper before coding. The original paper: Attention Is All You Need
Step 1: Implement the sub-modules:
- Embedding
- Positional Encoding (sketched after this list)
- Multi-Head Attention (the key module; sketched after this list)
- Position-wise Feed-Forward
- Encoder Layer
- Decoder Layer
- Generator
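
A minimal sketch of the positional-encoding module, assuming the sinusoidal formulation from the paper; the `max_len` default and the dropout placement are conventional choices, not fixed by the paper:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding (paper, Section 3.5)."""
    def __init__(self, d_model, max_len=5000, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        # Frequencies decay geometrically across the even dimensions
        div_term = torch.exp(
            torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)  # even dims: sine
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dims: cosine
        # Buffer, not parameter: saved with the model but never trained
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model); add the encoding for the first seq_len positions
        x = x + self.pe[:, : x.size(1)]
        return self.dropout(x)
```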
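Multi-head attention is the heart of the model. A minimal sketch, assuming batch-first tensors of shape (batch, seq_len, d_model); the class and argument names are illustrative, and the attention is computed by hand rather than with a library helper, since torch 1.13 has no built-in for it:

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Scaled dot-product attention over num_heads heads (paper, Section 3.2.2)."""
    def __init__(self, d_model, num_heads, dropout=0.1):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, query, key, value, mask=None):
        batch = query.size(0)
        # Project, then split into heads: (batch, num_heads, seq_len, d_k)
        q = self.w_q(query).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)
        k = self.w_k(key).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)
        v = self.w_v(value).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)
        # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            # Masked positions (mask == 0) get -inf so softmax zeroes them out
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = self.dropout(scores.softmax(dim=-1))
        # Merge heads back: (batch, seq_len, d_model)
        out = (attn @ v).transpose(1, 2).contiguous().view(batch, -1, self.num_heads * self.d_k)
        return self.w_o(out)
```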
Step 2: Build the N× layer stacks (sketched after the list):
- Encoder
- Decoder
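
A sketch of the N× stacking, assuming the layer classes from Step 1. The `clones` helper is a common convention for deep-copying a layer N times, not part of PyTorch:

```python
import copy
import torch.nn as nn

def clones(module, n):
    """Produce n independent deep copies of a module."""
    return nn.ModuleList([copy.deepcopy(module) for _ in range(n)])

class Encoder(nn.Module):
    """Stack of n identical encoder layers with a final LayerNorm."""
    def __init__(self, layer, n, d_model):
        super().__init__()
        self.layers = clones(layer, n)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, mask):
        for layer in self.layers:
            x = layer(x, mask)
        return self.norm(x)

class Decoder(nn.Module):
    """Stack of n identical decoder layers with a final LayerNorm."""
    def __init__(self, layer, n, d_model):
        super().__init__()
        self.layers = clones(layer, n)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, memory, src_mask, tgt_mask):
        # memory is the encoder output attended to by cross-attention
        for layer in self.layers:
            x = layer(x, memory, src_mask, tgt_mask)
        return self.norm(x)
```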
Step 3: Assemble the full model (sketched below):
- Transformer
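
A sketch of the final assembly, wiring the Step 1 and Step 2 pieces together; the constructor signature here is one reasonable choice, not necessarily the file's exact one:

```python
import torch.nn as nn

class Transformer(nn.Module):
    """Encoder-decoder Transformer: embed -> encode -> decode -> generate."""
    def __init__(self, encoder, decoder, src_embed, tgt_embed, generator):
        super().__init__()
        self.encoder = encoder      # Encoder stack from Step 2
        self.decoder = decoder      # Decoder stack from Step 2
        self.src_embed = src_embed  # source Embedding + PositionalEncoding
        self.tgt_embed = tgt_embed  # target Embedding + PositionalEncoding
        self.generator = generator  # final Linear projecting to vocabulary logits

    def encode(self, src, src_mask):
        return self.encoder(self.src_embed(src), src_mask)

    def decode(self, memory, src_mask, tgt, tgt_mask):
        return self.decoder(self.tgt_embed(tgt), memory, src_mask, tgt_mask)

    def forward(self, src, tgt, src_mask, tgt_mask):
        memory = self.encode(src, src_mask)
        return self.decode(memory, src_mask, tgt, tgt_mask)
```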
Requirements:
- torch>=1.13
There are three tests at the end of the file:

```python
if __name__ == "__main__":
    test_multi_head_attention()
    test_positional_encoding()
    test_transformer()
```
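
For reference, a shape-checking test along these lines is one way such a test could look; this sketch assumes the `MultiHeadAttention` class drafted above rather than the file's actual implementation:

```python
import torch

def test_multi_head_attention():
    # Output must keep the input shape (batch, seq_len, d_model)
    batch, seq_len, d_model, num_heads = 2, 10, 512, 8
    mha = MultiHeadAttention(d_model, num_heads)
    x = torch.randn(batch, seq_len, d_model)
    out = mha(x, x, x)  # self-attention: query = key = value
    assert out.shape == (batch, seq_len, d_model)
    print("test_multi_head_attention passed")
```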
Run it as follows:

```bash
python transformer.py
```
Result: