An implementation of the Attention Is All You Need paper from scratch in PyTorch, building a sequence-to-sequence transformer architecture for translating text from English to Italian.
The transformer architecture, introduced in the paper Attention Is All You Need, revolutionized sequence-to-sequence tasks by relying entirely on attention, eliminating the need for recurrent networks. This project applies that architecture to translate English sentences into Italian.
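At the core of the architecture is multi-head scaled dot-product attention. Below is a minimal PyTorch sketch of that mechanism as the paper defines it; the class and variable names are illustrative and not necessarily those used in this repository.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Multi-head attention: softmax(QK^T / sqrt(d_k)) V, computed per head."""

    def __init__(self, d_model: int = 512, num_heads: int = 8, dropout: float = 0.1):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.d_k = d_model // num_heads          # dimension per head
        self.num_heads = num_heads
        self.w_q = nn.Linear(d_model, d_model)   # query projection
        self.w_k = nn.Linear(d_model, d_model)   # key projection
        self.w_v = nn.Linear(d_model, d_model)   # value projection
        self.w_o = nn.Linear(d_model, d_model)   # output projection
        self.dropout = nn.Dropout(dropout)

    def forward(self, q, k, v, mask=None):
        batch = q.size(0)
        # Project, then split into heads: (batch, heads, seq_len, d_k)
        q = self.w_q(q).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)
        k = self.w_k(k).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)
        v = self.w_v(v).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)
        # Scaled dot-product attention scores
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = self.dropout(scores.softmax(dim=-1))
        out = attn @ v
        # Merge heads back into a single (batch, seq_len, d_model) tensor
        out = out.transpose(1, 2).contiguous().view(batch, -1, self.num_heads * self.d_k)
        return self.w_o(out)

# Quick shape check
x = torch.randn(2, 10, 512)
assert MultiHeadAttention()(x, x, x).shape == (2, 10, 512)
```

The same module serves encoder self-attention, decoder (masked) self-attention, and encoder-decoder cross-attention; only the mask and the source of q/k/v differ.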
The model was trained and evaluated on the g8a9/europarl_en-it dataset, which consists of English-Italian sentence pairs. The full dataset contains 1.9M pairs, but due to limited hardware only 100K pairs were used for training and 20K for evaluation.
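Subsetting a dataset like this is straightforward with the Hugging Face `datasets` library. The sketch below is illustrative; the split name and exact selection logic are assumptions, not necessarily what this project does.

```python
from datasets import load_dataset

# Assumes the dataset exposes a "train" split of translation pairs.
raw = load_dataset("g8a9/europarl_en-it", split="train")

# Keep 100K pairs for training and the next 20K for evaluation,
# matching the subset sizes used in this project.
train_data = raw.select(range(100_000))
eval_data = raw.select(range(100_000, 120_000))
```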
Tokenizers from Hugging Face were employed solely for preprocessing the data; all other components of the model were implemented from scratch.
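For reference, here is a hedged sketch of training such a tokenizer with the Hugging Face `tokenizers` library. The choice of a word-level model and these particular special tokens are assumptions for illustration; this project may configure its tokenizers differently.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.trainers import WordLevelTrainer
from tokenizers.pre_tokenizers import Whitespace

def build_tokenizer(sentences):
    # Word-level tokenizer with an unknown-token fallback (assumed setup).
    tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()
    trainer = WordLevelTrainer(
        special_tokens=["[UNK]", "[PAD]", "[SOS]", "[EOS]"],
        min_frequency=2,
    )
    tokenizer.train_from_iterator(sentences, trainer=trainer)
    return tokenizer

# One tokenizer per language, trained on that language's sentences, e.g.:
# en_tokenizer = build_tokenizer(english_sentences)
# it_tokenizer = build_tokenizer(italian_sentences)
```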
| Epochs | Loss (train) | Loss (validation) | F1 (train) | F1 (validation) | Time (minutes) | Device |
|---|---|---|---|---|---|---|
| 5 | 0.116378 | 0.091345 | 0.262342 | 0.273207 | 170 | 1 x T4 GPU |
This project follows the architecture, configurations, and training methods outlined in the Attention Is All You Need paper.
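For context, these are the base-model hyperparameters and learning-rate schedule the paper itself specifies; whether this project adopts every one of them unchanged is not stated here, so treat the snippet as a reference, not as the repository's exact configuration.

```python
# Base-model hyperparameters from the paper (Table 3, "base" row).
config = {
    "d_model": 512,      # embedding / hidden dimension
    "num_layers": 6,     # encoder and decoder stack depth
    "num_heads": 8,      # attention heads
    "d_ff": 2048,        # feed-forward inner dimension
    "dropout": 0.1,
}

def lr_schedule(step: int, d_model: int = 512, warmup: int = 4000) -> float:
    """Learning-rate schedule from Section 5.3 of the paper,
    used there with Adam (beta1=0.9, beta2=0.98, eps=1e-9):
    lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```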