# Transformer implementation

My implementation of the Transformer architecture from the paper "Attention Is All You Need".
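For context, the core operation the paper defines is scaled dot-product attention. Here is a minimal illustrative sketch in PyTorch (generic example code, not the code from this repository):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from attention
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# Example: batch of 2 sequences, length 5, model dimension 8
q = k = v = torch.randn(2, 5, 8)
out = scaled_dot_product_attention(q, k, v)
```

The output has the same shape as the input values: one weighted combination of the value vectors per query position.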

## Why another implementation?

I wrote this code to learn the basics of PyTorch and practice my deep learning skills. There is a good chance the implementation is wrong, so I do not recommend using it; it is just a student project.

## What I used to build the project

I used the "Attention Is All You Need" paper, along with a number of external resources, such as:

You can take a look at the resources.

## Training

Just follow these two steps:

1. Install the dependencies: `pip install -r requirements.txt`
2. Run the training script: `python3 train.py`

## TODO

- Implement a learning rate scheduler
- Add a script to run prediction
- Use SentencePiece for tokenization
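For the learning rate scheduler item, the schedule used in "Attention Is All You Need" (often called the Noam schedule) warms up linearly and then decays with the inverse square root of the step. A minimal sketch, assuming the paper's default `d_model=512` and `warmup=4000`:

```python
def noam_lr(step, d_model=512, warmup=4000):
    # lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    # Increases linearly for the first `warmup` steps, then decays as 1/sqrt(step).
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```

This function could be plugged into PyTorch via `torch.optim.lr_scheduler.LambdaLR`; the peak learning rate is reached exactly at `step == warmup`.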