single_file_seq2seq

A character-level seq2seq transformer written from scratch in a single file, seq2seq.py.

Optimized for readability and learnability.

features

single file
as readable as possible
comments covering lessons learned and common errors
working code that:
- trains on paired sequences of text
- given input text, generates the corresponding output text
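
As a rough illustration of what "character-level" and "paired sequences" mean here, the sketch below builds a character vocabulary from two made-up sentence pairs and encodes one pair as lists of integer ids. The function names (`build_vocab`, `encode`, `decode`), the `<pad>`/`<sos>`/`<eos>` tokens, and the sample sentences are assumptions made for this example and are not taken from seq2seq.py.

```python
# Illustrative sketch only: names and special tokens are assumptions,
# not the actual contents of seq2seq.py.
PAD, SOS, EOS = "<pad>", "<sos>", "<eos>"

def build_vocab(pairs):
    """Map every character seen in the source or target text to an integer id."""
    chars = sorted({ch for src, tgt in pairs for ch in src + tgt})
    itos = [PAD, SOS, EOS] + chars                  # id -> token
    stoi = {tok: i for i, tok in enumerate(itos)}   # token -> id
    return stoi, itos

def encode(text, stoi):
    """Wrap the text in start/end markers and convert each character to its id."""
    return [stoi[SOS]] + [stoi[ch] for ch in text] + [stoi[EOS]]

def decode(ids, itos):
    """Convert ids back to text, dropping the special tokens."""
    return "".join(itos[i] for i in ids if itos[i] not in (PAD, SOS, EOS))

# Made-up (source, target) pairs standing in for real training data
pairs = [("yeh movie acchi thi", "this movie was good"),
         ("mujhe pasand aayi", "i liked it")]
stoi, itos = build_vocab(pairs)
src_ids = encode(pairs[0][0], stoi)
tgt_ids = encode(pairs[0][1], stoi)
```

Sharing one vocabulary between source and target keeps the example short; a real implementation might equally use separate vocabularies for each side.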

demo

We train a character-level seq2seq transformer to translate from Hinglish (a modern hybrid of Hindi and English) to English.

After training, the same model is used to translate sample Hinglish sentences to English.
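
One common way to generate output text from a trained seq2seq model is greedy decoding: start from a start-of-sequence id, repeatedly ask the model for scores over the next character, and append the highest-scoring one until an end-of-sequence id appears. The sketch below shows that loop in plain Python. Whether seq2seq.py decodes this way is an assumption, and `greedy_decode` and `toy_model` are hypothetical names; `toy_model` simply copies its input so the loop can run without a trained network.

```python
# Illustrative sketch only: `greedy_decode` and `toy_model` are hypothetical
# stand-ins, not functions from seq2seq.py.
def greedy_decode(next_scores, src_ids, sos_id, eos_id, max_len=50):
    """Generate output ids one at a time, always taking the highest-scoring id."""
    out = [sos_id]
    for _ in range(max_len):
        scores = next_scores(src_ids, out)   # one score per vocabulary id
        next_id = max(range(len(scores)), key=scores.__getitem__)
        out.append(next_id)
        if next_id == eos_id:                # stop once end-of-sequence is produced
            break
    return out

def toy_model(src_ids, out_ids, vocab_size=5, eos_id=1):
    """Stand-in for a trained model: copies the source, then emits end-of-sequence."""
    pos = len(out_ids) - 1                   # number of ids generated so far
    target = src_ids[pos] if pos < len(src_ids) else eos_id
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]

result = greedy_decode(toy_model, src_ids=[2, 3, 4], sos_id=0, eos_id=1)
```

In a real run, `next_scores` would be replaced by a forward pass of the trained transformer over the encoded input and the characters generated so far.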

The dataset used is cmu-hinglish-dog on Hugging Face, which provides samples of movie reviews written in Hinglish that have been translated to English.

dependencies

python >= 3.10
torch >= 2.0
datasets

install

pip install torch
pip install datasets

run

python seq2seq.py

contributing

All contributions in the form of confusions, concerns, suggestions, or improvements are welcome!

acknowledgements

This repo was motivated by my previous "single file" repo single_file_gpt, which in turn was influenced by Andrej Karpathy's nanoGPT.

The demo in this repo uses the cmu-hinglish-dog dataset on Hugging Face, originally produced by Zhou et al., 2018. This dataset can also be found in the datasets-CMU_DoG repo on GitHub.
