This project is inspired by Andrej Karpathy, who implemented a reproduction of the OpenAI GPT-2 architecture (paper: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).
Aim:
The aim of this project is, first, to learn the basics of LLMs: how & why they work. Along the way, I'll also add my own ideas & ML concepts to the architecture to improve the results.
Resources:
Theory
(1) The Attention Mechanism in Large Language Models (https://www.youtube.com/watch?v=OxCpWwDCDFQ)
(2) The math behind Attention: Keys, Queries, and Values matrices (https://www.youtube.com/watch?v=UPtG_38Oq8o)
(3) What are Transformer Models and how do they work? (https://www.youtube.com/watch?v=qaWMOYf4ri8)
(4) Course Series by Andrej Karpathy (https://karpathy.ai/zero-to-hero.html)
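As a quick orientation to what resources (1) and (2) cover, here is a minimal sketch of scaled dot-product attention. This assumes PyTorch and is purely illustrative (the function name and shapes are my own choices, not code from this repo or from any of the resources above):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k) query/key/value projections of the input
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)  # attention weights sum to 1 per query
    return weights @ v                   # each output is a weighted mix of the values

# Toy usage: batch of 1, sequence of 4 tokens, 8-dim embeddings (self-attention)
x = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([1, 4, 8])
```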