TorchSharp NanoGpt #1379
-
Thank you @GeorgeS2019! I can almost feel the AGI now!
-
This is a nicely implemented version of Karpathy's first setup for reproducing GPT, but it does not include the later updates from his GPT-2 work, such as some of the CUDA performance improvements, the checkpoints, and some minor tweaks to pre-activation.
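For context, the main CUDA performance win in the later nanoGPT revisions is replacing hand-rolled attention with PyTorch's fused scaled_dot_product_attention (flash attention) kernel. Below is a minimal TorchSharp sketch of the naive causal-attention core that the fused kernel replaces; shapes and names are illustrative, not taken from any of the ports in this thread:

```csharp
using System;
using TorchSharp;
using static TorchSharp.torch;

// Example shapes: batch=1, heads=12, seq=8, headDim=64 (GPT-2 small head size).
var q = randn(1, 12, 8, 64);
var k = randn(1, 12, 8, 64);
var v = randn(1, 12, 8, 64);
Console.WriteLine(string.Join(",", CausalAttention(q, k, v).shape)); // 1,12,8,64

// Naive causal self-attention core, i.e. the part nanoGPT later swaps for the
// fused scaled_dot_product_attention kernel to get the CUDA speedup.
// q, k, v: (batch, heads, seqLen, headDim)
static Tensor CausalAttention(Tensor q, Tensor k, Tensor v)
{
    var headDim = q.shape[^1];
    var seqLen = q.shape[^2];

    // (B, H, T, T) attention scores, scaled by sqrt(headDim).
    var att = matmul(q, k.transpose(-2, -1)) / Math.Sqrt(headDim);

    // Causal mask: positions above the diagonal are the future; block them.
    var future = ones(seqLen, seqLen, dtype: ScalarType.Bool).triu(1);
    att = att.masked_fill(future, double.NegativeInfinity);

    att = nn.functional.softmax(att, -1);
    return matmul(att, v); // (B, H, T, headDim)
}
```

In PyTorch, nanoGPT's updated block collapses all of this into a single F.scaled_dot_product_attention(q, k, v, is_causal=True) call, which dispatches to flash attention on CUDA. Recent TorchSharp releases expose a counterpart under torch.nn.functional as well, but check the exact name and signature in your version before relying on it.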
-
Part of Karpathy's intent in making NanoGPT was to mimic GPT-2 closely enough to load its model weights. As I have gotten further into developing NanoGPT in TorchSharp, I have noticed that the parameter count is off from what it is in PyTorch. I am using the hyperparameters that should be equivalent to GPT-2 small (Hugging Face config for GPT-2: https://huggingface.co/docs/transformers/en/model_doc/gpt2#transformers.GPT2Config). Unfortunately this yields 163M parameters (163,037,184), which is off from GPT-2 small's 124M. As a result, and this is the important part, I cannot load a pretrained model from Hugging Face. I tried using the same train.bin that Karpathy loads, but it is a mismatch. For anyone curious about training their own models: it looks like the 163M-parameter model would take 3.7 days per epoch on an Azure NVIDIA A100. Of the other TorchSharp implementations of NanoGPT, has anyone seen one that will load a Hugging Face model using Model.Load and "just works"?
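One plausible explanation, inferred from the arithmetic rather than confirmed in this thread: the gap is exactly the size of the token-embedding matrix. 50257 × 768 = 38,597,376, and 124,439,808 + 38,597,376 = 163,037,184, which matches the reported count to the digit. That is what happens when lm_head is a separate parameter instead of being weight-tied to wte, as both GPT-2 and nanoGPT do. A minimal TorchSharp sketch of the tie, with illustrative names (wte and lmHead are not from any linked repo):

```csharp
using TorchSharp;
using static TorchSharp.torch.nn;

// GPT-2 small shapes, per the Hugging Face config linked above.
const long vocabSize = 50257;
const long nEmbd = 768;

var wte = Embedding(vocabSize, nEmbd);                 // token embedding, 50257 x 768
var lmHead = Linear(nEmbd, vocabSize, hasBias: false); // output head, also 50257 x 768

// Weight tying: share one matrix between embedding and head, as GPT-2 does
// (nanoGPT: self.transformer.wte.weight = self.lm_head.weight).
// Without this line the model carries a second 38.6M-parameter matrix,
// exactly the 163M-vs-124M discrepancy described above.
// (The weight property setter reflects current TorchSharp; details may vary by version.)
lmHead.weight = wte.weight;
```

With the tie in place, the count should land on the 124,439,808 total that a GPT-2 small state dict from Hugging Face expects; without matching parameter names and shapes, Model.Load has no chance of "just working".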
-
Can you share your branch to repro?
-
Personally, I've shifted my attention away from nanogpt towards llama2.c instead: https://github.com/karpathy/llama2.c IMO, the complexity of llama is comparable to nanogpt, yet llama uses a more modern architecture, and you have much better open-source pre-trained models available to plug and play. The neat thing about llama2.c is that you can run inference on the CPU, and the inference code is written from scratch in a single C file. There's already a C# port of llama2.c, and I have my own port as well... If you compile with optimizations and AOT, it runs impressively fast, comparable to the original C version, though I don't have a benchmark.
-
I have a public version of NanoGPT available here: https://github.com/travisjj/NanoGPT2 Please let me know if anything seems off with it. It trains well, although it has the same general pitfalls as the original GPT-2 model, and it would take quite a lot of compute to train on OpenWebText down to a low loss.
-
https://github.com/biegehydra/NanoGptDotnet/blob/master/src/NanoGpt/NanoGpt.cs