quick-trainer

Inspired by Andrej Karpathy, this repo serves as a flexible, configurable way to kick off LLM training experiments. The goal is to take a new model card, update a YAML file with that model's components (e.g. FFN dimension, activation function, RoPE, layer ordering, etc.), and then train a real model from scratch with one command. It works on a single CPU, a single GPU, or multi-GPU clusters out of the box, and the hardware details are largely abstracted away from the user aside from choosing the distributed strategy.

The out-of-the-box configs live in the configs directory: you define one for the model (filled out like a typical model card) and one for training (optimizer, batch size, dtype, etc.).

Currently the repo supports the following (a sketch of how these options map onto a model config follows the list):

  • ✔️ Multi-head Attention, MQA, and GQA
  • ✔️ Sinusoidal positional encoding and RoPE
  • ✔️ ReLU, GeLU, and SwiGLU activations
  • ✔️ Layer, batch, and RMS norms
  • ✔️ DDP and DataParallel distributed training
  • ✔️ Checkpointing
  • ✔️ Configurable Transformer blocks - all in simple YAML!
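
To make this concrete, here is a rough sketch of what a model config could look like. The field names below (d_model, n_heads, n_kv_heads, activation, norm, positional_encoding, etc.) are illustrative assumptions rather than the repo's exact schema; check the examples in configs/models/ for the real keys.

# Hypothetical model card sketch - all keys below are assumptions,
# see configs/models/ for the actual schema.
d_model: 768
n_layers: 12
n_heads: 12
n_kv_heads: 4               # fewer KV heads than query heads gives GQA; 1 gives MQA
ffn_dim: 3072
activation: swiglu          # e.g. relu | gelu | swiglu
norm: rms                   # e.g. layer | batch | rms
positional_encoding: rope   # e.g. sinusoidal | rope
vocab_size: 50257
max_seq_len: 1024
transformer_block:
- norm
- attention
- skip
- norm
- feed_forward
- skip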

More to come!

Table of Contents

  • Setup
  • Usage
  • Planned Work

Setup

The repo is set up with the uv package manager; the dependencies needed to work within the repo are listed in the 'dependencies' section of pyproject.toml.

If you are using uv, you can set up a virtual environment and install the required libraries:

uv venv
source .venv/bin/activate
uv pip install -e .

This should activate your virtual environment and install all the needed dependencies.

Usage

The entry point for the repo is defined in src/train. A click CLI is defined, or you can use the predefined train script as shown below:

train --model-config-path ./configs/models/your_config.yaml  \
      --training-config-path ./configs/training/your_training_config.yaml \
      --data-path ./data/your_data

For convenience, the Tiny Shakespeare dataset is already provided. There is also a default training config with sensible parameters, and a model card config that defines a tiny GPT-2 model.

When you run this, the system will automatically detect what compute you have available and apply the appropriate distributed strategy if relevant.
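
As a rough illustration, a training config might contain something like the sketch below. These keys (optimizer, batch_size, dtype, distributed_strategy, and so on) are assumptions made for the example; the shipped default in configs/training/ defines the actual schema.

# Hypothetical training config sketch - keys are assumptions,
# see configs/training/ for the actual schema.
optimizer: adamw
learning_rate: 3.0e-4
batch_size: 32
max_steps: 5000
dtype: bfloat16
checkpoint_every: 1000      # checkpointing interval, in steps
distributed_strategy: ddp   # e.g. ddp or plain data parallel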

The configs/models/ directory should contain YAML-defined, decoder-only LLM models. The interesting part is that you can define not only the components that make up your transformer, but also the sequencing of the transformer block itself via the transformer_block list, like so:

transformer_block:
- norm
- attention
- skip
- norm
- feed_forward
- skip

This provides a convenient way to configure your transformer block and makes it easy to adapt to new variants using the components already defined in the config, as in the sketch below.
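
For example, a post-norm style block (normalization applied after each residual connection) could be expressed by reordering the same components. This is an illustrative variant, not a config shipped with the repo:

# Hypothetical post-norm variant - same components, different order.
transformer_block:
- attention
- skip
- norm
- feed_forward
- skip
- norm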

Planned Work

  • FSDP support for parallel GPU training
  • Support for datasets on S3
  • Adding UI support for tracking your experiments (think a simplified, fully local W&B)
  • Inference libraries for the pretrained models

Reach out to me 📫 at [email protected] if you'd like to collaborate or contribute.
