All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## Unreleased

### Added

- Added support for Grouped Query Attention.
- Added commonsense_qa and social_iqa downstream evaluation tasks
- Added MMLU multiple choice (A/B/C/D) 5-shot variant downstream tasks

### Changed

- Rename `Olmo` to `OLMo` everywhere in the codebase
- Disabled automatic garbage collection during training; instead we run it manually at regular intervals to avoid ranks getting out of sync with their own gc (a minimal sketch of the pattern follows below).
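For context, the manual-collection pattern described above looks roughly like this; the training-loop shape and the `gc_interval` value are illustrative assumptions, not the project's actual trainer code.

```python
import gc

def train(model, loader, optimizer, gc_interval: int = 100):
    # Turn off automatic garbage collection so collections no longer fire at
    # arbitrary, rank-dependent moments in the middle of training steps.
    gc.disable()
    for step, batch in enumerate(loader):
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        # Run a collection on a fixed schedule instead, so every rank pays the
        # GC pause at the same step and the ranks stay in sync.
        if step % gc_interval == 0:
            gc.collect()
```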

### Removed

- Removed `AMDLayerNorm`, since the original layer norm bug has been fixed and we don't need this workaround anymore.
- Removed `OLMoParallelBlock`.

### Fixed

- Don't log garbage on nodes that aren't rank 0
- Don't crash in the HF code when we are referring to a tokenizer in a local file
- Fixed the size calculation for qk layer norm

## v0.2.5 - 2024-03-06

### Fixed
- Fixed the default value of the `--tokenizer` argument to `scripts/prepare_tulu_data.py` to be an absolute path rather than a relative path, so the script can be run from other directories.
- Added the option to directly pass input embeddings to `OLMo` and `OLMoForCausalLM`.
- Added support for Python 3.8.
- Added code to throw an error if `output_attentions` is set to `True` in the forward call to `OLMoForCausalLM`. This functionality hasn't been implemented yet.
- Corrected the scheme displayed in error messages that come from R2
- Fixed running with multiple data loading workers in LUMI
- Minor bug fix: uninitialized prompts variable

### Added

- Added `output_hidden_states` argument and associated functionality to `OLMo` and `OLMoForCausalLM` to return model intermediate hidden states (a usage sketch follows at the end of this list).
- Ability to read from R2 like we read from S3
- Added MMLU downstream evaluation tasks, with prompt variations.
- Added support for PyTorch v2.2.
- Added ability to show logs from all ranks
- Added option for QKV clipping.
- Added basic_arithmetic downstream evaluation task
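A minimal sketch of how the new `output_hidden_states` flag can be used through the Hugging Face interface; loading via `AutoModelForCausalLM` with the `allenai/OLMo-1B` checkpoint is an illustrative assumption, not something specified by this changelog.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed, illustrative checkpoint; any OLMo checkpoint exposed through the
# HF wrapper should behave the same way.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B", trust_remote_code=True)

inputs = tokenizer("OLMo is an open language model.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# With output_hidden_states=True the output carries a tuple of per-layer
# activations, each of shape (batch, seq_len, hidden_size).
for i, h in enumerate(out.hidden_states):
    print(f"layer {i}: {tuple(h.shape)}")
```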

### Changed

- Changed legacy checkpoint unsharding to use processes and shared memory instead of threads

## v0.2.4 - 2024-02-02

### Fixed
- Fixed an issue with the HuggingFace integration where we were inadvertently using a feature that was introduced in Python 3.10, causing an error for older Python versions.

## v0.2.3 - 2024-01-31

## v0.2.2 - 2023-12-10

## v0.2.1 - 2023-12-10

## v0.2.0 - 2023-12-08

### Added
- GPT-based model.
- Tokenizer and data pre-processing pipeline.
- Training script.
- Triton-based FlashAttention.