All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## Unreleased

### Added

- Added support for Grouped Query Attention.
- Added commonsense_qa and social_iqa downstream evaluation tasks
- Added MMLU multiple choice (A/B/C/D) 5-shot variant downstream tasks

### Changed

- Rename `Olmo` to `OLMo` everywhere in the codebase
- Disabled automatic garbage collection during training; instead we run it manually at regular intervals to avoid ranks getting out of sync with their own gc (a minimal sketch of the pattern follows below).
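For context, the manual-collection pattern described above looks roughly like this; the training-loop shape and the `gc_interval` value are illustrative assumptions, not the project's actual trainer code.

```python
import gc

def train(model, loader, optimizer, gc_interval: int = 100):
    # Turn off automatic garbage collection so collections no longer fire at
    # arbitrary, rank-dependent moments in the middle of training steps.
    gc.disable()
    for step, batch in enumerate(loader):
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        # Run a collection on a fixed schedule instead, so every rank pays the
        # GC pause at the same step and the ranks stay in sync.
        if step % gc_interval == 0:
            gc.collect()
```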

### Removed

- Removed `AMDLayerNorm`, since the original layer norm bug has been fixed and we don't need this workaround anymore.
- Removed `OLMoParallelBlock`.

### Fixed

- Don't log garbage on nodes that aren't rank 0
- Don't crash in the HF code when we are referring to a tokenizer in a local file
- Fixed the size calculation for qk layer norm

## v0.2.5 - 2024-03-06

### Fixed
- Fixed the default value of the `--tokenizer` argument to `scripts/prepare_tulu_data.py` to be an absolute path rather than a relative path, so the script can be run from other directories.
- Added the option to directly pass input embeddings to `OLMo` and `OLMoForCausalLM`.
- Added support for Python 3.8.
- Added code to throw an error if `output_attentions` is set to `True` in the forward call to `OLMoForCausalLM`. This functionality hasn't been implemented yet.
- Corrected the scheme displayed in error messages that come from R2
- Fixed running with multiple data loading workers in LUMI
- Minor bug fix: uninitialized prompts variable

### Added

- Added `output_hidden_states` argument and associated functionality to `OLMo` and `OLMoForCausalLM` to return model intermediate hidden states (a usage sketch follows at the end of this list).
- Ability to read from R2 like we read from S3
- Added MMLU downstream evaluation tasks, with prompt variations.
- Added support for PyTorch v2.2.
- Added ability to show logs from all ranks
- Added option for QKV clipping.
- Added basic_arithmetic downstream evaluation task
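A minimal sketch of how the new `output_hidden_states` flag can be used through the Hugging Face interface; loading via `AutoModelForCausalLM` with the `allenai/OLMo-1B` checkpoint is an illustrative assumption, not something specified by this changelog.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed, illustrative checkpoint; any OLMo checkpoint exposed through the
# HF wrapper should behave the same way.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B", trust_remote_code=True)

inputs = tokenizer("OLMo is an open language model.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# With output_hidden_states=True the output carries a tuple of per-layer
# activations, each of shape (batch, seq_len, hidden_size).
for i, h in enumerate(out.hidden_states):
    print(f"layer {i}: {tuple(h.shape)}")
```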

### Changed

- Changed legacy checkpoint unsharding to use processes and shared memory instead of threads

## v0.2.4 - 2024-02-02

### Fixed
- Fixed an issue with the HuggingFace integration where we were inadvertently using a feature that was introduced in Python 3.10, causing an error for older Python versions.

## v0.2.3 - 2024-01-31

## v0.2.2 - 2023-12-10

## v0.2.1 - 2023-12-10

## v0.2.0 - 2023-12-08

### Added
- GPT-based model.
- Tokenizer and data pre-processing pipeline.
- Training script.
- Triton-based FlashAttention.