This library contains plugins to accelerate finetuning with the following optimizations:
- Padding-Free Flash Attention Computation
- Multipack Distributed Sampling
Plugin | Description | Depends | Loading | Augmentation | Callbacks
---|---|---|---|---|---
padding_free | Padding-Free Flash Attention Computation | flash_attn | | ✅ |
multipack sampler | Multipack Distributed Sampling | numba | | ✅ |
Transformers natively supports padding-free from v4.44.0 (see here). The padding-free plugin will use the transformers implementation if a compatible version is installed; otherwise, for transformers < v4.44.0, it will fall back to an internal implementation.
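For illustration, the snippet below is a minimal sketch (not the plugin's internal code) of what this version-dependent choice looks like. On recent transformers versions, `DataCollatorWithFlattening` concatenates a mini-batch into a single packed row and emits `position_ids` instead of padding.

```python
# Minimal sketch, not the plugin's internal code: use the native transformers
# padding-free collator when the installed version supports it (>= v4.44.0),
# otherwise fall back to an internal implementation (represented by a stub here).
import transformers
from packaging import version

if version.parse(transformers.__version__) >= version.parse("4.44.0"):
    # DataCollatorWithFlattening concatenates the examples of a mini-batch into
    # a single row and emits position_ids that restart at 0 for every example,
    # so the attention kernel can recover sequence boundaries without padding.
    from transformers import DataCollatorWithFlattening

    collator = DataCollatorWithFlattening()
    features = [{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5]}]
    batch = collator(features)
    # batch["input_ids"] has shape (1, 5) -- no padding tokens were added --
    # and batch["position_ids"] is [[0, 1, 2, 0, 1]].
else:
    # transformers < v4.44.0: the plugin uses its own padding-free data
    # collator instead (not reproduced in this sketch).
    collator = None
```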
To reproduce the benchmarks, run the following commands:
Reproduce Padding Free on A100 80GB:

```
tox -e run-benches -- "1 2" "4 8" benchmark_outputs scenarios-orca.yaml "none"
```

Reproduce MultiPack on A100 80GB:

```
tox -e run-benches -- "2 4 8" "16 32 64" benchmark_outputs scenarios-orca.yaml "padding-free"
```
The padding-free plugin currently works only with pre-tokenized datasets, because it is designed to replace
the data collator from SFTTrainer
with a custom data collator that manipulates the input to the modified flash attention forward.
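As a rough illustration (this is not the plugin's code), one way the boundary information needed by flash attention's variable-length kernels can be recovered from a padding-free batch is to read the cumulative sequence lengths off the `position_ids`, since positions restart at 0 at every sequence start:

```python
# Hypothetical sketch: derive cumulative sequence lengths (cu_seqlens) for
# flash attention's varlen kernels from padding-free position_ids.
import torch

# three packed sequences of lengths 3, 2 and 4 in a single row
position_ids = torch.tensor([[0, 1, 2, 0, 1, 0, 1, 2, 3]])

# every position 0 marks the start of a new sequence
seq_starts = torch.nonzero(position_ids.flatten() == 0).flatten()
cu_seqlens = torch.cat(
    [seq_starts, torch.tensor([position_ids.numel()])]
).to(torch.int32)

print(cu_seqlens)  # tensor([0, 3, 5, 9], dtype=torch.int32)
```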
In some cases, the data collator for SFTTrainer will handle the formatting and tokenization of raw text datasets. The plugin is currently unable to both perform the original data collation and apply its custom data collator on top of it at the same time. This will be addressed in a future commit to support this case.
In the meantime, the plugin expects the user to provide a pre-tokenized dataset (see the sketch after this list) that
- is formatted with a template for instruct-tuning cases
- is tokenized
- has template labels masked to exclude them from loss computation
- has an EOS token appended
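For reference, a minimal sketch of preparing such a dataset is shown below. The field names, the prompt/response template, and the choice of -100 as the label mask are assumptions for illustration, not part of the plugin's API:

```python
# Minimal sketch, not the plugin's API: build a pre-tokenized dataset that
# satisfies the requirements above. Field names and model id are hypothetical.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("some-org/some-model")  # hypothetical

def preprocess(example):
    # apply a simple instruct template (assumed format)
    prompt = f"### Instruction:\n{example['instruction']}\n### Response:\n"
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(example["output"], add_special_tokens=False)["input_ids"]

    # tokenized input with the EOS token appended
    input_ids = prompt_ids + response_ids + [tokenizer.eos_token_id]

    # mask the template/prompt tokens so they are excluded from the loss
    labels = [-100] * len(prompt_ids) + response_ids + [tokenizer.eos_token_id]
    return {"input_ids": input_ids, "labels": labels}

# e.g. with a datasets.Dataset:
# train_dataset = raw_dataset.map(preprocess, remove_columns=raw_dataset.column_names)
```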
The multipack plugin currently also requires the padding-free plugin to work. This may change in the future if there is demand for multipack to work standalone without padding-free.