MoRe Fine-Tuning with 10x Fewer Parameters

Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters". Using hardware efficient block-diagonal matrices, we surpass LoRA's performance with 10x fewer parameters on average, very little rank tuning and no alpha scaler. Our approach is also more memory and runtime efficient on standard reasoning tasks, with our Llama 7B beating LoRA-trained Llama 13B on Commonsense reasoning.

Paper: https://arxiv.org/abs/2408.17383

🚀 Main Results

📰 News

[2024/06] Our paper is accepted by two ICML workshops: ES-FOMO and FM-wild!
[2024/10] Release an optimized Triton Kernel and further optimizations for the PyTorch version!

🗝️ MoRe Implementation

src/models/layers/monarch_linear.py contains the code for MoRe adapter.
src/models/layers/blockdiag_butterfly_multiply.py contains the code for block-diagonal monarch matmul, which is optimized over Hazy Research's version for less transpose and mem copy.
a basic usage of MoRe adapter can be found in demos/monarch_demo.ipynb.

🛠️ Setup

We highly recommend using docker for stable environment building, but any decent PyTorch + Huggingface environment should work.

docker compose build; docker compose up -d
hostname > hostname.txt # to not confuse machines in wandb
docker attach peft

🦾 Usage

To reproduce results

GLUE tasks: python run_glue.py /fly/task_configs/monarch_roberta_glue/cola.json.\
For reasoning tasks, first load datasets using bash pyreft/loreft/load_datasets.sh.
Math reasoning: cd pyreft/loreft; bash math_monarch.sh --blk_r=8 --all_linear.
Commonsense reasoning: cd pyreft/loreft; bash common_sense_monarch.sh --blk_r=8 --all_linear.

You can manually modify the hyperparameters in task_configs/llama for reasoning tasks and task_configs/monarch_roberta_glue for GLUE tasks.

🧐 Todo

We welcome contributions and suggestions to the list!

Fused Triton kernel for Monarch
MMLU results (including reproducing the QLoRA baselines)
More ablations on rank tuning guidelines
Explore MoRe as a general substitute for low-rank modules.

📑 Citation

If you use our adapter implementation, please cite our paper:

@misc{tan2024finetuning10xfewerparameters,
      title={MoRe Fine-Tuning with 10x Fewer Parameters},
      author={Wenxuan Tan and Nicholas Roberts and Tzu-Heng Huang and Jitian Zhao and John Cooper and Samuel Guo and Chengyu Duan and Frederic Sala},
      year={2024},
      eprint={2408.17383},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2408.17383},
}

Name		Name	Last commit message	Last commit date
Latest commit History 184 Commits
agnews_train_scripts		agnews_train_scripts
demos		demos
glue_scripts		glue_scripts
pyreft		pyreft
qlora		qlora
results		results
src		src
task_configs		task_configs
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
run_glue.py		run_glue.py
set_least_n_gpu.sh		set_least_n_gpu.sh
train_utils.py		train_utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MoRe Fine-Tuning with 10x Fewer Parameters

🚀 Main Results

📰 News

🗝️ MoRe Implementation

🛠️ Setup

🦾 Usage

To reproduce results

🧐 Todo

📑 Citation

About

Releases

Packages

Contributors 3

Languages

SprocketLab/sparse_matrix_fine_tuning

Folders and files

Latest commit

History

Repository files navigation

MoRe Fine-Tuning with 10x Fewer Parameters

🚀 Main Results

📰 News

🗝️ MoRe Implementation

🛠️ Setup

🦾 Usage

To reproduce results

🧐 Todo

📑 Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages