Repository for HW-GPT-Bench (NeurIPS DBT 2024)
Note: We are in the process of updating the benchmark and code, with significant changes to the repository coming soon!
We release the pretrained supernet checkpoints here, the pretrained hardware surrogates here, the perplexity surrogates here, and the list of sampled architectures here. For a minimal installation, use requirements-mix.txt. Installing the full requirements requires building some packages on a GPU node (with the CUDA module loaded).
$ git clone https://github.com/automl/HW-Aware-LLM-Bench
$ cd HW-Aware-LLM-Bench
$ conda create -n hw-gpt python=3.11.9
$ conda activate hw-gpt
$ pip install -e .
To install syne-tune, use the following steps:
$ git clone https://github.com/awslabs/syne-tune.git
$ cd syne-tune
$ pip install -e '.[basic]'
from hwgpt.api import HWGPT
api = HWGPT(search_space="s", use_supernet_surrogate=False) # initialize API
random_arch = api.sample_arch() # sample random arch
api.set_arch(random_arch) # set arch
results = api.query() # query all for the sampled arch
print("Results: ", results)
energy = api.query(metric="energies") # query energy
print("Energy: ", energy)
rtx2080 = api.query(device="rtx2080") # query device
print("RTX2080: ", rtx2080)
# query perplexity based on mlp predictor
perplexity_mlp = api.query(metric="perplexity", predictor="mlp")
print("Perplexity MLP: ", perplexity_mlp)
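The query calls above compose naturally into a search loop, e.g. a simple random search trading off perplexity against energy. The sketch below follows the same `sample_arch`/`set_arch`/`query` pattern; the `StubHWGPT` class and its toy surrogate values are placeholders so the snippet runs without the benchmark checkpoints — with the real benchmark you would construct `HWGPT` as shown above instead.

```python
import random

class StubHWGPT:
    """Stand-in for hwgpt.api.HWGPT so this sketch runs without checkpoints.

    Mirrors the sample_arch/set_arch/query interface shown above; the
    architecture fields and returned values are illustrative only.
    """

    def sample_arch(self):
        # Hypothetical search-space fields, for illustration only.
        return {"layers": random.choice([4, 6, 8]),
                "dim": random.choice([256, 512])}

    def set_arch(self, arch):
        self.arch = arch

    def query(self, metric=None, device=None, predictor=None):
        # Toy surrogate: larger models get lower perplexity, higher energy.
        size = self.arch["layers"] * self.arch["dim"]
        if metric == "perplexity":
            return 100.0 / (1 + size / 1024)
        if metric == "energies":
            return size / 100.0
        return {"perplexity": 100.0 / (1 + size / 1024),
                "energies": size / 100.0}

random.seed(0)
api = StubHWGPT()  # real benchmark: api = HWGPT(search_space="s")

best = None
for _ in range(20):
    arch = api.sample_arch()
    api.set_arch(arch)
    ppl = api.query(metric="perplexity")
    energy = api.query(metric="energies")
    score = ppl + 0.1 * energy  # simple scalarization of the two objectives
    if best is None or score < best[0]:
        best = (score, arch)

print("Best arch:", best[1])
```

The scalarization weight (0.1) is arbitrary; in practice you would sweep it or use a multi-objective method to trace out the perplexity/energy Pareto front.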
If you find HW-GPT-Bench useful, you can cite us using:
@article{sukthanker2024hw,
  title={HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models},
  author={Sukthanker, Rhea Sanjay and Zela, Arber and Staffler, Benedikt and Klein, Aaron and Franke, Jorg KH and Hutter, Frank},
  journal={arXiv preprint arXiv:2405.10299},
  year={2024}
}