- Performance gains in generation: `notebooks/Speedup Profiling Submission.ipynb`
- Accuracy evaluation on WikiText: `notebooks/wikitext_PPL_calculations.py`
- Accuracy evaluation on C4: `notebooks/C4_PPL_calculations.py`
All our code can be run on PACE.
- Connect to the Georgia Tech VPN.
- Access https://ondemand-ice.pace.gatech.edu/ and request a RHEL9 Interactive Desktop (from the top drop-down) with an H100 GPU.
- Enter the VM's GUI and open a terminal.
- After setting up your git credentials, run:

```bash
git clone [email protected]:abhibambhaniya/mixtral-offloading-residency-info.git
cd mixtral-offloading-residency-info
bash initial_setup.sh
conda activate $TMP_DIR/moe-offload
jupyter notebook
```

- Download the model weights into `notebooks/`:

```bash
cd notebooks
huggingface-cli download lavawolfiee/Mixtral-8x7B-Instruct-v0.1-offloading-demo --quiet --cache-dir $TMP_DIR --local-dir Mixtral-8x7B-Instruct-v0.1-offloading-demo
```
- For performance-gain results: open up the speed-up notebook (`Speedup Profiling Submission.ipynb`) and GO! :D
- For quality results on WikiText/C4, run the respective Python script. Make sure the model download above completed successfully, and verify that the `--local-dir` you downloaded to matches the `state_path` set in the script. An illustrative sketch of the perplexity evaluation follows below.
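For reference, perplexity on these datasets is typically computed with a sliding-window evaluation. The sketch below follows the standard Hugging Face perplexity recipe and is illustrative only: it assumes `model` and `tokenizer` are already loaded (the actual scripts build the offloaded Mixtral model from `state_path` first), and the window sizes and function name are ours, not the scripts'.

```python
# Illustrative sliding-window perplexity evaluation on WikiText-2.
# Assumes `model` and `tokenizer` are already constructed; the real
# scripts load the offloaded Mixtral build from state_path.
import torch
from datasets import load_dataset

def wikitext_ppl(model, tokenizer, max_length=1024, stride=512, device="cuda"):
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")
    seq_len = encodings.input_ids.size(1)

    nlls, prev_end = [], 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        trg_len = end - prev_end          # new tokens scored in this window
        input_ids = encodings.input_ids[:, begin:end].to(device)
        target_ids = input_ids.clone()
        target_ids[:, :-trg_len] = -100   # score only new tokens; the rest is context

        with torch.no_grad():
            # Mean NLL over the scored tokens (approximate: the causal
            # shift drops one label per window).
            nll = model(input_ids, labels=target_ids).loss
        nlls.append(nll * trg_len)

        prev_end = end
        if end == seq_len:
            break
    return torch.exp(torch.stack(nlls).sum() / prev_end)
```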
Repo `MoE_Expert_Scheduler` (`huggingface/transformers` fork)
- Mixtral and Switch Transformer changes to collect router logits
- Changes to the generation utilities to return the collected router data with the generation output (see the sketch below for the upstream API this builds on)
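For context, stock transformers can already report per-layer router logits on a single forward pass; the fork's contribution is propagating this kind of data through `generate()`, which the sketch below does not show. The model ID and dtype/device settings here are illustrative:

```python
# Minimal sketch: collecting Mixtral router logits with upstream transformers.
import torch
from transformers import AutoTokenizer, MixtralForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
model = MixtralForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    output_router_logits=True,   # ask every MoE layer to report its router logits
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs)

# One tensor per MoE layer, each of shape (batch * seq_len, num_experts).
for layer, logits in enumerate(out.router_logits):
    print(layer, logits.shape)
```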
Repo `mixtral-offloading-residency-info` (`dvmazur/mixtral-offloading` fork)
- Implementation of biasing and thresholding for expert routing (sketched below)
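As a rough illustration of the idea (all names, defaults, and the exact rule here are our assumptions, not the repo's actual code): biasing nudges the router logits toward experts already resident on the GPU, and thresholding drops low-probability experts from the top-k so fewer experts need to be fetched from host memory.

```python
# Illustrative expert selection with biasing + thresholding.
# `resident_mask` marks experts currently cached on the GPU; names,
# defaults, and the exact rule are assumptions, not the repo's code.
import torch

def select_experts(router_logits, resident_mask, top_k=2, bias=1.0, threshold=0.2):
    # Biasing: raise the logits of GPU-resident experts so close calls
    # break toward experts that need no CPU->GPU transfer.
    biased = router_logits + bias * resident_mask.to(router_logits.dtype)

    probs = torch.softmax(biased, dim=-1)
    top_probs, top_idx = probs.topk(top_k, dim=-1)

    # Thresholding: keep lower-ranked experts only if their routing
    # probability justifies loading them; always keep the top-1 expert.
    keep = top_probs >= threshold
    keep[..., 0] = True

    # Renormalize the surviving experts' weights so they still sum to 1.
    weights = torch.where(keep, top_probs, torch.zeros_like(top_probs))
    weights = weights / weights.sum(dim=-1, keepdim=True)
    return top_idx, weights, keep

# Example: 8 experts, experts 1 and 5 resident on the GPU.
logits = torch.randn(4, 8)                 # (tokens, num_experts)
resident = torch.zeros(8)
resident[[1, 5]] = 1.0
idx, w, kept = select_experts(logits, resident)
```

The payoff in an offloading setup is skipping expert transfers whose routing weight would have been negligible anyway, at a small (tunable) cost in routing fidelity.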