# Mixtral8x7b Demo

## How to Run

### Download the weights and repack

1. Download the weights from Huggingface.
2. Repack the weights using the provided script `models/demos/t3000/mixtral8x7b/scripts/repack_weights.py`. It requires the consolidated weights from Huggingface to be inside `<path_to_checkpoint_dir>`.

```bash
# This separates the 8 experts to facilitate sending them to multiple devices.
python models/demos/t3000/mixtral8x7b/scripts/repack_weights.py <path_to_checkpoint_dir> <repacked_output_dir>
```
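For context, repacking conceptually splits the consolidated checkpoint so that each of the 8 experts can be loaded onto its own device. The sketch below is a minimal illustration of that idea, not the actual `repack_weights.py`; the key pattern (`experts.<idx>.`) and the output file names are assumptions made for the example.

```python
# Illustrative sketch only -- not the actual repack_weights.py.
# Assumes the consolidated checkpoint is a single torch state dict whose
# expert tensors contain "experts.<idx>." in their key names (hypothetical pattern).
import os
import re
import torch

def split_experts(consolidated_path: str, output_dir: str, num_experts: int = 8) -> None:
    os.makedirs(output_dir, exist_ok=True)
    state_dict = torch.load(consolidated_path, map_location="cpu")

    per_expert = {e: {} for e in range(num_experts)}
    shared = {}
    for key, tensor in state_dict.items():
        match = re.search(r"experts\.(\d+)\.", key)
        if match:
            per_expert[int(match.group(1))][key] = tensor  # expert-specific weight
        else:
            shared[key] = tensor  # attention/embedding weights shared across devices

    # One file per expert makes it easy to send each expert to a separate device.
    for e, weights in per_expert.items():
        torch.save(weights, os.path.join(output_dir, f"expert_{e}.pt"))
    torch.save(shared, os.path.join(output_dir, "shared.pt"))
```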

### Set up environment

1. Set the async mode and dispatch-over-ethernet-cores environment variables:

```bash
export TT_METAL_ASYNC_DEVICE_QUEUE=1
export WH_ARCH_YAML=wormhole_b0_80_arch_eth_dispatch.yaml
```

2. Prepare the weight cache directory:

```bash
# Make a directory for ttnn to cache weights into. This speeds up subsequent runs.
mkdir <weights_cache_dir>
```

3. Set up the weight and tokenizer environment variables:

```bash
export MIXTRAL_CKPT_DIR=<repacked_output_dir>
export MIXTRAL_TOKENIZER_PATH=<path_to_tokenizer_dir>
export MIXTRAL_CACHE_PATH=<weights_cache_dir>
```

Note that the cached weights folder will contain the general and instruct cached weights in separate directories, like so:

```
<weights_cache_dir>
  /mixtral_tensor_cache_bfp8
  /mixtral_tensor_cache_instruct_bfp8
  ...
```
4. Cache the weights (first-time setup):

```bash
# Build a full 32-layer model to cache the weights. This will take some time.
pytest -svv models/demos/t3000/mixtral8x7b/tests/test_mixtral_model.py::test_mixtral_model_inference[wormhole_b0-True-1-32-output]
```

### Run the demo

Mixtral prefill support is now available. We include two different demos: a decode-only mode, where prompt tokens are force-pushed through the model one at a time until each user's prompt has been consumed and the model starts generating; and a prefill & decode mode, where the KV caches are first prefilled for the full prompt length (e.g. 128 tokens) and decoding then proceeds as normal.
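The sketch below illustrates the difference between the two modes. It is purely schematic: `model`, `prefill`, and `decode_step` are hypothetical stand-ins, not the demo's actual API.

```python
# Schematic comparison of the two demo modes; `prefill` and `decode_step`
# are hypothetical stand-ins for the real model calls.

def decode_only(model, prompt_tokens, max_new_tokens):
    # Force-push the prompt one token at a time; the model's outputs are
    # ignored until the whole prompt has been consumed.
    token = prompt_tokens[0]
    generated = []
    for pos in range(len(prompt_tokens) + max_new_tokens - 1):
        logits = model.decode_step(token, pos)       # fills the KV cache as it goes
        if pos + 1 < len(prompt_tokens):
            token = prompt_tokens[pos + 1]           # still force-pushing the prompt
        else:
            token = logits.argmax(-1)                # the model starts generating
            generated.append(token)
    return generated

def prefill_and_decode(model, prompt_tokens, max_new_tokens):
    # Fill the KV cache for the whole prompt in one shot, then decode as usual.
    logits = model.prefill(prompt_tokens)            # e.g. 128 prompt tokens at once
    token = logits.argmax(-1)
    generated = [token]
    for pos in range(len(prompt_tokens), len(prompt_tokens) + max_new_tokens - 1):
        logits = model.decode_step(token, pos)
        token = logits.argmax(-1)
        generated.append(token)
    return generated
```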

```bash
# Run the demo with a pre-written batch of 32 user prompts

# Prefill & Decode demo
pytest -svv models/demos/t3000/mixtral8x7b/demo/demo_with_prefill.py::test_mixtral8x7b_demo[wormhole_b0-True-general_weights]

# Decode-only demo
pytest -svv models/demos/t3000/mixtral8x7b/demo/demo.py::test_mixtral8x7b_demo[wormhole_b0-True-general_weights]
```

We also provide an input file with 32 user question prompts for the instruct weights (don't forget to update your weight paths and environment variables to point to the instruct weights!):

# Prefill & Decode demo
pytest -svv models/demos/t3000/mixtral8x7b/demo/demo_with_prefill.py::test_mixtral8x7b_demo[wormhole_b0-True-instruct_weights]

# Decode-only demo
pytest -svv models/demos/t3000/mixtral8x7b/demo/demo.py::test_mixtral8x7b_demo[wormhole_b0-True-instruct_weights]

Both input files are provided inside `models/demos/t3000/mixtral8x7b/demo/`.

If you wish to run the model on a different set of input prompts, you can point the demo code to a different input file path. Keep in mind that in instruct mode the prompts are automatically prefixed and suffixed with `[INST]` and `[/INST]`, respectively, so there is no need to add these tags to your file.
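For reference, here is a minimal sketch of the instruct-mode formatting the demo applies on your behalf (your input file should contain only the raw prompt text):

```python
# Minimal sketch of the instruct-mode prompt wrapping described above.
# The demo applies this automatically, so input files need only the raw question.
def wrap_instruct(prompt: str) -> str:
    return f"[INST] {prompt} [/INST]"

print(wrap_instruct("What is the capital of France?"))
# [INST] What is the capital of France? [/INST]
```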