# Mixtral8x7b Demo

## How to Run

### Download the weights and repack

1. Download the weights from Huggingface.
2. Repack the weights using the provided script `models/demos/t3000/mixtral8x7b/scripts/repack_weights.py`. It requires the consolidated weights from Huggingface to be inside `<path_to_checkpoint_dir>`.

```bash
# This separates the 8 experts to facilitate sending them to multiple devices.
python models/demos/t3000/mixtral8x7b/scripts/repack_weights.py <path_to_checkpoint_dir> <repacked_output_dir>
```
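For context, repacking conceptually splits the consolidated checkpoint so that each of the 8 experts can be loaded onto its own device. The sketch below is a minimal illustration of that idea, not the actual `repack_weights.py`; the key pattern (`experts.<idx>.`) and the output file names are assumptions made for the example.

```python
# Illustrative sketch only -- not the actual repack_weights.py.
# Assumes the consolidated checkpoint is a single torch state dict whose
# expert tensors contain "experts.<idx>." in their key names (hypothetical pattern).
import os
import re
import torch

def split_experts(consolidated_path: str, output_dir: str, num_experts: int = 8) -> None:
    os.makedirs(output_dir, exist_ok=True)
    state_dict = torch.load(consolidated_path, map_location="cpu")

    per_expert = {e: {} for e in range(num_experts)}
    shared = {}
    for key, tensor in state_dict.items():
        match = re.search(r"experts\.(\d+)\.", key)
        if match:
            per_expert[int(match.group(1))][key] = tensor  # expert-specific weight
        else:
            shared[key] = tensor  # attention/embedding weights shared across devices

    # One file per expert makes it easy to send each expert to a separate device.
    for e, weights in per_expert.items():
        torch.save(weights, os.path.join(output_dir, f"expert_{e}.pt"))
    torch.save(shared, os.path.join(output_dir, "shared.pt"))
```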

### Set up environment

1. Set the async mode and dispatch-over-ethernet-cores environment variables:

```bash
export TT_METAL_ASYNC_DEVICE_QUEUE=1
export WH_ARCH_YAML=wormhole_b0_80_arch_eth_dispatch.yaml
```

2. Prepare the weight cache directory:

```bash
# Make a directory for ttnn to cache weights into. This speeds up subsequent runs.
mkdir <weights_cache_dir>
```

3. Set up the weight and tokenizer environment variables:

```bash
export MIXTRAL_CKPT_DIR=<repacked_output_dir>
export MIXTRAL_TOKENIZER_PATH=<path_to_tokenizer_dir>
export MIXTRAL_CACHE_PATH=<weights_cache_dir>
```

Note that the cached weights folder will contain the general and instruct cached weights in separate directories, like so:

```
<weights_cache_dir>
  /mixtral_tensor_cache_bfp8
  /mixtral_tensor_cache_instruct_bfp8
  ...
```
4. Cache the weights (first-time setup):

```bash
# Build a full 32-layer model to cache the weights. This will take some time.
pytest -svv models/demos/t3000/mixtral8x7b/tests/test_mixtral_model.py::test_mixtral_model_inference[wormhole_b0-True-1-32-output]
```

### Run the demo

Mixtral prefill support is now available. We include two different demos: a decode-only mode, where prompt tokens are force-pushed through the model one at a time until each user's prompt has been consumed and the model starts generating; and a prefill & decode mode, where the KV caches are first prefilled for the full prompt length (e.g. 128 tokens) and decoding then proceeds as normal.
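The sketch below illustrates the difference between the two modes. It is purely schematic: `model`, `prefill`, and `decode_step` are hypothetical stand-ins, not the demo's actual API.

```python
# Schematic comparison of the two demo modes; `prefill` and `decode_step`
# are hypothetical stand-ins for the real model calls.

def decode_only(model, prompt_tokens, max_new_tokens):
    # Force-push the prompt one token at a time; the model's outputs are
    # ignored until the whole prompt has been consumed.
    token = prompt_tokens[0]
    generated = []
    for pos in range(len(prompt_tokens) + max_new_tokens - 1):
        logits = model.decode_step(token, pos)       # fills the KV cache as it goes
        if pos + 1 < len(prompt_tokens):
            token = prompt_tokens[pos + 1]           # still force-pushing the prompt
        else:
            token = logits.argmax(-1)                # the model starts generating
            generated.append(token)
    return generated

def prefill_and_decode(model, prompt_tokens, max_new_tokens):
    # Fill the KV cache for the whole prompt in one shot, then decode as usual.
    logits = model.prefill(prompt_tokens)            # e.g. 128 prompt tokens at once
    token = logits.argmax(-1)
    generated = [token]
    for pos in range(len(prompt_tokens), len(prompt_tokens) + max_new_tokens - 1):
        logits = model.decode_step(token, pos)
        token = logits.argmax(-1)
        generated.append(token)
    return generated
```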

```bash
# Run the demo with a pre-written batch of 32 user prompts

# Prefill & Decode demo
pytest -svv models/demos/t3000/mixtral8x7b/demo/demo_with_prefill.py::test_mixtral8x7b_demo[wormhole_b0-True-general_weights]

# Decode-only demo
pytest -svv models/demos/t3000/mixtral8x7b/demo/demo.py::test_mixtral8x7b_demo[wormhole_b0-True-general_weights]
```

We also provide an input file with 32 user question prompts for the instruct weights (don't forget to update your weight paths and environment variables to point to the instruct weights!):

# Prefill & Decode demo
pytest -svv models/demos/t3000/mixtral8x7b/demo/demo_with_prefill.py::test_mixtral8x7b_demo[wormhole_b0-True-instruct_weights]

# Decode-only demo
pytest -svv models/demos/t3000/mixtral8x7b/demo/demo.py::test_mixtral8x7b_demo[wormhole_b0-True-instruct_weights]

Both input files are provided inside `models/demos/t3000/mixtral8x7b/demo/`.

If you wish to run the model on a different set of input prompts, you can point the demo code to a different input file path. Keep in mind that in instruct mode the prompts are automatically prefixed and suffixed with `[INST]` and `[/INST]`, respectively, so there is no need to add these tags to your file.
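For reference, here is a minimal sketch of the instruct-mode formatting the demo applies on your behalf (your input file should contain only the raw prompt text):

```python
# Minimal sketch of the instruct-mode prompt wrapping described above.
# The demo applies this automatically, so input files need only the raw question.
def wrap_instruct(prompt: str) -> str:
    return f"[INST] {prompt} [/INST]"

print(wrap_instruct("What is the capital of France?"))
# [INST] What is the capital of France? [/INST]
```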