From e99901a7927a4e30a3e42fecd698a72a16f4340c Mon Sep 17 00:00:00 2001
From: mtairum
Date: Tue, 29 Oct 2024 12:14:36 +0000
Subject: [PATCH] #13368: Update llama3 readme to include 11B and 70B

---
 models/demos/llama3/README.md | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/models/demos/llama3/README.md b/models/demos/llama3/README.md
index 998b1456d731..bb5c99627e5a 100644
--- a/models/demos/llama3/README.md
+++ b/models/demos/llama3/README.md
@@ -3,11 +3,13 @@
 This codebase includes the Llama3 family of models.
 
 The current version supports the following Llama3 models:
-- Llama3.1-8B
 - Llama3.2-1B
 - Llama3.2-3B
+- Llama3.1-8B
+- Llama3.2-11B
+- Llama3.1-70B (T3000-only)
 
-All the above llama models are compatible and tested on the following Tenstorrent hardware:
+All the above Llama models are compatible and tested on the following Tenstorrent hardware (with the exception of Llama3.1-70B, which due to its size is only supported on T3000):
 - N150 (1-chip)
 - N300 (2-chips)
 - T3000 (8-chips)
@@ -20,6 +22,20 @@ Download the weights [directly from Meta](https://llama.meta.com/llama-downloads
 
 The downloaded directories include weight files (e.g. `consolidated.00.pth`), the tokenizer `tokenizer.model` and configuration file `params.json`.
 
+**For Llama3.1-70B only**, the weights must be repacked before use. We provide a script to facilitate this in `models/demos/llama3/scripts/repack_weights_70b.py`.
+
+The repacked output directory can be the same as the checkpoint directory, since the new files will have different names.
+If providing a different path, make sure to keep the string `3.1-70B` in the new path name, since the Llama3 codebase relies on the weights directory name to identify the correct model.
+
+Note: Use the value of `5` for `chunk_size`.
+
+```
+# This concatenates the sharded checkpoints and makes it easier for us to load.
+python models/demos/llama3/scripts/repack_weights_70b.py
+```
+
+If providing a different output directory, also copy the `params.json` and `tokenizer.model` files to the new directory.
+
 ### Setup TT environment
 
 1. Set up environment variables:
@@ -57,6 +73,6 @@ When running the demo, do not forget to setup the `$LLAMA_DIR` environment varia
 # Run a single continuous batch with instruct weights
 pytest models/demos/llama3/demo/demo.py -k 'instruct and 1_batch'
 
-# Run 3 continuous batches with general weights
-pytest models/demos/llama3/demo/demo.py -k 'general and 3_batch'
+# Run 2 continuous batches with general weights
+pytest models/demos/llama3/demo/demo.py -k 'general and 2_batch'
 ```
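
A note on the repacking step above: the command added to the README is shown without arguments, while the surrounding text mentions a `chunk_size` of 5 and a repacked output directory. Below is a minimal sketch of what a full invocation might look like, assuming the script accepts the original checkpoint directory, the repacked output directory, and the chunk size as positional arguments; this argument layout is an assumption for illustration only, so check `repack_weights_70b.py` for its actual CLI.

```bash
# Hypothetical invocation -- the argument order and names are assumptions; see the script for the real interface.
# Concatenates the sharded 70B checkpoints into repacked files that are easier to load.
python models/demos/llama3/scripts/repack_weights_70b.py \
    /path/to/Llama3.1-70B-Instruct/ \
    /path/to/Llama3.1-70B-Instruct/repacked/ \
    5
```

The example output path keeps `3.1-70B` in its name, as the README requires when repacking into a separate directory.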
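
The demo section references the `$LLAMA_DIR` environment variable, which should point at the weights directory (for 70B, the repacked one). A short end-to-end sketch combining the steps from the README, with a placeholder path; the actual directory name must contain `3.1-70B` for the 70B model.

```bash
# Placeholder path -- substitute your own weights location (repacked, in the 70B case).
export LLAMA_DIR=/path/to/Llama3.1-70B-Instruct/repacked/

# Run a single continuous batch with instruct weights
pytest models/demos/llama3/demo/demo.py -k 'instruct and 1_batch'

# Run 2 continuous batches with general weights
pytest models/demos/llama3/demo/demo.py -k 'general and 2_batch'
```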