From 94c7f2c5056fa9fc482759ec9aec6f8611b2c5b3 Mon Sep 17 00:00:00 2001
From: Miles Cranmer
Date: Fri, 26 Jan 2024 00:33:38 +0900
Subject: [PATCH] Fix `max_memory` example on README (#944)

* Fix `max_memory` example on README

  - The new `max_memory` syntax expects a dictionary
  - This change also accounts for multiple devices

* Fix model name in `from_pretrained` on README
---
 README.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 5cf92dcc5..a4586d6ca 100644
--- a/README.md
+++ b/README.md
@@ -41,7 +41,11 @@ model = AutoModelForCausalLM.from_pretrained(
   'decapoda-research/llama-7b-hf',
   device_map='auto',
   load_in_8bit=True,
-  max_memory=f'{int(torch.cuda.mem_get_info()[0]/1024**3)-2}GB')
+  max_memory={
+    i: f'{int(torch.cuda.mem_get_info(i)[0]/1024**3)-2}GB'
+    for i in range(torch.cuda.device_count())
+  }
+)
 ```
 
 A more detailed example, can be found in [examples/int8_inference_huggingface.py](examples/int8_inference_huggingface.py).
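For context, here is a minimal, self-contained sketch of what the patched README snippet looks like when run end to end. It assumes `torch`, `transformers`, and `bitsandbytes` are installed, that at least one CUDA device is visible, and that the `decapoda-research/llama-7b-hf` checkpoint named in the README is still reachable; the 2 GB of headroom subtracted per device is the README's own heuristic, not a library requirement.

```python
# Sketch of the corrected snippet from the patch above.
# Assumes: torch, transformers, and bitsandbytes installed; >= 1 CUDA device;
# the 'decapoda-research/llama-7b-hf' checkpoint is available.
import torch
from transformers import AutoModelForCausalLM

# The dict-based `max_memory` syntax maps each device index to a budget
# string. Here: all free memory on each visible GPU, minus ~2 GB headroom.
# torch.cuda.mem_get_info(i) returns (free_bytes, total_bytes) for device i.
max_memory = {
    i: f'{int(torch.cuda.mem_get_info(i)[0] / 1024**3) - 2}GB'
    for i in range(torch.cuda.device_count())
}

model = AutoModelForCausalLM.from_pretrained(
    'decapoda-research/llama-7b-hf',
    device_map='auto',      # let accelerate shard the model across devices
    load_in_8bit=True,      # 8-bit quantization via bitsandbytes
    max_memory=max_memory,  # one entry per GPU, as the new syntax expects
)
```

Binding the comprehension to a local `max_memory` name before the call is purely for readability; the inline dict in the patch itself is equivalent.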