[Llama3-text vLLM integration] Modify Llama3 text model (new and old codebase) forward apis for vLLM compatibility #16292
Ticket
N/A
Problem description
What's changed
- Added `generator_vllm.py::TtLlamaForCausalLM` for vLLM model init and execution.
- Added `_easy_trace_text` (handles trace capture and decode forward trace execution automatically for the user) to `LlamaGenerator`, and modified `decode_forward_text` (the only valid decode entry point) to call either `_easy_trace_text` or `_decode_forward_no_trace_text` depending on the `enable_trace` arg (a rough sketch of this dispatch follows the list).
- Added a `read_from_device` arg to `decode_forward_text` so vLLM can perform async output processing during decode execution, and modified `_easy_trace_text` and `_decode_forward_no_trace_text` to not read back outputs.
- Modified `llama_common.py::get_padded_prefill_len` to pad to 128 if seq len < 128, since that is the minimum required for llama3 attention (same as currently done for the llama3 demo and vision model); see the padding sketch below.
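The decode dispatch described above can be pictured roughly as follows. This is a minimal sketch based only on the PR description: argument lists are simplified, tensors are stand-ins, and helper names such as `_capture_decode_trace`, `_replay_decode_trace`, and `read_decode_output` are placeholders, not the actual tt-metal API.

```python
from typing import Optional


class LlamaGenerator:
    """Sketch of the decode entry point described in this PR (names simplified)."""

    def __init__(self, model):
        self.model = model                 # callable: (tokens, pos) -> on-device logits
        self.trace_id: Optional[int] = None  # decode trace captured on first traced call

    def decode_forward_text(self, tokens, current_pos, *,
                            enable_trace: bool = True,
                            read_from_device: bool = True):
        # Only valid decode entry point: choose the traced or non-traced path.
        run = self._easy_trace_text if enable_trace else self._decode_forward_no_trace_text
        tt_logits = run(tokens, current_pos)  # output stays on device

        if read_from_device:
            # Synchronous path: read logits back to the host before returning.
            return self.read_decode_output(tt_logits)
        # Async path: return the on-device output so the caller (vLLM) can read it
        # back later, overlapping output processing with the next decode step.
        return tt_logits

    def _decode_forward_no_trace_text(self, tokens, current_pos):
        return self.model(tokens, current_pos)

    def _easy_trace_text(self, tokens, current_pos):
        # Capture the decode trace once, then replay it on every later call.
        if self.trace_id is None:
            self.trace_id = self._capture_decode_trace(tokens, current_pos)
        return self._replay_decode_trace(self.trace_id, tokens, current_pos)

    # --- placeholder internals, stand-ins for device trace and readback calls ---
    def _capture_decode_trace(self, tokens, current_pos) -> int:
        return 0  # pretend a trace was captured and return its id

    def _replay_decode_trace(self, trace_id, tokens, current_pos):
        return self.model(tokens, current_pos)

    def read_decode_output(self, tt_logits):
        return tt_logits  # in the real code this copies the device output to host
```

The prefill-length change amounts to clamping short sequences up to 128 tokens. A minimal sketch, assuming the helper keeps its existing rounding behavior for longer sequences (the power-of-two rounding shown here is illustrative, not taken from the PR):

```python
import math

MIN_PREFILL_SEQ_LEN = 128  # minimum sequence length accepted by llama3 attention


def get_padded_prefill_len(seq_len: int) -> int:
    """Pad short prefill lengths up to the 128-token minimum."""
    if seq_len <= MIN_PREFILL_SEQ_LEN:
        return MIN_PREFILL_SEQ_LEN
    # Longer sequences keep whatever rounding the existing helper applies;
    # power-of-two rounding is shown purely as an illustrative stand-in.
    return 2 ** math.ceil(math.log2(seq_len))
```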
Checklist