
[Llama3-text vLLM integration] Modify Llama3 text model (new and old codebase) forward apis for vLLM compatibility #16292

Merged: 7 commits merged into main on Dec 24, 2024

Conversation

@skhorasganiTT (Contributor) commented on Dec 23, 2024

Ticket

N/A

Problem description

  • The forward call interface of the llama3 text models was incompatible with vLLM and contained some minor bugs

What's changed

  • Added generator_vllm.py::TtLlamaForCausalLM for vLLM model initialization and execution
  • Added _easy_trace_text to LlamaGenerator (it handles trace capture and decode-forward trace execution automatically for the user) and modified decode_forward_text (now the only valid decode entry point) to call either _easy_trace_text or _decode_forward_no_trace_text depending on the enable_trace arg; see the sketch after this list
  • Added a read_from_device arg to decode_forward_text so that vLLM can perform async output processing concurrently with decode execution, and modified _easy_trace_text and _decode_forward_no_trace_text to not read outputs back from device
  • Made the same modifications to the old llama70b codebase so that the old codepaths in vLLM can be removed
  • Modified llama_common.py::get_padded_prefill_len to pad to 128 when seq len < 128, since that is the minimum required for llama3 attention (matching what the llama3 demo and vision model already do); see the padding sketch after this list
  • Note: this PR is required to enable vllm#48 (Add support for TT Llama3 text models (1B, 3B, 8B, 70B-new))
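A minimal sketch of the new decode dispatch, assuming only the names mentioned above (decode_forward_text, _easy_trace_text, _decode_forward_no_trace_text, enable_trace, read_from_device); the argument list and the read_decode_output helper are hypothetical stand-ins for the real signatures:

```python
class LlamaGenerator:
    def decode_forward_text(self, tokens, current_pos, enable_trace=True, read_from_device=True):
        # Single valid decode entry point. The traced path captures the trace
        # on the first call and replays it on subsequent calls; the untraced
        # path runs the decode forward eagerly.
        if enable_trace:
            tt_logits = self._easy_trace_text(tokens, current_pos)
        else:
            tt_logits = self._decode_forward_no_trace_text(tokens, current_pos)

        if read_from_device:
            # Blocking read back to host (read_decode_output is a hypothetical
            # helper name here).
            return self.read_decode_output(tt_logits)

        # Leave outputs on device; vLLM passes read_from_device=False and
        # reads them back later, overlapping output processing with the next
        # decode step.
        return tt_logits
```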
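And a hedged sketch of the get_padded_prefill_len change: only the 128 minimum comes from this PR; the behavior for longer sequences (next power of two) is an assumption about the pre-existing padding scheme.

```python
import math

def get_padded_prefill_len(seq_len: int) -> int:
    """Pad the prefill sequence length for llama3 attention."""
    if seq_len <= 128:
        # New in this PR: 128 is the minimum required by llama3 attention.
        return 128
    # Assumption: longer sequences keep the pre-existing padding scheme,
    # taken here to be the next power of two.
    return 2 ** math.ceil(math.log2(seq_len))
```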

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • (For models and ops writers) Full new models tests pass
  • New/Existing tests provide coverage for changes

@skhorasganiTT force-pushed the skhorasgani/vllm_llama3_common2 branch from 38bc926 to 27249e2 on December 24, 2024 at 22:32
@skhorasganiTT merged commit 9f24a71 into main on Dec 24, 2024 (9 checks passed)
@skhorasganiTT deleted the skhorasgani/vllm_llama3_common2 branch on December 24, 2024 at 22:54