**Prerequisite:** Make sure the LLM inference framework can be launched in SPMD style. For example, the LLM inference script should be launchable with `torchrun --standalone --nproc_per_node=8 offline_inference.py`.
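For reference, a minimal SPMD-style `offline_inference.py` might look like the sketch below. The `DummyEngine` class is a placeholder for the new backend's real engine; only the `torch.distributed` setup reflects the actual torchrun contract:

```python
# offline_inference.py — minimal SPMD-style sketch.
# Launch: torchrun --standalone --nproc_per_node=8 offline_inference.py
# Every rank executes the same script; torchrun injects RANK / WORLD_SIZE / LOCAL_RANK.
import os

import torch
import torch.distributed as dist


class DummyEngine:
    """Stand-in for the real inference engine (e.g., vLLM's LLMEngine).

    The key requirement is that it can be constructed and called identically
    on every rank, handling any tensor-parallel sharding internally.
    """

    def generate(self, prompts):
        return [p + " -> <generated>" for p in prompts]


def main():
    # torchrun sets these environment variables for each spawned process.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    engine = DummyEngine()  # replace with the new backend's engine
    outputs = engine.generate(["Hello, world!"])

    if dist.get_rank() == 0:
        print(outputs)
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```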
**A Rollout class:** Build an `xxx_rollout.py` script similar to `vllm_rollout.py`. In this file, define an `xxxRollout` class that inherits from `BaseRollout`.
This class should have a `generate_sequences` API that accepts a batch of `input_ids`, `response_masks`, and `position_ids` from the `DataProto` as input. The `self.inference_engine` (e.g., `LLMEngine` in vLLM) is responsible for performing auto-regressive generation and outputting a batch of responses. These responses should then be concatenated with `input_ids`, and the `response_masks` and `position_ids` should be reconstructed accordingly, as in the sketch below.
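A minimal sketch of such a class might look like this. The import paths follow verl's layout at the time of writing, the engine's `generate` call is a hypothetical stand-in for the backend's real API, and `config.pad_token_id` is an assumed config field; consult `vllm_rollout.py` for the exact contract:

```python
# xxx_rollout.py — sketch of a new rollout backend.
import torch

from verl import DataProto
from verl.workers.rollout.base import BaseRollout


class XXXRollout(BaseRollout):
    def __init__(self, inference_engine, config):
        super().__init__()
        self.inference_engine = inference_engine  # the backend's engine object
        self.config = config

    @torch.no_grad()
    def generate_sequences(self, prompts: DataProto) -> DataProto:
        input_ids = prompts.batch["input_ids"]        # (bs, prompt_len)
        position_ids = prompts.batch["position_ids"]  # (bs, prompt_len)

        # Hypothetical engine call: auto-regressive generation for the batch.
        responses = self.inference_engine.generate(input_ids)  # (bs, resp_len)

        # Concatenate prompts and responses, then rebuild the masks and
        # position_ids so training sees one contiguous sequence per sample.
        seq = torch.cat([input_ids, responses], dim=-1)
        resp_pos = position_ids[:, -1:] + torch.arange(
            1, responses.shape[1] + 1, device=position_ids.device
        ).unsqueeze(0)
        position_ids = torch.cat([position_ids, resp_pos], dim=-1)
        response_masks = (responses != self.config.pad_token_id).long()

        return DataProto.from_dict({
            "input_ids": seq,
            "responses": responses,
            "response_masks": response_masks,
            "position_ids": position_ids,
        })
```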
**ShardingManager Classes for Weight Synchronization with Training Frameworks:** Create files named `fsdp_xxx.py` and `megatron_xxx.py`, similar to `fsdp_vllm.py` and `megatron_vllm.py`. These files should define `XXXShardingManager` classes (i.e., the HybridEngine) that handle weight sharding between the training and inference frameworks.
In `megatron_vllm.py`, we define an `AllGatherPPModel` class to collect weights across the pipeline-parallelism dimension. The parameters stored in the `memory_buffers` of `AllGatherPPModel` will be used to synchronize the weights with the models in the vLLM rollout.
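As a rough sketch, such a sharding manager is typically a context manager: entering it pushes the current training weights into the inference engine before a rollout, and exiting releases the inference-side copies. The engine-side method names below (`sync_model_weights`, `offload_model_weights`) are assumptions for a new backend, not a fixed interface:

```python
# fsdp_xxx.py — sketch of a sharding manager bridging FSDP training weights
# and a new inference backend (engine-side method names are hypothetical).
import torch


class XXXShardingManager:
    def __init__(self, module, inference_engine):
        self.module = module                    # FSDP-wrapped training model
        self.inference_engine = inference_engine

    def __enter__(self):
        # Gather full (unsharded) parameters from the training framework and
        # push them into the inference engine before each rollout phase.
        state_dict = self.module.state_dict()
        self.inference_engine.sync_model_weights(state_dict)  # hypothetical API
        del state_dict
        torch.cuda.empty_cache()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Release the inference-side weight copies so training can reclaim
        # the GPU memory (hypothetical API).
        self.inference_engine.offload_model_weights()
        torch.cuda.empty_cache()


# Usage: wrap each rollout so weights are fresh on entry and memory is
# returned to the trainer on exit.
# with XXXShardingManager(actor_module, engine):
#     output = rollout.generate_sequences(prompts)
```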
**Weight loading issues:** It may be necessary to provide model-specific weight loaders for transferring weights between the LLM inference and training backends, similar to the `dtensor_weight_loader.py` and `megatron_weight_loader.py` files for vLLM.
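A per-model weight loader is essentially a name-mapping plus copy routine. Here is a hedged sketch; the mapping table, function name, and handling of sharded tensors are illustrative only (see `dtensor_weight_loader.py` for a real implementation):

```python
# xxx_weight_loader.py — sketch of a per-model weight loader.
import torch

# Map training-side parameter names to the inference engine's names.
# One entry per mismatched name for this model family (example entry only).
_NAME_MAPPING = {
    "model.embed_tokens.weight": "embedding.word_embeddings.weight",
}


def load_weights_for_mymodel(actor_weights: dict, inference_model: torch.nn.Module):
    """Copy training weights into the inference model, renaming as needed."""
    params = dict(inference_model.named_parameters())
    for src_name, tensor in actor_weights.items():
        dst_name = _NAME_MAPPING.get(src_name, src_name)
        if dst_name not in params:
            continue  # e.g., tied or backend-internal parameters
        # Shapes must already match here; sharded tensors would need a
        # gather or narrow first, depending on the parallelism layout.
        params[dst_name].data.copy_(tensor.to(params[dst_name].dtype))
```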
Could you describe the primary changes that need to be made in `verl/third_party/vllm/`, assuming that most of the code in that directory comes from vLLM? If we could somehow simplify the dependency on vLLM, it would be much easier to upgrade to newer versions of vLLM.