Description
When running the examples/ppo_trainer/run_deepseek_megatron.sh script with the base model deepseek-llm-7b-chat, I ran into unexpected behavior related to the num_hidden_layers parameter. Originally the model has num_hidden_layers set to 30, and the rollout time is approximately 35 seconds. I reduced num_hidden_layers to 15, expecting the rollout time to roughly halve, but it instead increased to about 71 seconds.
Steps to Reproduce
Original Configuration:
Run the script examples/ppo_trainer/run_deepseek_megatron.sh with the base model deepseek-llm-7b-chat and its default num_hidden_layers=30.
Observe the rollout time, which is approximately 35 seconds.
Modified Configuration:
Change the num_hidden_layers parameter in the model configuration from 30 to 15 (one possible way to do this is sketched after this list).
Rerun the same script with the modified configuration.
Notice that the rollout time increases to approximately 71 seconds instead of decreasing.
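For concreteness, one possible way to perform this truncation is sketched below. This is only a sketch, not necessarily how the run above was configured: it rebuilds a local Hugging Face checkpoint with a reduced config, the output path is hypothetical, and the Megatron-side model configuration would need to be changed consistently as well.

```python
# Sketch: rebuild a local Hugging Face checkpoint of deepseek-llm-7b-chat with
# num_hidden_layers reduced from 30 to 15. Paths are hypothetical; this is not
# necessarily how the original modification was made.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

SRC = "deepseek-ai/deepseek-llm-7b-chat"
DST = "/path/to/deepseek-llm-7b-chat-15layers"  # hypothetical output directory

config = AutoConfig.from_pretrained(SRC)
config.num_hidden_layers = 15  # keep only the first 15 decoder layers

# Loading with the truncated config keeps the weights of the retained layers and
# ignores the checkpoint tensors for layers 15-29 (transformers warns about the
# unused keys, which is expected here).
model = AutoModelForCausalLM.from_pretrained(SRC, config=config, torch_dtype="auto")
model.save_pretrained(DST)
AutoTokenizer.from_pretrained(SRC).save_pretrained(DST)
```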
Expected Behavior
Reducing the num_hidden_layers from 30 to 15 should lead to a proportional decrease in rollout generation time, ideally halving the time from around 35 seconds to approximately 17-18 seconds.
Actual Behavior
After modifying num_hidden_layers to 15, the rollout time unexpectedly doubled from ~35 seconds to ~71 seconds.
Question
What could be causing the rollout time to increase when reducing num_hidden_layers from 30 to 15 in the deepseek-llm-7b-chat model? Are there any configuration or implementation issues that might lead to this performance degradation?
I'm not sure why this happens. How did you modify num_hidden_layers? Did you make sure that both the vLLM model configuration and the Megatron model configuration were modified consistently?
I have three suggestions for debugging:
1. Inspect the number of GPU blocks reported in the vLLM logs for the two configurations to check whether the allocated KV cache differs.
2. Test the truncated model with the official vLLM offline inference example (a sketch is given after this list).
3. Make sure that GPU memory is fully cleaned up between runs.
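For suggestion 2, a minimal offline sanity check might look like the sketch below. It assumes a local copy of the truncated checkpoint at a hypothetical path; the prompt and sampling settings are illustrative, and the exact wording of vLLM's log lines varies across versions.

```python
# Minimal offline sanity check with vLLM (a sketch, not the verl rollout path).
# MODEL_PATH is a hypothetical local directory holding the truncated checkpoint;
# the prompt and sampling settings are illustrative only.
from vllm import LLM, SamplingParams

MODEL_PATH = "/path/to/deepseek-llm-7b-chat-15layers"  # hypothetical

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
llm = LLM(model=MODEL_PATH, trust_remote_code=True, dtype="bfloat16")

# During engine initialization vLLM logs how many KV-cache blocks it allocated;
# compare that number between the 30-layer and 15-layer runs.
outputs = llm.generate(["Briefly explain what a KV cache is."], sampling_params)
for out in outputs:
    print(out.outputs[0].text)
```

For suggestion 1, note that the per-token KV cache size scales with the number of layers, so at the same gpu_memory_utilization a 15-layer model should in principle be allocated at least as many (typically more) GPU blocks than the 30-layer model; a noticeably smaller block count would hint at a configuration mismatch. For suggestion 3, checking nvidia-smi between runs is usually enough to confirm that no stale processes are still holding GPU memory.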