Fix runtime error when Qwen2-VL was prompted with multiple images
Fix a runtime error raised when the Qwen2-VL model is given a prompt
containing more than one image. The runtime error was:

 File "text-generation-inference/server/text_generation_server/models/custom_modeling/qwen2_vl.py", line 459, in get_position_ids
    text_pos_ids = torch.arange(text_length, device=d)
RuntimeError: upper bound and larger bound inconsistent with step sign

The error occurred because the text_length variable became negative
once the main loop of the get_position_ids function ran more than once,
which happens when the prompt contains multiple images.

The cause is a simple logic mistake: next_image_pos is initialized as
an offset relative to current_pos, but it was used as if it were an
absolute position counted from zero.
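
As a rough illustration of the mistake described above, here is a minimal pure-Python sketch. It is not the actual qwen2_vl.py code: the IMG placeholder, the text_lengths helper, and the fixed image_len are hypothetical simplifications, and a plain list stands in for the input tensor. It only shows how mixing a relative offset with an absolute position yields a negative text_length on the second image.

```python
IMG = -1  # hypothetical placeholder id standing in for an image token


def text_lengths(tokens, image_len, fixed):
    """Return the text run length computed before each image.

    `fixed=False` mimics the old arithmetic (relative offset treated as an
    absolute position); `fixed=True` mimics the corrected arithmetic.
    """
    lengths = []
    current_pos = 0
    num_images = tokens.count(IMG) // image_len
    for _ in range(num_images):
        # Offset of the next image, measured relative to current_pos.
        next_image_pos = tokens[current_pos:].index(IMG)
        if fixed:
            text_length = next_image_pos
        else:
            text_length = next_image_pos - current_pos
        lengths.append(text_length)
        if fixed:
            current_pos += next_image_pos + image_len
        else:
            current_pos = next_image_pos + image_len
    return lengths


# Two text tokens, an image, three text tokens, an image, one text token.
tokens = [1, 2] + [IMG] * 4 + [3, 4, 5] + [IMG] * 4 + [6]
print(text_lengths(tokens, 4, fixed=False))  # [2, -3]
print(text_lengths(tokens, 4, fixed=True))   # [2, 3]
```

With a single image both variants agree, because current_pos is still zero on the first pass; the negative length only appears from the second image onward, and that negative value is what reached torch.arange in the traceback above.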
alatja authored and drbh committed Dec 16, 2024
1 parent 11ab329 commit e14221b
Showing 1 changed file with 2 additions and 2 deletions.
@@ -450,7 +450,7 @@ def get_position_ids(
 width //= self.spatial_merge_size

 # calculate the length of the text and image tokens
-text_length = next_image_pos - current_pos
+text_length = next_image_pos
 start_idx = (
     llm_pos_ids_list[-1].max() + 1 if llm_pos_ids_list else 0
 )
@@ -480,7 +480,7 @@ def get_position_ids(
 )
 llm_pos_ids_list.append(image_pos_ids)

-current_pos = next_image_pos + time_steps * height * width
+current_pos += next_image_pos + time_steps * height * width
 image_index += 1

 if current_pos < batch_input_ids.size(1):
