Fix position encodings for Pixtral #2678

ameroyer · 2024-12-23T10:45:48Z

For Pixtral models, the position encodings are resampled on the fly when processing images smaller than the model config's default size (1024x1024 px images, i.e. 64x64 patches of 16x16 px each). Doing so seems to slightly improve generations, in particular regarding the number of objects.

Link to the reference in the Hugging Face code
Testing the fix: mainly by inspecting the model outputs for different image sizes
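A rough illustration of what "resampling" means here. This sketch assumes the layout used by the Hugging Face `position_ids_in_meshgrid` helper: the rotary table covers the full 64x64 patch grid in row-major order, so each entry encodes a (row, col) pair. A smaller h x w grid then picks entry `row * 64 + col` rather than the first h*w entries. The function names are hypothetical and plain `Vec<usize>` stands in for candle tensors. This is not the code merged in this PR.

```rust
/// Hypothetical helper (not candle's API): row-major position ids for an
/// `h` x `w` patch grid, indexed into a rotary table laid out over a
/// `max_width` x `max_width` grid (64 x 64 for 1024 px images, 16 px patches).
fn position_ids_in_meshgrid(h: usize, w: usize, max_width: usize) -> Vec<usize> {
    assert!(h <= max_width && w <= max_width, "grid larger than the table");
    let mut ids = Vec::with_capacity(h * w);
    for row in 0..h {
        for col in 0..w {
            // Skip to the start of `row` in the full-size table, then offset by `col`.
            ids.push(row * max_width + col);
        }
    }
    ids
}

/// Naive alternative for comparison: take the first `h * w` table entries,
/// which ignores the 2D layout of the table.
fn contiguous_ids(h: usize, w: usize) -> Vec<usize> {
    (0..h * w).collect()
}

fn main() {
    let image_size = 1024;
    let patch_size = 16;
    let max_width = image_size / patch_size; // 64 patches per side

    // A 512 x 256 px image gives a 32 x 16 patch grid.
    let (h, w) = (512 / patch_size, 256 / patch_size);
    let ids = position_ids_in_meshgrid(h, w, max_width);
    let naive = contiguous_ids(h, w);

    // Around the end of the first row: the meshgrid ids jump to row 1 of the table.
    println!("meshgrid: {:?}", &ids[14..18]);
    println!("naive:    {:?}", &naive[14..18]);

    // Each id decodes back to its (row, col) in the full-size grid.
    let id = ids[16];
    println!("id {} -> row {}, col {}", id, id / max_width, id % max_width);
}
```

With the naive contiguous indexing, the second row of a 16-wide grid would reuse table entries 16..31, i.e. columns 16..31 of the first row, so the patches would be encoded as if the image were one long row.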

LaurentMazare · 2024-12-23T10:51:58Z

Thanks, you might want to look at the clippy failure in the CI :)

ameroyer · 2024-12-23T11:38:42Z

Indeed, clippy again. Thanks, it's fixed now!

Added a minor fix: the default hidden activation of the vision config was changed from GeLU to SiLU in the HF repo, so the default here now matches it.
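A minimal sketch of what changing this default amounts to, assuming a config struct whose `Default` impl picks the activation. `HiddenAct` and `VisionConfig` here are hypothetical stand-ins, not candle's actual types. GeLU uses the tanh approximation only because Rust's standard library has no `erf`. The real model may use the exact form.

```rust
/// Hypothetical activation enum for illustration; candle's real types may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum HiddenAct {
    Gelu,
    Silu,
}

impl HiddenAct {
    fn apply(self, x: f64) -> f64 {
        match self {
            // GeLU, tanh approximation: 0.5 x (1 + tanh(sqrt(2/pi) (x + 0.044715 x^3)))
            HiddenAct::Gelu => {
                let c = (2.0 / std::f64::consts::PI).sqrt();
                0.5 * x * (1.0 + (c * (x + 0.044715 * x.powi(3))).tanh())
            }
            // SiLU (a.k.a. swish): x * sigmoid(x)
            HiddenAct::Silu => x / (1.0 + (-x).exp()),
        }
    }
}

/// Hypothetical vision config holding only the field relevant here.
#[derive(Debug)]
struct VisionConfig {
    hidden_act: HiddenAct,
}

impl Default for VisionConfig {
    fn default() -> Self {
        // Default switched from Gelu to Silu to follow the HF repo.
        VisionConfig { hidden_act: HiddenAct::Silu }
    }
}

fn main() {
    let cfg = VisionConfig::default();
    println!("default activation: {:?}", cfg.hidden_act);
    // The two activations differ most for negative inputs.
    for x in [-1.0, 0.5, 2.0] {
        println!(
            "x={x}: silu={:.4} gelu={:.4}",
            HiddenAct::Silu.apply(x),
            HiddenAct::Gelu.apply(x)
        );
    }
}
```

Since the two functions produce different values on the same inputs, a mismatched default silently changes every MLP output in the vision encoder. That is why keeping it aligned with the upstream config matters even though the fix is a one-liner.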

LaurentMazare · 2024-12-23T12:22:43Z

Thanks!

ameroyer added 2 commits December 23, 2024 10:42

init commit: add position id in meshgrid

bfa3789

pass in subsampled positions

af11cf8

ameroyer added 2 commits December 23, 2024 12:28

clippy fix

732994b

clippy fix

583b68a

minor fix: Gelu -> Silu to match update in Pixtral's HF repo

a23f6ac

LaurentMazare merged commit 1be6b09 into huggingface:main Dec 23, 2024
10 checks passed
