Fix position encodings for Pixtral #2678

Merged
merged 5 commits into from
Dec 23, 2024

Conversation

ameroyer
Contributor

For Pixtral models, the position encodings are now resampled on the fly for images smaller than the size in the model's default config (1024x1024 px images, i.e. a 64x64 grid of 16x16 px patches). Doing so seems to slightly improve generations, in particular when it comes to the number of objects.

  • Link to ref in Hugging Face code
  • Testing the fix: mainly looking at the model outputs for different image sizes
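
As a rough illustration of the resampling described above, the sketch below assumes the encodings are precomputed as a flattened, row-major table for the full 64x64 patch grid, and that a smaller image simply picks the entries matching its own patch coordinates. The helpers `position_ids` and `select_rows` are hypothetical names used for illustration only; they are not the functions added by this PR.

```rust
/// Patches per side for the default config described above:
/// 1024 px / 16 px per patch = 64.
const MAX_PATCHES_PER_SIDE: usize = 64;

/// Hypothetical helper: for an image of `h` x `w` patches, return the index
/// of each patch inside the flattened (row-major) full-size grid.
fn position_ids(h: usize, w: usize, max_width: usize) -> Vec<usize> {
    assert!(w <= max_width, "image is wider than the position grid");
    let mut ids = Vec::with_capacity(h * w);
    for row in 0..h {
        for col in 0..w {
            // Each row of the small image starts at a multiple of `max_width`,
            // so patches keep the same 2D coordinates as in the full grid.
            ids.push(row * max_width + col);
        }
    }
    ids
}

/// Hypothetical helper: gather the entries of a precomputed table
/// (one entry per position in the full grid) at the given ids.
fn select_rows<T: Clone>(table: &[T], ids: &[usize]) -> Vec<T> {
    ids.iter().map(|&i| table[i].clone()).collect()
}
```

With these assumptions, a 2x3 patch image would use positions 0, 1, 2 from the first row of the full grid and 64, 65, 66 from the second, rather than the contiguous range 0 to 5.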

@LaurentMazare
Collaborator

Thanks, you might want to look at the clippy failure in the CI :)

@ameroyer
Contributor Author

ameroyer commented Dec 23, 2024

Indeed, clippy again. Thanks, it's fixed now!

  • Added a minor fix: The default hidden activation for the vision config has been corrected from GeLU to SiLU in the HF repo
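
To show why the default activation matters, here is a minimal sketch of the two functions on scalars. It uses the standard definitions (SiLU as `x * sigmoid(x)`, and the common tanh approximation of GELU); the function names are illustrative and this is not the tensor code used in the repository.

```rust
/// SiLU (also called swish): x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// GELU, tanh approximation:
/// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
fn gelu_tanh(x: f32) -> f32 {
    let c = (2.0 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}
```

Both functions return 0 at `x = 0` but differ elsewhere (at `x = 1`, roughly 0.73 for SiLU against roughly 0.84 for GELU), so loading weights trained with one while applying the other would change the vision tower's outputs.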

@LaurentMazare LaurentMazare merged commit 1be6b09 into huggingface:main Dec 23, 2024
10 checks passed
@LaurentMazare
Collaborator

Thanks!
