-
Notifications
You must be signed in to change notification settings - Fork 27.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ColPali to 🤗 transformers #33736
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for working on this! Excited to see ColPali in Transformers 🤗.
Main comment for now is to try and inherit directly from PaliGemmaForConditionalGeneration
to make full use of modular and to avoid instantiating PaliGemmaForConditionalGeneration
inside the model
e0db84b
to
9c742c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like something is going wrong with the files generated by Modular, but otherwise looks good!
For the processor and modeling tests, you can take a look at other model tests to see how they should be implemented
08caa9d
to
e189393
Compare
f71f94a
to
f9895a8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, apart from some nits and the ProcessingKwargs
issue, it looks almost ready to go for me!
src/transformers/models/colpali/convert_colpali_weights_to_hf.py
Outdated
Show resolved
Hide resolved
Deps used for the weight conversion (
|
Hi @ArthurZucker, could you review my PR for ColPali please? 😁 @yonigozlan was kind enough to do multiple reviews of the PR already, so it should almost ready to ship! However, I noticed a few bugs with modular that will need to be fixed before merging:
Any chance you could use with these please? 🙏🏼 |
This was fixed thanks to #3448! 👍🏼 |
Will review today! Sorry for the delay! |
28e35bd
to
9f34d80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I think we need to just use the paligemma config, manually add the embedding dimension or use something PreTrainedConfig.num_labels
else: | ||
# In case no valid text config is found, we might have a model with a vlm backbone | ||
if hasattr(self, "vlm_config"): | ||
for text_config_name in possible_text_config_names: | ||
if hasattr(self.vlm_config, text_config_name): | ||
text_config = getattr(self.vlm_config, text_config_name, None) | ||
if text_config is not None: | ||
valid_text_config_names += [text_config_name] | ||
if len(valid_text_config_names) == 1: | ||
return getattr(self.vlm_config, valid_text_config_names[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes more sense to flatten whatever we can rather than introducing this! 🤗 You don't even need a new config class for ColPaliGemma, you can use the PaliGemma one!
Almost there 🤗 with the config update we should be good to merge |
Is the last thing to do |
@ArthurZucker Fixed the issue with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pushing through 🔥 Let's merge!
What does this PR do?
Add ColPali support in 🤗
transformers
.Who can review?
@yonigozlan 😉
@ArthurZucker
Additional details
colpali-engine==v0.3.0
.colpali-engine
dependency, the weights are directly loaded from an exportedstate_dict
stored invidore/colpali-v1.2-merged-state_dict
.vidore/colpali-v1.2-hf
.vidore/document-visual-retrieval-test
. I believe this should be moved tohf-internal-testing/document-visual-retrieval-test
.Progress checklist
TODO
hf-internal-testing/document-visual-retrieval-test
(done → [url])Wait for Add support for modifying Processor Kwargs in modular #34477 to be merged (fix for modular script)Large modular logic refactoring #34487 has been merged and fixes this problem