
Add ColPali to 🤗 transformers #33736

Merged: 137 commits merged into huggingface:main on Dec 17, 2024
Conversation

tonywu71
Contributor

@tonywu71 tonywu71 commented Sep 26, 2024

What does this PR do?

Add ColPali support in 🤗 transformers.
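For background on what is being added: ColPali embeds each document page into many patch-level vectors and each query into token-level vectors, then scores query/page pairs with ColBERT-style late interaction (MaxSim). A minimal NumPy sketch of that scoring (the function name and dimensions are illustrative, not taken from this PR):

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """ColBERT-style late-interaction score for one query/page pair.

    query_emb: (n_query_tokens, dim) multi-vector query embedding
    doc_emb:   (n_doc_tokens, dim)   multi-vector page embedding
    """
    # Pairwise token similarities, shape (n_query_tokens, n_doc_tokens).
    sim = query_emb @ doc_emb.T
    # Each query token keeps its best-matching page token; sum over tokens.
    return float(sim.max(axis=1).sum())

# Toy example with random embeddings (ColPali itself projects each token
# to a 128-dimensional vector).
rng = np.random.default_rng(0)
query = rng.normal(size=(5, 128))
page = rng.normal(size=(30, 128))
score = maxsim_score(query, page)
```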

Who can review?

@yonigozlan 😉
@ArthurZucker

Additional details

  • This PR uses the new Modular 🤗 transformers feature from v4.45.0
  • ColPali is mainly based on the colpali-engine repository, which I maintain with my co-authors. The initial code was taken from colpali-engine==v0.3.0.
  • To prevent adding the colpali-engine dependency, the weights are directly loaded from an exported state_dict stored in vidore/colpali-v1.2-merged-state_dict.
  • The newly converted model weights are stored in vidore/colpali-v1.2-hf.
  • I have contributed a small dataset for integration testing for visual retrievers: vidore/document-visual-retrieval-test. I believe this should be moved to hf-internal-testing/document-visual-retrieval-test.
  • I had a lot of trouble with the weight conversion. It turned out there was a reproducibility issue depending on the torch version, so for reproducibility I've added the frozen dependencies below in the PR.
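The state_dict export described above avoids a hard dependency on colpali-engine: only tensors are shipped, and the architecture is rebuilt on the transformers side. A toy sketch of that pattern, using a small nn.Linear as a stand-in for the real model:

```python
import io

import torch
import torch.nn as nn

# Stand-in for the original model (the real export lives in
# vidore/colpali-v1.2-merged-state_dict).
original = nn.Linear(4, 2)

# Export: serialize only the weights, not the class, so loading does not
# require the package that defined the original model.
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)

# Import: rebuild an equivalent architecture independently, then load.
buffer.seek(0)
rebuilt = nn.Linear(4, 2)
rebuilt.load_state_dict(torch.load(buffer, weights_only=True))

x = torch.randn(3, 4)
assert torch.allclose(original(x), rebuilt(x))
```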

Progress checklist

TODO

  • (Optional) Understood the model’s theoretical aspects
  • Prepared 🤗 Transformers dev environment
  • Set up debugging environment of the original repository
  • Created script that successfully runs the forward() pass using the original repository and checkpoint
  • Successfully added the model skeleton to 🤗 Transformers
  • Successfully converted original checkpoint to 🤗 Transformers checkpoint
  • Successfully ran forward() pass in 🤗 Transformers that gives identical output to original checkpoint
  • Finished model tests in 🤗 Transformers
  • Successfully added tokenizer in 🤗 Transformers
  • Run end-to-end integration tests
  • Finished docs
  • Uploaded model weights to the Hub
  • Submitted the pull request
  • (Optional) Added a demo notebook → can be found in https://github.com/tonywu71/colpali-cookbooks
  • Update tests if we decide to migrate the custom test dataset to hf-internal-testing/document-visual-retrieval-test (done → [url])
  • Wait for Add support for modifying Processor Kwargs in modular #34477 to be merged (fix for the modular script) → Large modular logic refactoring #34487 has been merged and fixes this problem

Member

@yonigozlan yonigozlan left a comment

Thanks so much for working on this! Excited to see ColPali in Transformers 🤗.
My main comment for now is to try to inherit directly from PaliGemmaForConditionalGeneration, both to make full use of modular and to avoid instantiating PaliGemmaForConditionalGeneration inside the model.
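To make the inheritance suggestion concrete, here is a toy contrast between composing the backbone and subclassing it; every class body below is a made-up stub, not actual transformers code:

```python
# Stub standing in for the real PaliGemmaForConditionalGeneration.
class PaliGemmaForConditionalGeneration:
    def forward(self, pixel_values):
        return {"hidden_states": pixel_values}

# Composition (discouraged here): the backbone is an attribute, so checkpoint
# keys gain an extra "model." prefix and modular cannot unfold the parent
# class into the generated modeling file.
class ColPaliComposed:
    def __init__(self):
        self.model = PaliGemmaForConditionalGeneration()

# Inheritance (suggested): modular can expand the parent's code directly into
# the generated modeling file, and only the retrieval-specific pieces are new.
class ColPaliForRetrieval(PaliGemmaForConditionalGeneration):
    def forward(self, pixel_values):
        hidden = super().forward(pixel_values)["hidden_states"]
        # A projection to the embedding dimension would go here.
        return {"embeddings": hidden}
```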

@tonywu71 tonywu71 force-pushed the add-colpali branch 3 times, most recently from e0db84b to 9c742c5 on September 27, 2024
Member

@yonigozlan yonigozlan left a comment

Looks like something is going wrong with the files generated by Modular, but otherwise it looks good!
For the processor and modeling tests, you can take a look at other models' tests to see how they should be implemented.

Member

@yonigozlan yonigozlan left a comment

Thanks for working on this! Apart from some nits and the ProcessingKwargs issue, it looks almost ready to go to me!

@tonywu71
Contributor Author

Deps used for the weight conversion (requirements.txt):

accelerate==1.0.1
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiosignal==1.3.1
appnope==0.1.4
asttokens==2.4.1
attrs==24.2.0
certifi==2024.8.30
charset-normalizer==3.4.0
colpali_engine==0.3.3
comm==0.2.2
datasets==3.0.2
debugpy==1.8.7
decorator==5.1.1
dill==0.3.8
executing==2.1.0
filelock==3.16.1
frozenlist==1.5.0
fsspec==2024.9.0
gitdb==4.0.11
GitPython==3.1.18
huggingface-hub==0.26.2
idna==3.10
ipykernel==6.29.5
ipython==8.29.0
isort==5.13.2
jedi==0.19.1
Jinja2==3.1.4
jupyter_client==8.6.3
jupyter_core==5.7.2
libcst==1.5.0
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib-inline==0.1.7
mdurl==0.1.2
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.4.2
numpy==2.1.2
packaging==24.1
pandas==2.2.3
parso==0.8.4
pexpect==4.9.0
pillow==11.0.0
platformdirs==4.3.6
prompt_toolkit==3.0.48
propcache==0.2.0
psutil==6.1.0
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==18.0.0
Pygments==2.18.0
python-dateutil==2.9.0.post0
pytz==2024.2
PyYAML==6.0.2
pyzmq==26.2.0
regex==2024.9.11
requests==2.32.3
rich==13.9.3
ruff==0.5.1
safetensors==0.4.5
six==1.16.0
smmap==5.0.1
stack-data==0.6.3
sympy==1.13.1
tokenizers==0.20.1
torch==2.5.1
tornado==6.4.1
tqdm==4.66.6
traitlets==5.14.3
-e git+https://github.com/tonywu71/transformers.git@05df69d83bdc5398f590ca0bb623de330cec49dd#egg=transformers
typing_extensions==4.12.2
tzdata==2024.2
urllib3==1.26.20
wcwidth==0.2.13
xxhash==3.5.0
yarl==1.17.1

@tonywu71
Contributor Author

Hi @ArthurZucker, could you review my PR for ColPali please? 😁 @yonigozlan was kind enough to do multiple reviews of the PR already, so it should be almost ready to ship!

However, I noticed a few bugs with modular that will need to be fixed before merging:

  1. There is the bug shown below where the ColPaliProcessorKwargs are not properly updated in the processing file.
     [screenshot: the generated processing file missing the ColPaliProcessorKwargs update, 2024-10-31]
  2. The docstrings seem not to be properly synchronized: see the forward documentation for ColPaliForRetrieval in src/transformers/models/colpali/modular_colpali.py vs src/transformers/models/colpali/modeling_colpali.py.

Any chance you could help with these please? 🙏🏼

@tonywu71
Copy link
Contributor Author

tonywu71 commented Nov 1, 2024

This was fixed thanks to #3448! 👍🏼

@ArthurZucker
Collaborator

Will review today! Sorry for the delay!

Collaborator

@ArthurZucker ArthurZucker left a comment

LGTM! I think we just need to use the PaliGemma config and manually add the embedding dimension, or use something like PretrainedConfig.num_labels.

Comment on lines 1125 to 1134
```python
else:
    # In case no valid text config is found, we might have a model with a vlm backbone
    if hasattr(self, "vlm_config"):
        for text_config_name in possible_text_config_names:
            if hasattr(self.vlm_config, text_config_name):
                text_config = getattr(self.vlm_config, text_config_name, None)
                if text_config is not None:
                    valid_text_config_names += [text_config_name]
        if len(valid_text_config_names) == 1:
            return getattr(self.vlm_config, valid_text_config_names[0])
```
Collaborator

I think it makes more sense to flatten whatever we can rather than introducing this! 🤗 You don't even need a new config class for ColPaliGemma, you can use the PaliGemma one!
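A toy sketch of the direction suggested here: resolve the text config inside the ColPali-specific config instead of in the shared modeling utils. All names and values below are illustrative stubs, not the final implementation:

```python
# Stub for the wrapped PaliGemma config.
class PaliGemmaConfigStub:
    def __init__(self):
        self.text_config = {"hidden_size": 2048}

class ColPaliConfigSketch:
    """Delegates text-config lookup to its nested VLM config."""

    def __init__(self):
        self.vlm_config = PaliGemmaConfigStub()

    def get_text_config(self):
        # Model-specific override; no change needed in generic modeling utils.
        return self.vlm_config.text_config

config = ColPaliConfigSketch()
assert config.get_text_config()["hidden_size"] == 2048
```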

@ArthurZucker
Collaborator

Almost there 🤗 with the config update we should be good to merge

@ArthurZucker
Collaborator

Mmm okay, but in that case let's not have a very specific change: vlm_config is just for PaliGemma. We just need a change in the ColPaliGemmaConfig, not in modeling utils.

This is the last thing to do.

@yonigozlan
Member

@ArthurZucker Fixed the issue with text_config and the slow tests ran successfully 🤗

Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks for pushing through 🔥 Let's merge!

@ArthurZucker ArthurZucker merged commit f33a0ce into huggingface:main Dec 17, 2024
28 checks passed