Large modular logic refactoring #34487

Cyrilvallez · 2024-10-29T10:59:52Z

What does this PR do?

This PR largely reworks the logic we use in the modular converter. It is (hopefully) clearer and more maintainable. Instead of going in all directions, adding stuff, then deleting it if not needed, we now do the following:

visit the whole modular file (record import/function/class/assignment nodes)
- create function dependency mapping
for each import coming from another model:
- visit the corresponding file
- create function dependency mapping
- update mapping with function/assignment from the modular (updated/new functions)
- create the class dependency graph based on merged dependencies
update dependency graph of the modular with the functions and assignments imported from the other files
for each class recorded in the modular:
- if inheriting from a class in another file:
  - replace call to super
  - find the dependencies after the node was replaced
  - follow (updated with modular defs) dependency mapping to add all nodes
- else:
  - only add needed imported functions (and their dependencies)
determine the needed imports and add them
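
The dependency-mapping steps above can be illustrated with a minimal sketch. It uses the standard-library `ast` module rather than the converter's own visitor, and the helper names (`function_dependencies`, `all_dependencies`) and both sample sources are hypothetical. It only shows the idea: build one mapping per file, let the modular definitions override the base ones, then follow the merged mapping transitively.

```python
import ast


def function_dependencies(source: str) -> dict[str, set[str]]:
    """Map each top-level function to every name it references (hypothetical helper)."""
    mapping = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            names = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            mapping[node.name] = names - {node.name}
    return mapping


def all_dependencies(name: str, mapping: dict[str, set[str]]) -> set[str]:
    """Follow the mapping transitively, keeping only names that are known functions."""
    seen, stack = set(), [name]
    while stack:
        for dep in mapping.get(stack.pop(), set()):
            if dep in mapping and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# Stand-in for a file from another model (assumed content)
base_source = '''
def rotate_half(x):
    return x

def apply_rotary(q, k):
    return rotate_half(q), rotate_half(k)

def unused_helper():
    return 0
'''

# Stand-in for the modular file: redefines one function and adds a new one
modular_source = '''
def apply_rotary(q, k):
    return scale(rotate_half(q)), k

def scale(x):
    return x
'''

merged = function_dependencies(base_source)
merged.update(function_dependencies(modular_source))  # modular definitions win
needed = all_dependencies("apply_rotary", merged)
```

Here `needed` contains `rotate_half` (from the base file) and `scale` (new in the modular file), while `unused_helper` is left out. A real implementation would also have to deal with local variables that shadow function names, which this sketch ignores.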

Note that we now visit each file only once, instead of potentially revisiting it multiple times due to renaming or deleting nodes at the end.
cc @ArthurZucker for the logic design

Still to come if the design looks good:

Unit-tests!!!

HuggingFaceDocBuilderDev · 2024-10-29T11:26:29Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

Okay! Very nice idea to have a general ModelMapper, and have one for Modular files and one for "other" files.

Let's keep splitting the functionalities. I think creating the module should go outside the visitor; we don't necessarily need a class!

Looks good in general! Great work 🔥

yonigozlan · 2024-10-29T16:52:39Z

Hey, so nice to see modular getting better and better! Just a heads up that I have this relatively short PR out on modular as well, #34477, which adds some functionalities needed for the ColPali PR, so I was wondering if it will be compatible with this refactoring!
I also made a comment there on adding the modular examples to the check_modular_conversion script to trace when issues are introduced. I think it could be useful to have here as well :)

Cyrilvallez · 2024-10-29T19:29:56Z

Hey @yonigozlan, thanks for the heads-up! I added the new TYPE_TO_FILE_TYPE and slightly tweaked how I was handling annotations, so your modular_new_kwargs_model.py now also behaves correctly in this PR.
If you have any issues, let me know.

yonigozlan · 2024-10-29T19:44:13Z

@Cyrilvallez Nice! Thank you!

ArthurZucker

Very very nice!
The only things missing now:

unittests / small examples of the capabilities
update the documentation a little bit to further explain how we do this but mostly
To be honest, we can merge; this will unblock me as well 🤗

ArthurZucker · 2024-10-30T06:37:23Z

utils/modular_model_converter.py

+    "ProcessorKwargs": "processing",
+    "ImagesKwargs": "processing",
+    "TextKwargs": "processing",


Mmm, I am not sure about these; we should infer their destination from AlignProcessor, which uses them in its signature.

Cyrilvallez · 2024-10-31

Are we expecting name clashes here? Inferring the destination from the class using them is not so straightforward, since different classes in different file types may use them (e.g. in type hints).

Cyrilvallez · 2024-10-31

From what I can see, all of them are only used in "processing" files, as of now at least.
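
To make the trade-off concrete, here is a minimal sketch of a name-based lookup, assuming the three quoted entries belong to the TYPE_TO_FILE_TYPE mapping mentioned earlier in the thread. The extra `Config` entry, the `file_type_for` helper, and the `"modeling"` default are assumptions for illustration, not the converter's actual code.

```python
TYPE_TO_FILE_TYPE = {
    "Config": "configuration",  # assumed entry, for illustration only
    "ProcessorKwargs": "processing",
    "ImagesKwargs": "processing",
    "TextKwargs": "processing",
}


def file_type_for(class_name: str, default: str = "modeling") -> str:
    """Pick a destination file type from the class-name suffix (hypothetical helper)."""
    # Try longer suffixes first so the most specific entry wins
    for suffix in sorted(TYPE_TO_FILE_TYPE, key=len, reverse=True):
        if class_name.endswith(suffix):
            return TYPE_TO_FILE_TYPE[suffix]
    return default


destination = file_type_for("AlignProcessorKwargs")
```

A lookup like this depends only on the class name, so it gives the same answer no matter which file happens to mention the class in a type hint. The cost is that every new naming pattern needs its own entry.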

yonigozlan · 2024-10-30T18:13:14Z

Hey again @Cyrilvallez @ArthurZucker!
It seems that with this new modular converter, several examples in examples/modular that used to work now have some issues. Some of these examples have functionalities that we would need for the ColPali PR, so it would be great if we could keep supporting them.
Also, should we add the modular examples to the default call of check_modular_conversion? Otherwise, the modular examples aren't automatically updated when changes are made to modular_model_converter, which makes it hard to trace when issues are introduced.

Cyrilvallez · 2024-10-31T08:16:54Z

Hey @yonigozlan! Thanks for the feedback! I added assignment dependency tracking to correctly follow everything (I previously thought it would never be needed, but it turns out it is in some niche use cases! And it is good to have in general anyway).
Everything is now behaving nicely, both in the actual model definitions and in the examples (the examples were already broken, btw). Could you double-check just in case? This highlights that we really need those tests soon!
I'll let @ArthurZucker decide if we want to add the examples in the check_conversion, but I added them to the --files_to_parse all anyway.
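
As a rough illustration of why assignment tracking matters, the sketch below (using the standard-library `ast` module, with hypothetical helper and constant names) records which top-level names each top-level function and simple assignment refers to. A function that uses a module-level constant then pulls in that constant, plus any constant it is built from.

```python
import ast


def top_level_dependencies(source: str) -> dict[str, set[str]]:
    """Map top-level functions and simple assignments to the top-level names they use."""
    nodes = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            nodes[node.name] = node
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    nodes[target.id] = node.value
    mapping = {}
    for name, node in nodes.items():
        used = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
        mapping[name] = (used & set(nodes)) - {name}
    return mapping


def collect(name: str, mapping: dict[str, set[str]]) -> set[str]:
    """Gather everything reachable from `name` through the mapping."""
    needed, stack = set(), [name]
    while stack:
        for dep in mapping[stack.pop()]:
            if dep not in needed:
                needed.add(dep)
                stack.append(dep)
    return needed


# Assumed sample file: one constant is built from another
source = '''
_CHECKPOINT = "org/model"
_DOCSTRING = "Loads " + _CHECKPOINT
_UNRELATED = 1

def describe():
    return _DOCSTRING
'''

mapping = top_level_dependencies(source)
needed = collect("describe", mapping)
```

Without the assignment entries, `describe` would be copied over without `_DOCSTRING`, and `_DOCSTRING` without `_CHECKPOINT`. With them, both are collected and `_UNRELATED` is left behind.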

Cyrilvallez · 2024-10-31T16:25:06Z

Added support for a new, important use case (see [WIP] Emu3: add model #33770): correct dispatch of fully new classes. Consider the following modular.py:

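    from ...configuration_utils import PretrainedConfig
    from ..llama.modeling_llama import LlamaModel

    # Fully new configuration class (no counterpart in the Llama files)
    class NewNameTextConfig(PretrainedConfig):
        ...

    class NewNameConfig(PretrainedConfig):
        ...

    # Inherits from a class in another file and uses both configs
    class NewNameModel(LlamaModel):
        config = NewNameConfig()
        text_config = NewNameTextConfig()
        ...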

We previously had no way of correctly dispatching NewNameTextConfig to configuration_newname.py without importing it in modeling_newname.py as well as part of the dependencies (because modeling_llama.py only tells us that NewNameConfig will be imported, but nothing about TextConfig).
This is now solved: every fully new class (without an exact name match, e.g. LlamaConfig <-> NewNameConfig) that does not belong to the correct file type will not be added, and an import will be created instead.
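
The dispatch rule can be sketched as follows. This is only an illustration under stated assumptions: the suffix-based `file_type_of` rule and the `emit` helper are hypothetical, and the real converter works on syntax-tree nodes rather than strings.

```python
def file_type_of(class_name: str) -> str:
    # Assumed rule for this sketch: "*Config" classes live in configuration files
    return "configuration" if class_name.endswith("Config") else "modeling"


def emit(class_name: str, target_file_type: str, model_name: str = "newname"):
    """Decide what a generated file of type `target_file_type` does with a new class."""
    own_type = file_type_of(class_name)
    if own_type == target_file_type:
        # The class belongs here: its definition is written to this file
        return ("define", class_name)
    # Otherwise the class is defined elsewhere and only imported here
    return ("import", f"from .{own_type}_{model_name} import {class_name}")


in_modeling = emit("NewNameTextConfig", "modeling")
in_configuration = emit("NewNameTextConfig", "configuration")
```

With the modular.py above, `NewNameTextConfig` is defined once in configuration_newname.py, and modeling_newname.py only receives the import line.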

ArthurZucker · 2024-11-01T09:16:33Z

🥳


ArthurZucker reviewed Oct 29, 2024

ArthurZucker approved these changes Oct 30, 2024

Cyrilvallez mentioned this pull request Oct 30, 2024

Modular phi #34361

Open

5 tasks

Cyrilvallez mentioned this pull request Oct 30, 2024

[WIP] Emu3: add model #33770

Merged

5 tasks

Cyrilvallez added 19 commits October 31, 2024 17:29

rework converter

2c8b87c

Update modular_model_converter.py

31809d1

Update modular_model_converter.py

b7acc35

Update modular_model_converter.py

2ca25c2

Update modular_model_converter.py

aaee9ae

cleaning

2d26196

cleaning

2c675f2

finalize imports

8f3b764

imports

1084ca7

Update modular_model_converter.py

39a0a89

Better renaming to avoid visiting same file multiple times

3ba751a

start converting files

7416080

style

4545b63

address most comments

5958f64

style

cfdafe3

remove unused stuff in get_needed_imports

bc7e20b

style

2ab7f56

move class dependency functions outside class

197d937

Move main functions outside class

459be8f

Cyrilvallez added 18 commits October 31, 2024 17:30

style

128986d

Update modular_model_converter.py

79113cf

rename func

8d26fa9

add augmented dependencies

b250367

Update modular_model_converter.py

33dbde7

Add types_to_file_type + tweak annotation handling

9fcddb8

Allow assignment dependency mapping + fix regex

70f006b

style + update modular examples

b5879b1

fix modular_roberta example (wrong redefinition of __init__)

efdbe78

slightly correct order in which dependencies will appear

e8fe360

style

dea43c8

review comments

9a8a7e0

Performance + better handling of dependencies when they are imported

38a574a

style

0b7c103

Add advanced new classes capabilities

dde85dc

style

7cd1eff

add forgotten check

cc58d43

Update modeling_llava_next_video.py

f05849a

Cyrilvallez force-pushed the new-modular branch from 79c5d85 to f05849a October 31, 2024 16:35

Cyrilvallez added 3 commits October 31, 2024 18:15

Add prority list ordering in check_conversion as well

be70f7d

Update check_modular_conversion.py

c8a4d4d

Update configuration_gemma.py

cfec75d

Cyrilvallez merged commit e2ac16b into main Nov 1, 2024
18 checks passed

Cyrilvallez deleted the new-modular branch November 1, 2024 09:13

tonywu71 mentioned this pull request Nov 1, 2024

Add ColPali to 🤗 transformers #33736

Merged

16 tasks
