
Add DAB-DETR Object detection/segmentation model #30803

Open
wants to merge 94 commits into base: main

Conversation

conditionedstimulus (Contributor)

What does this PR do?

Adds the DAB-DETR object detection model. Paper: https://arxiv.org/abs/2201.12329
Original code repo: https://github.com/IDEA-Research/DAB-DETR

[WIP] This model is part of the evolution of the DETR family, alongside DN-DETR (not part of this PR), paving the way for newer and stronger object detection models such as DINO and Stable-DINO.
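
For context, once merged the model should be usable through the standard object detection API in Transformers. A hedged sketch of the intended usage (the checkpoint id below is an assumption for illustration, not one published by this PR):

```python
# Hedged usage sketch; the checkpoint id is assumed, not part of this PR.
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

checkpoint = "IDEA-Research/dab-detr-resnet-50"  # assumed checkpoint id
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForObjectDetection.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# convert raw logits/boxes into COCO-style detections
results = processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.5
)[0]
```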

Who can review?

@amyeroberts

@amyeroberts (Collaborator)

Hi @conditionedstimulus, thanks for opening a PR!

Just skimming over the modeling files, it looks like all of the modules are copied from, or can be copied from, Conditional DETR. Are there any architectural changes this model brings? If not, then all we need to do is convert the checkpoints and upload them to the Hub so that they can be loaded in Conditional DETR directly.

@conditionedstimulus (Contributor, Author)

> Hi @conditionedstimulus, thanks for opening a PR!
>
> Just skimming over the modeling files, it looks like all of the modules are copied from, or can be copied from, Conditional DETR. Are there any architectural changes this model brings? If not, then all we need to do is convert the checkpoints and upload them to the Hub so that they can be loaded in Conditional DETR directly.

Hi Amy,

I attached a screenshot comparing the decoder cross-attention in DETR, Conditional DETR, and DAB-DETR, as this is the main architectural difference; a rough sketch of the idea follows the screenshot. I copied the code from Conditional DETR because this model is an extension/evolved version of Conditional DETR. I believe it would be cool and useful to include this model in the HF object detection collection.
[Screenshot: comparison of decoder cross-attention in DETR, Conditional DETR, and DAB-DETR]
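
For readers without the image, a hedged, simplified sketch of the difference (illustrative names, not the PR's exact code): DETR's decoder queries are opaque learned embeddings, Conditional DETR splits a spatial query from the content query, and DAB-DETR makes each query an explicit 4D anchor box (x, y, w, h) whose sinusoidal embedding forms the positional part of the cross-attention query, with width/height modulating the attention and the boxes refined layer by layer.

```python
import torch
import torch.nn as nn

num_queries, hidden_size = 300, 256
anchor_boxes = nn.Embedding(num_queries, 4)  # learned (x, y, w, h) anchors

def sine_embed(coords, dim, temperature=10000):
    # sinusoidal embedding of each scalar coordinate (float32 for reproducibility)
    dim_t = torch.arange(dim, dtype=torch.float32)
    dim_t = temperature ** (2 * torch.div(dim_t, 2, rounding_mode="floor") / dim)
    pos = coords[..., None] / dim_t
    return torch.stack((pos[..., 0::2].sin(), pos[..., 1::2].cos()), dim=-1).flatten(-2)

boxes = anchor_boxes.weight.sigmoid()                              # (300, 4), normalized to [0, 1]
pos_query = sine_embed(boxes[:, :2], hidden_size // 2).flatten(1)  # (300, 256) positional query from (x, y)
width_height = boxes[:, 2:]                                        # (w, h), used to modulate the positional
                                                                   # cross-attention and refine boxes per layer
print(pos_query.shape)  # torch.Size([300, 256])
```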

@amyeroberts (Collaborator)

@conditionedstimulus Thanks for sharing! OK, seems useful to have this available as an option as part of the DETR family in the library. Feel free to ping me when the PR is ready for review.

cc @qubvel for reference

@ArthurZucker (Collaborator)

> Are you suggesting that I remove any code paths, functions, or configuration variables not related to the pre-trained version? It seems that most users will likely utilize the pre-trained model rather than training one from scratch.

Exactly. If someone wants to add complexity, they can copy-paste the model or monkey-patch it; we want to reflect the architecture of the PreTrainedModel as much as possible.

@ArthurZucker (Collaborator)

For the pretrained checkpoints, if they exist for a certain path we kind of have no choice but to keep it that way! Though what matters is the intention: we try to remove them, and then we try to only have things that change the init and not the forward (e.g. create 2 classes instead of 1 with if/else), etc. 🤗

@ArthurZucker (Collaborator) left a comment

IMO only thing left to do:

  • refactor the attention to make it as close as possible to Llama/Gemma
  • small renaming -> d_model becomes hidden_size, etc. Have a look at modeling_llama for the correct standards!
  • move the losses to loss_utils or loss_dab_detr.py

Awesome refactoring otherwise! 🔥


model_type = "dab-detr"
keys_to_ignore_at_inference = ["past_key_values"]
attribute_map = {
Collaborator

I mean can we remove the properties? 🤗

Collaborator

❤️ a lot better thanks!
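
For readers outside the thread: attribute_map in a Transformers config redirects attribute access, letting a model keep legacy names internally (like d_model) while exposing the standard ones (like hidden_size), which is the renaming point raised in the review above. A hedged illustration of the mechanism, not the actual contents of the DAB-DETR config:

```python
from transformers import PretrainedConfig

class ToyDetrLikeConfig(PretrainedConfig):
    model_type = "toy-detr"  # illustrative name, not the PR's
    # accesses to `hidden_size` are transparently redirected to `d_model`
    attribute_map = {"hidden_size": "d_model", "num_attention_heads": "encoder_attention_heads"}

    def __init__(self, d_model=256, encoder_attention_heads=8, **kwargs):
        super().__init__(**kwargs)
        self.d_model = d_model
        self.encoder_attention_heads = encoder_attention_heads

config = ToyDetrLikeConfig()
print(config.hidden_size)  # 256, resolved through attribute_map to d_model
```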

Comment on lines 36 to 37
r"input_proj.weight": r"input_projection.weight",
r"input_proj.bias": r"input_projection.bias",
Collaborator

Suggested change
r"input_proj.weight": r"input_projection.weight",
r"input_proj.bias": r"input_projection.bias",
r"input_proj.(bias|weight)": r"input_projection.\1",

Comment on lines 39 to 40
r"class_embed.weight": r"class_embed.weight",
r"class_embed.bias": r"class_embed.bias",
Collaborator

Suggested change
r"class_embed.weight": r"class_embed.weight",
r"class_embed.bias": r"class_embed.bias",
r"class_embed.(bias|weight)": r"class_embed.\1",

Comment on lines 44 to 56
r"transformer.encoder.query_scale.layers.(\d+).weight": r"encoder.query_scale.layers.\1.weight",
r"transformer.encoder.query_scale.layers.(\d+).bias": r"encoder.query_scale.layers.\1.bias",
r"transformer.decoder.bbox_embed.layers.(\d+).weight": r"decoder.bbox_embed.layers.\1.weight",
r"transformer.decoder.bbox_embed.layers.(\d+).bias": r"decoder.bbox_embed.layers.\1.bias",
r"transformer.decoder.norm.weight": r"decoder.layernorm.weight",
r"transformer.decoder.norm.bias": r"decoder.layernorm.bias",
r"transformer.decoder.ref_point_head.layers.(\d+).weight": r"decoder.ref_point_head.layers.\1.weight",
r"transformer.decoder.ref_point_head.layers.(\d+).bias": r"decoder.ref_point_head.layers.\1.bias",
r"transformer.decoder.ref_anchor_head.layers.(\d+).weight": r"decoder.ref_anchor_head.layers.\1.weight",
r"transformer.decoder.ref_anchor_head.layers.(\d+).bias": r"decoder.ref_anchor_head.layers.\1.bias",
r"transformer.decoder.query_scale.layers.(\d+).weight": r"decoder.query_scale.layers.\1.weight",
r"transformer.decoder.query_scale.layers.(\d+).bias": r"decoder.query_scale.layers.\1.bias",
r"transformer.decoder.layers.0.ca_qpos_proj.weight": r"decoder.layers.0.layer.1.cross_attn_query_pos_proj.weight",
Collaborator

Same comment about weight and bias for the rest!
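
A minimal sketch of why the merged patterns work, assuming a plain dict of pattern -> replacement (this is not the PR's actual conversion script): one regex with a (bias|weight) group and a \1 backreference renames both keys at once.

```python
import re

# hypothetical subset of the mapping, with literal dots escaped
ORIGINAL_TO_CONVERTED = {
    r"input_proj\.(bias|weight)": r"input_projection.\1",
    r"class_embed\.(bias|weight)": r"class_embed.\1",
}

def rename_key(old_key: str) -> str:
    for pattern, replacement in ORIGINAL_TO_CONVERTED.items():
        new_key, n_subs = re.subn(pattern, replacement, old_key)
        if n_subs:
            return new_key
    return old_key

print(rename_key("input_proj.weight"))  # input_projection.weight
print(rename_key("input_proj.bias"))    # input_projection.bias
```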

Comment on lines 127 to 128

def convert_old_keys_to_new_keys(state_dict_keys: dict = None):
Collaborator

Missing `# Copied from` comment!

Contributor Author

Updated

Collaborator

If everything is copied from (tell me if I am wrong!) then we can just directly use image_processing_detr instead!

Contributor Author

You're right! I removed DabDetrImageProcessor and used DetrImageProcessor as you suggested, and it works perfectly.
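
Since the preprocessing is the same as DETR's, reusing the existing processor is enough; a small sketch of what that looks like in practice (defaults shown are the usual DETR ones):

```python
from transformers import DetrImageProcessor

# DAB-DETR keeps DETR-style preprocessing, so the existing processor is used
# directly instead of a duplicated DabDetrImageProcessor.
processor = DetrImageProcessor()  # default COCO-style resizing and normalization
print(processor.size)             # typically {'shortest_edge': 800, 'longest_edge': 1333}
```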

src/transformers/models/dab_detr/modeling_dab_detr.py (outdated; resolved)
pos_x = x_embed[:, :, :, None] / dim_tx

# We use float32 to ensure reproducibility of the original implementation
dim_ty = torch.arange(self.embedding_dim, dtype=torch.float32, device=pixel_values.device)
Collaborator

no worries!

Comment on lines +955 to +956
h = [hidden_dim] * (num_layers - 1)
self.layers = nn.ModuleList(nn.Linear(n, k) for n, k in zip([input_dim] + h, h + [output_dim]))
Collaborator

We should probably create this in the config; we would only have to pass the list of tuples!

Contributor Author

I attempted the change, but the config test failed with the error: "Object of type ModuleList is not JSON serializable." So I left it as it was. If you still want the change, should I go ahead and modify the config tests?
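
For reference, the constraint hit here is that configs are serialized to JSON, so they can only carry plain Python values. A hedged sketch of the trade-off being discussed (illustrative names, not the PR's API): store the layer dimensions in the config and build the ModuleList inside the model.

```python
import json
import torch.nn as nn

input_dim, hidden_dim, output_dim, num_layers = 256, 256, 4, 3
h = [hidden_dim] * (num_layers - 1)
layer_dims = [(n, k) for n, k in zip([input_dim] + h, h + [output_dim])]  # [(256, 256), (256, 256), (256, 4)]

json.dumps({"mlp_layer_dims": layer_dims})   # fine: ints and lists are JSON serializable
# json.dumps({"layers": nn.ModuleList()})    # would raise: Object of type ModuleList is not JSON serializable

# the modules themselves are built in the model from the serializable dims
layers = nn.ModuleList(nn.Linear(n, k) for n, k in layer_dims)
```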

@ArthurZucker (Collaborator)

Feel free to ping me when this is ready for another review.

@conditionedstimulus (Contributor, Author)

> Feel free to ping me when this is ready for another review.

Hi, thanks! Sorry, I got caught up with other errands and couldn’t work on the code for the past two weeks. I’m back at it now and will ping you once I'm done!

@conditionedstimulus (Contributor, Author)

conditionedstimulus commented Nov 3, 2024

> IMO only thing left to do:
>
>   • refactor the attention to make it as close as possible to Llama/Gemma
>   • small renaming -> d_model becomes hidden_size, etc. Have a look at modeling_llama for the correct standards!
>   • move the losses to loss_utils or loss_dab_detr.py
>
> Awesome refactoring otherwise! 🔥

Hi @ArthurZucker,

I could use your review and help here.

One test is failing, but it's unrelated to my model (a TFResNet assertion error).

  • I refactored the attention mechanism, and it looks solid; I think it's much closer to mllama now.
  • I also did some renaming; do you suggest any further changes?
  • Additionally, I moved the loss function as you recommended.

The main issue is that the model isn't performing well. It barely learns, validation loss hardly decreases, metrics improve very little, and the final results are poor.
I suspect the problem might lie in the loss function or the attention, with the loss function being my main guess.
While I implemented it following the approach in Conditional DETR, I’m not entirely sure it’s correct.

Current fine-tuned model notebook
Previous model notebook when the results were good

Could you help identify where the issue might be?
Thank you!

@conditionedstimulus (Contributor, Author)

Loss function seems good too.

@conditionedstimulus (Contributor, Author)

conditionedstimulus commented Nov 6, 2024

Hi @ArthurZucker,

I tried to pinpoint why the model isn't learning but haven't identified the issue yet. I've tested various model versions.

Any idea? :)
Thanks for your help in finding the issue!

@ArthurZucker (Collaborator)

Will let @qubvel take the review! 🤗 In general it's super hard for us to jump in and debug training at this stage; we'll see if that helps!

@ArthurZucker removed their request for review on November 19, 2024 at 15:18.