Add DAB-DETR Object detection/segmentation model #30803

Open · wants to merge 94 commits into base: main
Changes shown from 69 of 94 commits
8adf1bb
initial commit
conditionedstimulus May 14, 2024
8291122
encoder+decoder layer changes WIP
conditionedstimulus May 16, 2024
09e2516
architecture checks
conditionedstimulus May 21, 2024
8a004cf
working version of detection + segmentation
conditionedstimulus May 24, 2024
defbc43
fix modeling outputs
conditionedstimulus May 25, 2024
5cfbcfc
fix return dict + output att/hs
conditionedstimulus May 26, 2024
6c7564a
found the position embedding masking bug
conditionedstimulus May 27, 2024
35e056f
pre-training version
conditionedstimulus May 28, 2024
24a9d7a
added iamge processors
conditionedstimulus May 29, 2024
d9b7af4
typo in init.py
conditionedstimulus May 29, 2024
a171339
iterupdate set to false
conditionedstimulus May 29, 2024
b8b2201
fixed num_labels in class_output linear layer bias init
conditionedstimulus May 29, 2024
abe0698
multihead attention shape fixes
conditionedstimulus Jun 2, 2024
e60b555
test improvements
conditionedstimulus Jun 10, 2024
6dafb79
test update
conditionedstimulus Jun 11, 2024
5bbdca1
dab-detr model_doc update
conditionedstimulus Jun 12, 2024
4a5ac4f
dab-detr model_doc update2
conditionedstimulus Jun 12, 2024
592796b
test fix:test_retain_grad_hidden_states_attentions
conditionedstimulus Jun 12, 2024
d76fda2
config file clean and renaming variables
conditionedstimulus Jun 17, 2024
ade9720
config file clean and renaming variables fix
conditionedstimulus Jun 17, 2024
6b58e5f
updated convert_to_hf file
conditionedstimulus Jun 17, 2024
eac19f5
small fixes
conditionedstimulus Jun 17, 2024
460e9d6
style and qulity checks
conditionedstimulus Jun 17, 2024
0151f65
Merge branch 'main' into add_dab_detr
conditionedstimulus Jun 17, 2024
97194c7
return_dict fix
conditionedstimulus Jun 20, 2024
3fc56b4
Merge branch main into add_dab_detr
conditionedstimulus Jun 20, 2024
ffbb1dc
Merge branch main into add_dab_detr
conditionedstimulus Jun 20, 2024
a23b173
small comment fix
conditionedstimulus Jun 20, 2024
886087f
skip test_inputs_embeds test
conditionedstimulus Jun 20, 2024
42f469e
image processor updates + image processor test updates
conditionedstimulus Jun 21, 2024
52d1aea
check copies test fix update
conditionedstimulus Jun 21, 2024
7f0ada9
updates for check_copies.py test
conditionedstimulus Jun 21, 2024
28f30aa
updates for check_copies.py test2
conditionedstimulus Jun 21, 2024
b3713d1
tied weights fix
conditionedstimulus Jun 24, 2024
731d0ae
fixed image processing tests and fixed shared weights issues
conditionedstimulus Jun 25, 2024
ae43a4a
Merge branch 'main' into add_dab_detr
conditionedstimulus Jun 25, 2024
f952fd6
added numpy nd array option to get_Expected_values method in test_ima…
conditionedstimulus Jun 25, 2024
6e3af24
delete prints from test file
conditionedstimulus Jun 25, 2024
baa9af7
SafeTensor modification to solve HF Trainer issue
conditionedstimulus Jun 25, 2024
7de850d
removing the safetensor modifications
conditionedstimulus Jun 25, 2024
17ae1c4
make fix copies and hf uplaod has been added.
conditionedstimulus Jul 10, 2024
56f0846
Merge branch 'main' into add_dab_detr
conditionedstimulus Jul 10, 2024
c13a096
fixed index.md
conditionedstimulus Jul 10, 2024
d7e9e22
fixed repo consistency
conditionedstimulus Jul 10, 2024
8bf75c8
styel fix and dabdetrimageprocessor docstring update
conditionedstimulus Jul 10, 2024
b09f996
requested modifications after the first review
conditionedstimulus Jul 29, 2024
8ae2e1b
Update src/transformers/models/dab_detr/image_processing_dab_detr.py
conditionedstimulus Jul 29, 2024
7ba65b1
repo consistency has been fixed
conditionedstimulus Jul 29, 2024
78cedb4
Merge branch 'main' into add_dab_detr
conditionedstimulus Jul 30, 2024
2b37103
update copied NestedTensor function after main merge
conditionedstimulus Jul 30, 2024
8870773
Update src/transformers/models/dab_detr/modeling_dab_detr.py
conditionedstimulus Aug 2, 2024
a402d0d
temp commit
conditionedstimulus Aug 3, 2024
c4bd33d
temp commit2
conditionedstimulus Aug 5, 2024
973db0c
temp commit 3
conditionedstimulus Aug 7, 2024
adebdc1
Merge branch 'main' into add_dab_detr
conditionedstimulus Aug 7, 2024
75a780c
unit tests are fixed
conditionedstimulus Aug 7, 2024
ee7e11b
fixed repo consistency
conditionedstimulus Aug 7, 2024
738a693
updated expected_boxes varible values based on related notebook resul…
conditionedstimulus Aug 8, 2024
01c7702
Merge branch 'main' into add_dab_detr
conditionedstimulus Aug 26, 2024
ce549c5
temporarialy config modifications and repo consistency fixes
conditionedstimulus Aug 26, 2024
38f91f1
Put dilation parameter back to config
conditionedstimulus Sep 10, 2024
b28b2a6
pattern embeddings have been added to the rename_keys method
conditionedstimulus Sep 10, 2024
1dcd978
add dilation comment to config + add as an exception in check_config_…
conditionedstimulus Sep 29, 2024
46eb24c
Merge branch 'main' into add_dab_detr
conditionedstimulus Sep 29, 2024
13af19b
delete FeatureExtractor part from docs.md
conditionedstimulus Sep 29, 2024
b3bf25e
requested modifications in modeling_dab_detr.py
conditionedstimulus Oct 3, 2024
b76a73a
[run_slow] dab_detr
conditionedstimulus Oct 3, 2024
638f8f5
deleted last segmentation code part, updated conversion script and ch…
conditionedstimulus Oct 5, 2024
9d5dafd
Merge branch 'main' into add_dab_detr
conditionedstimulus Oct 5, 2024
049b625
temp commit of requested modifications
conditionedstimulus Oct 12, 2024
6b0fc91
temp commit of requested modifications 2
conditionedstimulus Oct 12, 2024
7f2e2e2
updated config file, resolved codepaths and refactored conversion script
conditionedstimulus Oct 13, 2024
fac9ee9
updated decodelayer block types and refactored conversion script
conditionedstimulus Oct 14, 2024
78004d0
style and quality update
conditionedstimulus Oct 14, 2024
0bf9e3b
Merge branch 'main' into add_dab_detr
conditionedstimulus Oct 14, 2024
95d7a71
small modifications based on the request
conditionedstimulus Oct 28, 2024
2663c26
attentions are refactored
conditionedstimulus Oct 31, 2024
724e767
Merge branch 'main' into add_dab_detr
conditionedstimulus Nov 1, 2024
04d3e31
removed loss functions from modeling file, added loss function to los…
conditionedstimulus Nov 1, 2024
0122e62
deleted imageprocessor
conditionedstimulus Nov 3, 2024
53e2bd2
fixed conversion script + quality and style
conditionedstimulus Nov 3, 2024
4fd9bfc
fixed config_att
conditionedstimulus Nov 3, 2024
e32cf92
Merge branch 'main' into add_dab_detr
conditionedstimulus Nov 3, 2024
9345341
[run_slow] dab_detr
conditionedstimulus Nov 3, 2024
3ef47cf
changing model path in conversion file and in test file
conditionedstimulus Nov 3, 2024
dc9f359
fix Decoder variable naming
conditionedstimulus Nov 5, 2024
93ec65e
testing the old loss function
conditionedstimulus Nov 6, 2024
c73c0fa
switched back to the new loss function and testing with the odl atten…
conditionedstimulus Nov 6, 2024
e69545d
switched back to the new last good result modeling file
conditionedstimulus Nov 6, 2024
61c5189
moved back to the version when I asked the review
conditionedstimulus Nov 6, 2024
a310f6a
missing new line at the end of the file
conditionedstimulus Nov 6, 2024
464ac93
Merge branch 'main' into add_dab_detr
conditionedstimulus Dec 21, 2024
fc0ced6
old version test
conditionedstimulus Dec 21, 2024
7bf5267
turn back to newest mdoel versino but change image processor
conditionedstimulus Dec 21, 2024
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
Original file line number Diff line number Diff line change
@@ -617,6 +617,8 @@
title: ConvNeXTV2
- local: model_doc/cvt
title: CvT
- local: model_doc/dab-detr
title: DAB-DETR
- local: model_doc/deformable_detr
title: Deformable DETR
- local: model_doc/deit
1 change: 1 addition & 0 deletions docs/source/en/index.md
@@ -105,6 +105,7 @@ Flax), PyTorch, and/or TensorFlow.
| [CPM-Ant](model_doc/cpmant) | ✅ | ❌ | ❌ |
| [CTRL](model_doc/ctrl) | ✅ | ✅ | ❌ |
| [CvT](model_doc/cvt) | ✅ | ✅ | ❌ |
| [DAB-DETR](model_doc/dab-detr) | ✅ | ❌ | ❌ |
| [DAC](model_doc/dac) | ✅ | ❌ | ❌ |
| [Data2VecAudio](model_doc/data2vec) | ✅ | ❌ | ❌ |
| [Data2VecText](model_doc/data2vec) | ✅ | ❌ | ❌ |
86 changes: 86 additions & 0 deletions docs/source/en/model_doc/dab-detr.md
@@ -0,0 +1,86 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# DAB-DETR

## Overview

The DAB-DETR model was proposed in [DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR](https://arxiv.org/abs/2201.12329) by Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang.
DAB-DETR is an enhanced variant of Conditional DETR. It utilizes dynamically updated anchor boxes to provide both a reference query point (x, y) and a reference anchor size (w, h), improving cross-attention computation. This new approach achieves 45.7% AP when trained for 50 epochs with a single ResNet-50 model as the backbone.

<img src="https://github.com/conditionedstimulus/hf_media/blob/main/dab_detr_convergence_plot.png"
alt="drawing" width="600"/>

The abstract from the paper is the following:

*We present in this paper a novel query formulation using dynamic anchor boxes
for DETR (DEtection TRansformer) and offer a deeper understanding of the role
of queries in DETR. This new formulation directly uses box coordinates as queries
in Transformer decoders and dynamically updates them layer-by-layer. Using box
coordinates not only helps using explicit positional priors to improve the query-to-feature similarity and eliminate the slow training convergence issue in DETR,
but also allows us to modulate the positional attention map using the box width
and height information. Such a design makes it clear that queries in DETR can be
implemented as performing soft ROI pooling layer-by-layer in a cascade manner.
As a result, it leads to the best performance on MS-COCO benchmark among
the DETR-like detection models under the same setting, e.g., AP 45.7% using
ResNet50-DC5 as backbone trained in 50 epochs. We also conducted extensive
experiments to confirm our analysis and verify the effectiveness of our methods.*
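The core idea in the abstract — feeding a 4D anchor box directly into the decoder as a positional query that can be refined layer-by-layer — can be reduced to a short sketch. This is an illustrative simplification, not the implementation in this PR; the function names, feature dimension, and temperature constant are assumptions:

```python
import math

def sine_embed(coord, num_pos_feats=128, temperature=10000):
    """Standard sinusoidal embedding of one normalized coordinate in [0, 1]."""
    scale = 2 * math.pi
    embedding = []
    for i in range(num_pos_feats // 2):
        freq = temperature ** (2 * i / num_pos_feats)
        embedding.append(math.sin(coord * scale / freq))
        embedding.append(math.cos(coord * scale / freq))
    return embedding

def anchor_box_query(box, num_pos_feats=128):
    """A (cx, cy, w, h) anchor box becomes one positional query by
    concatenating the sine embeddings of all four coordinates. Because the
    query is derived from plain box coordinates, each decoder layer can
    update the box and recompute the query (the "dynamic" part)."""
    return [v for coord in box for v in sine_embed(coord, num_pos_feats)]

query = anchor_box_query((0.5, 0.5, 0.2, 0.3))  # one 512-dim positional query
```

Including `w` and `h` in the embedding is what lets the model modulate the positional attention map by box size, per the paper.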

This model was contributed by [davidhajdu](https://huggingface.co/davidhajdu).
The original code can be found [here](https://github.com/IDEA-Research/DAB-DETR).

There are three ways to instantiate a DAB-DETR model (depending on what you prefer):

Option 1: Instantiate DAB-DETR with pre-trained weights for the entire model
```py
>>> from transformers import DABDETRForObjectDetection

>>> model = DABDETRForObjectDetection.from_pretrained("IDEA-Research/dab_detr_resnet50")
```

Option 2: Instantiate DAB-DETR with randomly initialized weights for the Transformer, but pre-trained weights for the backbone
```py
>>> from transformers import DABDETRConfig, DABDETRForObjectDetection

>>> config = DABDETRConfig()
>>> model = DABDETRForObjectDetection(config)
```
Option 3: Instantiate DAB-DETR with randomly initialized weights for both the backbone and the Transformer
```py
>>> from transformers import DABDETRConfig, DABDETRForObjectDetection

>>> config = DABDETRConfig(use_pretrained_backbone=False)
>>> model = DABDETRForObjectDetection(config)
```


## DABDETRConfig

[[autodoc]] DABDETRConfig

## DABDETRImageProcessor

[[autodoc]] DABDETRImageProcessor
- preprocess
- post_process_object_detection
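For orientation, the essence of a DETR-style `post_process_object_detection` — thresholding scores and converting normalized center-format boxes to absolute corner coordinates — can be sketched in plain Python. The names and signature below are illustrative assumptions, not the actual `DABDETRImageProcessor` API:

```python
def center_to_corners(box):
    # (cx, cy, w, h) -> (x_min, y_min, x_max, y_max), all normalized to [0, 1]
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def post_process_sketch(detections, threshold=0.5, target_size=(480, 640)):
    """Keep detections scoring above `threshold` and scale their boxes to
    pixel coordinates. `detections` is a list of (score, label, box) tuples
    with the box in normalized (cx, cy, w, h) format."""
    height, width = target_size
    results = []
    for score, label, box in detections:
        if score < threshold:
            continue
        x0, y0, x1, y1 = center_to_corners(box)
        results.append(
            (score, label, (x0 * width, y0 * height, x1 * width, y1 * height))
        )
    return results
```

The real method additionally runs softmax/sigmoid over logits and returns batched tensors; this sketch only shows the box-format and scaling step.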

## DABDETRModel

[[autodoc]] DABDETRModel
- forward

## DABDETRForObjectDetection

[[autodoc]] DABDETRForObjectDetection
- forward
18 changes: 18 additions & 0 deletions src/transformers/__init__.py
@@ -315,6 +315,7 @@
"CTRLTokenizer",
],
"models.cvt": ["CvtConfig"],
"models.dab_detr": ["DABDETRConfig"],
"models.dac": ["DacConfig", "DacFeatureExtractor"],
"models.data2vec": [
"Data2VecAudioConfig",
@@ -1176,6 +1177,7 @@
["ConditionalDetrFeatureExtractor", "ConditionalDetrImageProcessor"]
)
_import_structure["models.convnext"].extend(["ConvNextFeatureExtractor", "ConvNextImageProcessor"])
_import_structure["models.dab_detr"].extend(["DABDETRImageProcessor"])
_import_structure["models.deformable_detr"].extend(
["DeformableDetrFeatureExtractor", "DeformableDetrImageProcessor"]
)
@@ -1795,6 +1797,13 @@
"CvtPreTrainedModel",
]
)
_import_structure["models.dab_detr"].extend(
[
"DABDETRForObjectDetection",
"DABDETRModel",
"DABDETRPreTrainedModel",
]
)
_import_structure["models.dac"].extend(
[
"DacModel",
@@ -5129,6 +5138,9 @@
CTRLTokenizer,
)
from .models.cvt import CvtConfig
from .models.dab_detr import (
DABDETRConfig,
)
from .models.dac import (
DacConfig,
DacFeatureExtractor,
@@ -6045,6 +6057,7 @@
ConditionalDetrImageProcessor,
)
from .models.convnext import ConvNextFeatureExtractor, ConvNextImageProcessor
from .models.dab_detr import DABDETRImageProcessor
from .models.deformable_detr import (
DeformableDetrFeatureExtractor,
DeformableDetrImageProcessor,
@@ -6596,6 +6609,11 @@
CvtModel,
CvtPreTrainedModel,
)
from .models.dab_detr import (
DABDETRForObjectDetection,
DABDETRModel,
DABDETRPreTrainedModel,
)
from .models.dac import (
DacModel,
DacPreTrainedModel,
1 change: 1 addition & 0 deletions src/transformers/activations.py
@@ -217,6 +217,7 @@ def __getitem__(self, key):
"silu": nn.SiLU,
"swish": nn.SiLU,
"tanh": nn.Tanh,
"prelu": nn.PReLU,
}
ACT2FN = ClassInstantier(ACT2CLS)
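For context, `ACT2CLS`/`ClassInstantier` is a small registry pattern: the dict stores activation *classes* (optionally paired with constructor kwargs), and lookup instantiates them on demand, so the new `"prelu"` entry yields a fresh `nn.PReLU` per access. A self-contained sketch of the pattern, using stand-in classes instead of `torch.nn` modules so it runs without torch:

```python
class ClassInstantier(dict):
    """Dict whose lookup instantiates the stored class, optionally with
    stored keyword arguments (mirrors the pattern in activations.py)."""
    def __getitem__(self, key):
        content = super().__getitem__(key)
        cls, kwargs = content if isinstance(content, tuple) else (content, {})
        return cls(**kwargs)

# Stand-ins for torch.nn activation modules, to keep the sketch dependency-free.
class ReLU:
    def __call__(self, x):
        return max(0.0, x)

class PReLU:
    def __init__(self, alpha=0.25):
        self.alpha = alpha
    def __call__(self, x):
        return x if x > 0 else self.alpha * x

ACT2FN_SKETCH = ClassInstantier({"relu": ReLU, "prelu": (PReLU, {"alpha": 0.1})})
```

Each lookup returns a new instance, which is why configs can refer to activations by string name.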

1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -59,6 +59,7 @@
cpmant,
ctrl,
cvt,
dab_detr,
dac,
data2vec,
dbrx,
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -74,6 +74,7 @@
("cpmant", "CpmAntConfig"),
("ctrl", "CTRLConfig"),
("cvt", "CvtConfig"),
("dab-detr", "DABDETRConfig"),
("dac", "DacConfig"),
("data2vec-audio", "Data2VecAudioConfig"),
("data2vec-text", "Data2VecTextConfig"),
@@ -369,6 +370,7 @@
("cpmant", "CPM-Ant"),
("ctrl", "CTRL"),
("cvt", "CvT"),
("dab-detr", "DAB-DETR"),
("dac", "DAC"),
("data2vec-audio", "Data2VecAudio"),
("data2vec-text", "Data2VecText"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/image_processing_auto.py
@@ -67,6 +67,7 @@
("convnext", ("ConvNextImageProcessor",)),
("convnextv2", ("ConvNextImageProcessor",)),
("cvt", ("ConvNextImageProcessor",)),
("dab-detr", ("DABDETRImageProcessor",)),
("data2vec-vision", ("BeitImageProcessor",)),
("deformable_detr", ("DeformableDetrImageProcessor",)),
("deit", ("DeiTImageProcessor",)),
3 changes: 3 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -74,6 +74,7 @@
("cpmant", "CpmAntModel"),
("ctrl", "CTRLModel"),
("cvt", "CvtModel"),
("dab-detr", "DABDETRModel"),
("dac", "DacModel"),
("data2vec-audio", "Data2VecAudioModel"),
("data2vec-text", "Data2VecTextModel"),
@@ -559,6 +560,7 @@
("conditional_detr", "ConditionalDetrModel"),
("convnext", "ConvNextModel"),
("convnextv2", "ConvNextV2Model"),
("dab-detr", "DABDETRModel"),
("data2vec-vision", "Data2VecVisionModel"),
("deformable_detr", "DeformableDetrModel"),
("deit", "DeiTModel"),
@@ -812,6 +814,7 @@
[
# Model for Object Detection mapping
("conditional_detr", "ConditionalDetrForObjectDetection"),
("dab-detr", "DABDETRForObjectDetection"),
("deformable_detr", "DeformableDetrForObjectDetection"),
("deta", "DetaForObjectDetection"),
("detr", "DetrForObjectDetection"),
@@ -52,7 +52,7 @@ class ConditionalDetrConfig(PretrainedConfig):
Number of object queries, i.e. detection slots. This is the maximal number of objects
[`ConditionalDetrModel`] can detect in a single image. For COCO, we recommend 100 queries.
d_model (`int`, *optional*, defaults to 256):
Dimension of the layers.
This is a general dimension parameter, defining the dimensions for components such as the encoder layers and the projection parameters in the decoder layers, among others.
encoder_layers (`int`, *optional*, defaults to 6):
Number of encoder layers.
decoder_layers (`int`, *optional*, defaults to 6):
@@ -86,6 +86,8 @@ class ConditionalDetrDecoderOutput(BaseModelOutputWithCrossAttentions):
intermediate_hidden_states (`torch.FloatTensor` of shape `(config.decoder_layers, batch_size, num_queries, hidden_size)`, *optional*, returned when `config.auxiliary_loss=True`):
Intermediate decoder activations, i.e. the output of each decoder layer, each of them gone through a
layernorm.
reference_points (`torch.FloatTensor` of shape `(config.decoder_layers, batch_size, num_queries, 2)`):
Reference points of each decoder layer, i.e. the 2D anchor points (x, y).
"""

intermediate_hidden_states: Optional[torch.FloatTensor] = None
@@ -128,6 +130,8 @@ class ConditionalDetrModelOutput(Seq2SeqModelOutput):
intermediate_hidden_states (`torch.FloatTensor` of shape `(config.decoder_layers, batch_size, sequence_length, hidden_size)`, *optional*, returned when `config.auxiliary_loss=True`):
Intermediate decoder activations, i.e. the output of each decoder layer, each of them gone through a
layernorm.
reference_points (`torch.FloatTensor` of shape `(config.decoder_layers, batch_size, num_queries, 2)`):
Reference points of each decoder layer, i.e. the 2D anchor points (x, y).
"""

intermediate_hidden_states: Optional[torch.FloatTensor] = None
78 changes: 78 additions & 0 deletions src/transformers/models/dab_detr/__init__.py
@@ -0,0 +1,78 @@
# Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING

from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available


_import_structure = {
"configuration_dab_detr": [
"DABDETRConfig",
"DABDETROnnxConfig",
]
}

try:
if not is_vision_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["image_processing_dab_detr"] = ["DABDETRImageProcessor"]


try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_dab_detr"] = [
"DABDETRForObjectDetection",
"DABDETRModel",
"DABDETRPreTrainedModel",
]


if TYPE_CHECKING:
from .configuration_dab_detr import (
DABDETRConfig,
DABDETROnnxConfig,
)

try:
if not is_vision_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .image_processing_dab_detr import DABDETRImageProcessor

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_dab_detr import (
DABDETRForObjectDetection,
DABDETRModel,
DABDETRPreTrainedModel,
)

else:
import sys

sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
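The `_LazyModule` registration above defers every heavy import until an attribute is actually accessed. A minimal, self-contained analogue of that idea (a sketch of the pattern, not the actual `_LazyModule` implementation):

```python
import importlib

class LazyNamespace:
    """Maps attribute names to the module that defines them and performs
    the import only on first access, caching the result."""
    def __init__(self, import_structure):
        # invert {module: [attr, ...]} into {attr: module}
        self._attr_to_module = {
            attr: module
            for module, attrs in import_structure.items()
            for attr in attrs
        }
        self._cache = {}

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        if name not in self._attr_to_module:
            raise AttributeError(name)
        if name not in self._cache:
            module = importlib.import_module(self._attr_to_module[name])
            self._cache[name] = getattr(module, name)
        return self._cache[name]

# Nothing is imported until an attribute is touched.
ns = LazyNamespace({"json": ["JSONDecoder"], "collections": ["OrderedDict"]})
```

This is why the `_import_structure` dict in the file above only lists names: importing `transformers.models.dab_detr` stays cheap, and torch/vision code loads only when a class is first used.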