Add TF swiftformer #23342
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hi @joaocmd,
Hi @Rocketknight1, could I get some pointers as to why I get errors like this in most of the tests?
The PyTorch model has the following docstring, but I don't see where the `input_ids` part is being taken care of:

```
Here we also overwrite some of the tests of test_modeling_common.py, as SwiftFormer does not use input_ids, inputs_embeds,
attention_mask and seq_length.
```

Thanks!
It seems like it is entering the `if "args" in output:` branch:

```python
if "args" in output:
    if output["args"] is not None and is_tf_symbolic_tensor(output["args"]):
        tensor_name = output["args"].name.split(":")[0]
        output[tensor_name] = output["args"]
    else:
        # `args` in this case is always the first parameter, then `input_ids`
        output["input_ids"] = output["args"]
    del output["args"]
```

Thus it is injecting an `input_ids` key. @amyeroberts @Rocketknight1, how should I get around this? It must be some misconfiguration in my tests or models.
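For context, I believe TF vision models in the library avoid this code path by declaring which tensor is their main input, so positional args are not mapped to `input_ids`. A minimal sketch of that convention (the class body here is an illustrative stand-in, not code from this PR; in the real library the class subclasses `TFPreTrainedModel`):

```python
# Hypothetical sketch: declaring the main input name for a vision model so the
# input-standardization logic does not fall back to `input_ids`.
class TFSwiftFormerPreTrainedModel:
    # Text models default to "input_ids"; vision models override it.
    main_input_name = "pixel_values"


assert TFSwiftFormerPreTrainedModel.main_input_name == "pixel_values"
```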
@joaocmd Just looking at the error and the CI runs, I think the issue might be a missing
Thank you @amyeroberts! It seems like that wasn't causing any issue (yet), but thanks to your comment I found out that I had a duplicate
Hi @amyeroberts and @Rocketknight1, can I get some help with the tests that are still failing? I'm getting an error, but I don't understand exactly what is being reshaped into the wrong shape. Could I get some insight into what these tests are doing and why they might be failing? Thanks!
Hi @joaocmd, there have been some large updates to our TF models regarding how they're built. @Rocketknight1 can give you more details :) Are these errors happening if you rebase on `main`?
Hi @amyeroberts, just rebased the branch. I think it's failing on the same tests, but the error on these two tests changed.
Looking at the stack trace, it seems like the image size should have been specified:

```python
if hasattr(vision_config, "image_size"):
    pixel_values_shape[2] = pixel_values_shape[3] = vision_config.image_size
elif hasattr(vision_config, "input_size"):
    pixel_values_shape[2] = pixel_values_shape[3] = vision_config.input_size
else:
    raise NotImplementedError(  # <------ this error here
        "Could not infer input image shape from config, please override input_signature to specify input shapes."
    )
```

Shouldn't this also affect the original model?
@joaocmd Regarding the error, no, it shouldn't affect the original model. You'll notice that the error is being raised in the TF-side shape inference quoted above. As a side note, did you force push after rebasing? From the PR history, it looks like you might not have. As rebasing is a form of "rewriting history", it's necessary to force push.
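For reference, the fallback quoted in the previous comment reduces to logic like the following (a simplified stand-in, not the library's actual function; `Cfg` is a toy config), which is why later adding `image_size` to the SwiftFormer config resolves the error:

```python
# Simplified sketch of the config-driven image-shape inference above.
def infer_image_side(vision_config):
    if hasattr(vision_config, "image_size"):
        return vision_config.image_size
    if hasattr(vision_config, "input_size"):
        return vision_config.input_size
    raise NotImplementedError(
        "Could not infer input image shape from config, please override input_signature."
    )


class Cfg:
    image_size = 224  # SwiftFormer gained this field during this PR


assert infer_image_side(Cfg()) == 224
```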
Thanks @amyeroberts, understood. As for the rebase, I had not done one in quite some time and it seems like I did mess it up. I think that is now fixed.
Since I started this PR, I have had a fundamental question about Hugging Face's approach to TensorFlow models: the default in TensorFlow is NHWC, while in PyTorch it is NCHW. How should I approach this difference in my PR? Based on
Thank you!
@joaocmd The pattern we use for the TF vision models is to transpose the NCHW input in the first MainLayer class (e.g. here), and then transpose back if pixel values are returned (e.g. here). For some of the older models, e.g. ViT, this pattern may not have been applied, as these were the first models to be added. This pattern means the model is written in the TF-compatible NHWC format throughout, but all of our vision models accept and return images in NCHW.
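The transpose pattern described above can be sketched framework-agnostically; NumPy stands in for TF ops here, the inner layers are elided, and `main_layer_call` is an illustrative name rather than code from the PR:

```python
import numpy as np

# Accept NCHW at the model boundary, compute in NHWC internally (where TF's
# Conv2D and BatchNormalization are happiest), transpose back on the way out.
def main_layer_call(pixel_values_nchw):
    # (batch, channels, height, width) -> (batch, height, width, channels)
    x = np.transpose(pixel_values_nchw, (0, 2, 3, 1))
    # ... the model's layers would run here in NHWC ...
    # transpose back so callers always see NCHW, matching the PyTorch models
    return np.transpose(x, (0, 3, 1, 2))


x = np.zeros((2, 3, 224, 224))
assert main_layer_call(x).shape == (2, 3, 224, 224)
```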
Thank you @amyeroberts, that makes sense. I've already updated it to match the pattern. I'm still having some trouble with the
I've also noticed that there was an overhaul to the serving and dummy_inputs interface (814de8f). Maybe @Rocketknight1 can better explain the consequences of this change to mine (and other) PRs.
@joaocmd Yes, there was a big refactor of the
Looking at the CI run, I don't see
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi @amyeroberts! Sorry for the late response, I've been quite busy... It was failing more tests on my local machine than on the CI run, but after merging the main branch locally they now seem to match.
I can't find the reason for this error. I set up a breakpoint and found that the
Edit: I believe the weight belongs to the patch embeddings layer, which I initialized with a `tf.keras.Sequential`:

```python
self.patch_embedding = tf.keras.Sequential(
    [
        tf.keras.layers.ZeroPadding2D(padding=(1, 1)),
        tf.keras.layers.Conv2D(out_chs // 2, kernel_size=3, strides=2),
        tf.keras.layers.BatchNormalization(
            epsilon=config.batch_norm_eps, momentum=0.9
        ),  # FIXME: is this the equivalent momentum?
        tf.keras.layers.Activation("relu"),
        tf.keras.layers.ZeroPadding2D(padding=(1, 1)),
        tf.keras.layers.Conv2D(out_chs, kernel_size=3, strides=2),
        tf.keras.layers.BatchNormalization(
            epsilon=config.batch_norm_eps, momentum=0.9
        ),  # FIXME: is this the equivalent momentum?
        tf.keras.layers.Activation("relu"),
    ],
    name="patch_embeddings",
)
```

I think the problem is that both
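On the momentum FIXMEs above: PyTorch and Keras define BatchNorm momentum as complements of each other, so PyTorch's default of 0.1 does correspond to a Keras momentum of 0.9. A small numeric sketch of the two update rules for the running statistics:

```python
import numpy as np

# PyTorch: running = (1 - m_pt) * running + m_pt * batch_stat   (default m_pt = 0.1)
# Keras:   moving  = m_keras * moving + (1 - m_keras) * batch_stat
# Hence m_keras = 1 - m_pt, and PyTorch's 0.1 maps to Keras's 0.9.
running, batch_stat = 1.0, 5.0
pt_update = (1 - 0.1) * running + 0.1 * batch_stat
keras_update = 0.9 * running + (1 - 0.9) * batch_stat
assert np.isclose(pt_update, keras_update)
```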
@joaocmd I would suggest rewriting the torch version to not use `Sequential`, but only for the purposes of debugging, i.e. we wouldn't commit these changes to main. This way you'll be able to go line by line, comparing the TF and PT outputs and seeing where any shape differences are coming from.
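The unrolling suggested above can be sketched generically: replace the `Sequential` with a loop over named steps and capture each intermediate output, so the two frameworks can be compared layer by layer. The `layers` list and its names below are illustrative stand-ins, not code from the PR:

```python
# Debugging sketch: run named layers one at a time and record every
# intermediate output for comparison against the other framework.
def debug_forward(layers, x):
    taps = {}
    for name, layer in layers:
        x = layer(x)
        taps[name] = x  # compare this against the PT/TF counterpart here
    return x, taps


layers = [("double", lambda v: v * 2), ("inc", lambda v: v + 1)]
out, taps = debug_forward(layers, 3)
assert out == 7 and taps["double"] == 6
```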
Hi @amyeroberts, I might be misunderstanding something, but I think
@joaocmd Ah, apologies, I misread your comment. Yes, I believe you're right about the naming issue. What I suggest is following the pattern in other ports, where
@joaocmd From next week I'll be off for a few weeks. If you have any implementation questions, please ask @Rocketknight1 :)
Hi @Rocketknight1, could you give me some pointers on the three remaining tests? I haven't looked closely into
However, I am trying to understand what is wrong with
There is also this current error that might be due to some misnamed layers, but I am not sure:
Thank you!
@joaocmd Are you still working on this? If so, @Rocketknight1 could you help?
Hi @amyeroberts, I haven't made any changes since my last comment, as I was stuck and had some other responsibilities. I would like to finish the issue, especially because I believe it's very close to done.
Hi, I'm sorry, I'm not sure how I missed your last comment - this is entirely my fault! Let me investigate the errors you were getting and I'll see if we can get this PR over the line.
Hi @amyeroberts! I think I've addressed all the comments. The CI pipeline is failing on a
@joaocmd It might be a version mismatch. To ensure you have the same versions of libraries like ruff in your env, you can run
Thanks @amyeroberts! Is there anything missing on my side?
Looks great - thanks for all the work adding this model!
Thank you @amyeroberts and @Rocketknight1 for your patience and support! It was a pleasure
* Duplicate swiftformer
* Convert SwiftFormerPatchEmbedding
* Convert SwiftFormerEmbeddings
* Convert TFSwiftFormerMlp
* Convert TFSwiftFormerConvEncoder
* Convert TFSwiftFormerLocalRepresentation
* convert TFSwiftFormerEncoderBlock
* Convert SwiftFormerStage
* Convert SwiftFormerEncoder
* Add TFSWiftFormerPreTrainedModel
* Convert SwiftFormerForImageClassification
* Add kwargs and start drop path
* Fix syntax
* Change Model class name
* Add TFSwiftFormer to __init__
* Duplicate test_modeling_swiftformer
* First test conversions
* Change require_torch to require_tf
* Add exports to swiftformer __init__
* Add TFSwiftFormerModel wrapper
* Fix __init__ and run black
* Remove docstring from MainLayer, fix padding
* Use keras.layers.Activation on keras.Sequential
* Fix swiftformer exports
* Fix activation layer from config
* Remove post_inits
* Use tf.keras.layers.ZeroPadding2D
* Convert torch normalize
* Change tf test input shape
* Fix softmax and reduce_sum
* Convert expand_dims and repeat
* Add missing reshape and tranpose
* Simplify TFSwiftFormerEncoderBlock.call
* Fix mismatch in patch embeddings
* Fix expected output shape to match channels last
* Fix swiftformer typo
* Disable test_onnx
* Fix TFSwiftFormerForImageClassification call
* Add unpack inputs
* Convert flatten(2).mean(-1)
* Change vision dummy inputs (to be reviewed)
* Change test_forward_signature to use .call
* Fix @unpack_inputs
* Set return_tensors="tf" and rename class
* Rename wrongly named patch_embeddings layer
* Add serving_output and change dummy_input shape
* Make dimensions BCHW and transpose inside embedding layer
* Change SwiftFormerEncoderBlock
* Fix ruff problems
* Add image size to swiftformer config
* Change tranpose to MainLayer and use -1 for reshape
* Remove serving_outputs and dummy_inputs
* Remove test_initialization test from tf model
* Make Sequential component a separate layer
* Fix layers' names
* Tranpose encoder outputs
* Fix tests and check if hidden states is not None
* Fix TFSwiftFormerForImageClassification
* Run make fixup
* Run make fix-copies
* Update modeling_tf_auto
* Update docs
* Fix modeling auto mapping
* Update modelint_tf_swiftformer docs
* Fill image_size doc and type
* Add reduction=None to loss computation
* Update docs
* make style
* Debug: Delete the tip to see if that changes anything
* Re-add tip
* Remove add_code_sample_docstrings
* Remove unused import
* Get the debug to actually tell us the problem it has with the docs
* Try a substitution to match the PyTorch file?
* Add swiftformer to ignore list
* Add build() methods
* Update copyright year (Co-authored-by: amyeroberts <[email protected]>)
* Remove FIXME comment
* Remove from_pt
* Update copyright year (Co-authored-by: amyeroberts <[email protected]>)
* Rename one-letter variables
* Remove FIXMEs related to momentum
* Remove old TODO comment
* Remove outstanding FIXME comments
* Get dropout rate from config
* Add specific dropout config for MLP
* Add convencoder dropout to config
* Pass config to SwiftFormerDropPath layer
* Fix drop_path variable name and add Adapted from comment
* Run ruff
* Removed copied from comment
* Run fix copies
* Change drop_path to identity to match pt
* Cleanup build() methods and move to new keras imports
* Update docs/source/en/model_doc/swiftformer.md (Co-authored-by: Matt <[email protected]>)
* Raise error if drop_path_rate > 0.0
* Apply suggestions from code review: Replace (self.dim), with self.dim, (Co-authored-by: Matt <[email protected]>)
* Remove drop_path function
* Add training to TFSwiftFormerEncoder
* Set self.built = True last (Co-authored-by: amyeroberts <[email protected]>)
* Should have been added to previous commit (Co-authored-by: amyeroberts <[email protected]>)
* Apply suggestions from code review (Co-authored-by: amyeroberts <[email protected]>)
* Change default_feature_extractor to default_image_processor (Co-authored-by: amyeroberts <[email protected]>)
* Import Keras from modeling_tf_utils
* Remove relative import
* Run ruff --fix
* Move import keras to tf_available
* Add copied from comment to test_forward_signature
* Reduce batch size and num_labels
* Extract loss logic to hf_compute_loss
* Run ruff format

Co-authored-by: Matt <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Matt <[email protected]>
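As a side note on one item in the commit list, "Convert flatten(2).mean(-1)": PyTorch's `x.flatten(2).mean(-1)` is a global pooling over the spatial dimensions of an NCHW tensor, which maps to a reshape plus a mean on the last axis in TF. A sketch with NumPy standing in for the TF ops (`tf.reshape` / `tf.reduce_mean`):

```python
import numpy as np

# PyTorch: x.flatten(2).mean(-1) pools each channel over H*W.
# TF/NumPy equivalent: collapse the spatial dims, then mean over the last axis.
x = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)  # NCHW
pooled = x.reshape(x.shape[0], x.shape[1], -1).mean(axis=-1)        # shape (2, 3)
assert pooled.shape == (2, 3)
assert np.isclose(pooled[0, 0], x[0, 0].mean())
```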
What does this PR do?
Adds the TensorFlow version of SwiftFormer.
Fixes #22771
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@amyeroberts @D-Roberts