Add ViG models [NeurIPS 2022] #1578
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@iamhankai FYI, you can use
Hmm, it seems the tracing issue is harder to solve; just preventing trace won't bypass the bool issue without some restructuring. I'd also need to tweak some other interface issues w.r.t. other models. Trying the model out, the 'base' variant for example seems roughly on par with a Swin (v1) base for accuracy and params/FLOPs, but it runs at less than half the speed. Any way to improve the runtime performance? Have there been any weights or attempts to scale the training to larger datasets? Any interesting performance differences there vs other ViT or ViT-related hybrid archs?
We have pretrained ViG on ImageNet-22K. It performs slightly better than Swin Transformer.
As for the runtime, accelerating GNNs is an open problem.
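For context on where the extra time tends to go: every Grapher layer rebuilds a dense kNN graph over all nodes, which is quadratic in the token count. A tiny illustrative timing sketch below uses only standard PyTorch ops; the shapes and numbers are assumptions for illustration, not measurements from this thread.

```python
import time
import torch

# Illustrative cost of the per-layer kNN graph construction in a ViG-style model:
# dense pairwise distances over N tokens (O(N^2)) followed by a top-k per node.
x = torch.randn(8, 196, 384)  # batch of 14x14 patch tokens, dim 384 (assumed shapes)

start = time.time()
for _ in range(10):
    dist = torch.cdist(x, x)                      # (B, N, N) pairwise distances
    idx = dist.topk(k=9, largest=False).indices   # 9 nearest neighbours per node
elapsed_ms = (time.time() - start) / 10 * 1000
print(f"kNN graph construction: {elapsed_ms:.2f} ms per layer (CPU, illustrative)")
```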
@rwightman Hi, we released the weights from scaling the training to the larger ImageNet-22K dataset: https://github.com/huawei-noah/Efficient-AI-Backbones/releases/download/pyramid-vig/pvig_m_im21k_90e.pth It performs slightly better than the IM-22K pretrained Swin Transformer.
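For anyone wanting to try these weights with the models from this PR, a minimal loading sketch follows. The model name `pvig_m_224_gelu` is my assumption about what the PR registers, and the released checkpoint's key names may not match the timm port one-to-one, hence `strict=False`.

```python
import timm
import torch

# Hypothetical model name -- substitute whichever pyramid ViG variant this PR registers.
model = timm.create_model('pvig_m_224_gelu', pretrained=False)

# Download the IM-21K checkpoint released above and load it loosely; key names
# (and the classifier head shape) from the original repo may not match the timm port.
url = ('https://github.com/huawei-noah/Efficient-AI-Backbones/releases/download/'
       'pyramid-vig/pvig_m_im21k_90e.pth')
state_dict = torch.hub.load_state_dict_from_url(url, map_location='cpu')
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f'missing keys: {len(missing)}, unexpected keys: {len(unexpected)}')
```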
tests/test_models.py (Outdated)
@@ -295,12 +295,6 @@ def test_model_features_pretrained(model_name, batch_size):
    """Create that pretrained weights load when features_only==True."""
    create_model(model_name, pretrained=True, features_only=True)

EXCLUDE_JIT_FILTERS = [
It seems these lines cannot be removed
I managed to fix the other JIT exceptions, so I'd rather not add more; I feel it's likely it can be supported with the appropriate type declarations, etc.
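For reference, the kind of type declaration being alluded to might look like the sketch below. This is a generic TorchScript pattern with illustrative names, not the actual code in this PR.

```python
import torch
from torch import nn


class GrapherBlock(nn.Module):
    # Declaring non-tensor attributes with concrete types (and marking constants
    # as Final) usually lets TorchScript treat them as plain ints/bools instead of
    # Tensors.  Names and fields here are illustrative only.
    k: torch.jit.Final[int]
    use_relative_pos: torch.jit.Final[bool]

    def __init__(self, dim: int, k: int = 9, use_relative_pos: bool = False):
        super().__init__()
        self.k = k
        self.use_relative_pos = use_relative_pos
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.use_relative_pos:  # scripted as a plain bool branch, not a Tensor
            x = x + 0.0            # placeholder for a relative-position term
        return self.proj(x)


# quick check that the module scripts cleanly
scripted = torch.jit.script(GrapherBlock(dim=64))
print(scripted(torch.randn(2, 16, 64)).shape)
```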
Add ViG models from paper: Vision GNN: An Image is Worth Graph of Nodes (NeurIPS 2022), https://arxiv.org/abs/2206.00272
Network architecture plays a key role in deep learning-based computer vision systems. The widely used convolutional neural network and transformer treat the image as a grid or sequence structure, which is not flexible enough to capture irregular and complex objects. In this paper, we propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks. We first split the image into a number of patches which are viewed as nodes, and construct a graph by connecting the nearest neighbors. Based on the graph representation of images, we build our ViG model to transform and exchange information among all the nodes. ViG consists of two basic modules: a Grapher module with graph convolution for aggregating and updating graph information, and an FFN module with two linear layers for node feature transformation. Both isotropic and pyramid architectures of ViG are built with different model sizes. Extensive experiments on image recognition and object detection tasks demonstrate the superiority of our ViG architecture. We hope this pioneering study of GNNs on general visual tasks will provide useful inspiration and experience for future research.
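To make the two modules concrete, here is a heavily simplified sketch of one isotropic ViG block: a kNN graph over patch nodes, a max-relative-style graph convolution in the Grapher, and a two-layer FFN. It follows the paper's description loosely and is not the implementation added by this PR; dimensions and layer names are assumptions.

```python
import torch
from torch import nn


class ViGBlock(nn.Module):
    """Simplified ViG block: Grapher (kNN graph conv) + two-layer FFN."""

    def __init__(self, dim: int = 384, k: int = 9, ffn_ratio: int = 4):
        super().__init__()
        self.k = k
        self.fc_in = nn.Linear(dim, dim)
        self.graph_fc = nn.Linear(2 * dim, dim)  # mixes node and aggregated neighbour features
        self.fc_out = nn.Linear(dim, dim)
        self.ffn = nn.Sequential(                # FFN: two linear layers with activation
            nn.Linear(dim, dim * ffn_ratio),
            nn.GELU(),
            nn.Linear(dim * ffn_ratio, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) patch/node features
        shortcut = x
        x = self.fc_in(x)

        # Build a kNN graph over nodes and gather each node's neighbours.
        dist = torch.cdist(x, x)                            # (B, N, N)
        idx = dist.topk(self.k, largest=False).indices      # (B, N, k)
        neighbours = torch.gather(
            x.unsqueeze(1).expand(-1, x.shape[1], -1, -1),  # (B, N, N, C)
            2,
            idx.unsqueeze(-1).expand(-1, -1, -1, x.shape[-1]),
        )                                                   # (B, N, k, C)

        # Max-relative-style aggregation: max over (neighbour - node) differences.
        agg = (neighbours - x.unsqueeze(2)).max(dim=2).values  # (B, N, C)
        x = self.graph_fc(torch.cat([x, agg], dim=-1))
        x = shortcut + self.fc_out(x)

        # Node feature transformation with a residual FFN.
        return x + self.ffn(x)


# quick shape check
blk = ViGBlock()
print(blk(torch.randn(2, 196, 384)).shape)  # torch.Size([2, 196, 384])
```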