Summary of Related Research on Image-Text Matching
- [2023 CVPR]
  Learning Semantic Relationship among Instances for Image-Text Matching (HREM)
  Zheren Fu, Zhendong Mao, Yan Song, Yongdong Zhang
  [paper] [code]
- [2023 CVPR]
  Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network (CHAN)
  Zhengxin Pan, Fangyu Wu, Bailing Zhang
  [paper] [code]
- [2023 CVPR]
  BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency (BiCro)
  Shuo Yang, Zhaopan Xu, Kai Wang, Yang You, Hongxun Yao, Tongliang Liu, Min Xu
  [paper] [code]
- [2023 CVPR]
  Improving Cross-Modal Retrieval with Set of Diverse Embeddings
  Dongwon Kim, Namyup Kim, Suha Kwak
  [paper]
- [2023 SIGIR]
  Learnable Pillar-based Re-ranking for Image-Text Retrieval
  Leigang Qu, Meng Liu, Wenjie Wang, Zhedong Zheng, Liqiang Nie, Tat-Seng Chua
  [paper]
- [2023 SIGIR]
  Rethinking Benchmarks for Cross-modal Image-text Retrieval
  Weijing Chen, Linli Yao, Qin Jin
  [paper]
- [2023 WACV]
  Dissecting Deep Metric Learning Losses for Image-Text Retrieval
  Hong Xuan, Xi (Stephen) Chen
  [paper]
- [2023 WACV]
  Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval (CMSEI)
  Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Joemon M. Jose
  [paper]
- [2023 WACV]
  More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image-Text Matching
  Yuxiao Chen, Jianbo Yuan, Long Zhao, Tianlang Chen, Rui Luo, Larry Davis, Dimitris N. Metaxas
  [paper]
- [2022 ECCV]
  CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval (CODER)
  Haoran Wang, Dongliang He, Wenhao Wu, Boyang Xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang
  [paper]
- [2022 CVPR]
  Negative-Aware Attention Framework for Image-Text Matching (NAAF)
  Kun Zhang, Zhendong Mao, Quan Wang, Yongdong Zhang
  [paper] [code]
- [2022 AAAI]
  Show Your Faith: Cross-Modal Confidence-Aware Network for Image-Text Matching (CMCAN)
  Huatian Zhang, Zhendong Mao, Kun Zhang, Yongdong Zhang
  [paper] [code]
- [2022 IJCAI]
  Multi-View Visual Semantic Embedding (MV-VSE)
  Zheng Li, Caili Guo, Zerun Feng, Jenq-Neng Hwang, Xijun Xue
  [paper]
- [2022 IJCAI]
  Image-text Retrieval: A Survey on Recent Research and Development
  Min Cao, Shiping Li, Juntao Li, Liqiang Nie, Min Zhang
  [paper]
- [2022 SIGIR]
  Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval
  Jun Rao, Fei Wang, Liang Ding, Shuhan Qi, Yibing Zhan, Weifeng Liu, Dacheng Tao
  [paper] [code]
- [2021 ICCV]
  Wasserstein Coupled Graph Learning for Cross-Modal Retrieval (WCGL)
  Yun Wang, Tong Zhang, Xueya Zhang, Zhen Cui, Yuge Huang, Pengcheng Shen, Shaoxin Li, Jian Yang
  [paper]
- [2021 CVPR]
  Discrete-continuous Action Space Policy Gradient-based Attention for Image-Text Matching
  Shiyang Yan, Li Yu, Yuan Xie
  [paper] [code]
- [2021 CVPR]
  Learning the Best Pooling Strategy for Visual Semantic Embedding (GPO)
  Jiacheng Chen, Hexiang Hu, Hao Wu, Yuning Jiang, Changhu Wang
  [paper] [code]
- [2021 AAAI]
  Similarity Reasoning and Filtration for Image-Text Matching (SGRAF)
  Haiwen Diao, Ying Zhang, Lin Ma, Huchuan Lu
  [paper] [code]
- [2020 CVPR]
  Graph Structured Network for Image-Text Matching (GSMN)
  Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang
  [paper] [code]
- [2020 CVPR]
  IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval (IMRAM)
  Hui Chen, Guiguang Ding, Xudong Liu, Zijia Lin, Ji Liu, Jungong Han
  [paper] [code]
- [2020 CVPR]
  Context-Aware Attention Network for Image-Text Retrieval (CAAN)
  Qi Zhang, Zhen Lei, Zhaoxiang Zhang, Stan Z. Li
  [paper]
- [2020 CVPR]
  Multi-Modality Cross Attention Network for Image and Sentence Matching (MMCA)
  Xi Wei, Tianzhu Zhang, Yan Li, Yongdong Zhang, Feng Wu
  [paper]
- [2020 CVPR]
  Universal Weighting Metric Learning for Cross-Modal Matching
  Jiwei Wei, Xing Xu, Yang Yang, Yanli Ji, Zheng Wang, Heng Tao Shen
  [paper] [code]
- [2020 ECCV]
  Consensus-Aware Visual-Semantic Embedding for Image-Text Matching (CVSE)
  Haoran Wang, Ying Zhang, Zhong Ji, Yanwei Pang, Lin Ma
  [paper] [code]
- [2020 ECCV]
  Adaptive Offline Quintuplet Loss for Image-Text Matching (AOQ)
  Tianlang Chen, Jiajun Deng, Jiebo Luo
  [paper] [code]
- [2019 ICCV]
  Visual Semantic Reasoning for Image-Text Matching (VSRN)
  Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu
  [paper] [code]
- [2019 ICCV]
  CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval (CAMP)
  Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao
  [paper] [code]
- [2019 ICCV]
  Saliency-Guided Attention Network for Image-Sentence Matching (SAN)
  Zhong Ji, Haoran Wang, Jungong Han, Yanwei Pang
  [paper] [code]
- [2019 ICCV]
  Language-Agnostic Visual-Semantic Embeddings (LIWE)
  Jonatas Wehrmann, Maurício Armani Lopes, Douglas Souza, Rodrigo Barros
  [paper] [code]
- [2019 CVPR]
  Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (PVSE)
  Yale Song, Mohammad Soleymani
  [paper] [code]
- [2019 ACM MM]
  Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching (BFAN)
  Chunxiao Liu, Zhendong Mao, An-An Liu, Tianzhu Zhang, Bin Wang, Yongdong Zhang
  [paper] [code]
- [2019 IJCAI]
  Position Focused Attention Network for Image-Text Matching (PFAN)
  Yaxiong Wang, Hao Yang, Xueming Qian, Lin Ma, Jing Lu, Biao Li, Xin Fan
  [paper] [code]
- [2018 ECCV]
  Stacked Cross Attention for Image-Text Matching (SCAN)
  Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, Xiaodong He
  [paper] [code]
- [2018 BMVC]
  VSE++: Improving Visual-Semantic Embeddings with Hard Negatives (VSE++)
  Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, Sanja Fidler
  [paper] [code]
- [2023 TPAMI]
  Cross-Modal Retrieval with Partially Mismatched Pairs (RCL)
  Peng Hu, Zhenyu Huang, Dezhong Peng, Xu Wang, Xi Peng
  [paper] [code]
- [2023 TIP]
  Plug-and-Play Regulators for Image-Text Matching (RCAR)
  Haiwen Diao, Ying Zhang, Wei Liu, Xiang Ruan, Huchuan Lu
  [paper] [code]
- [2023 TMM]
  Integrating Language Guidance into Image-Text Matching for Correcting False Negatives (LG)
  Zheng Li, Caili Guo, Zerun Feng, Jenq-Neng Hwang, Zhongtian Du
  [paper] [code]
- [2023 TMM]
  Inter-Intra Modal Representation Augmentation with DCT-Transformer Adversarial Network for Image-Text Matching (DTAN)
  Chen Chen, Dan Wang, Bin Song, Hao Tan
  [paper]
- [2022 TIP]
  Adaptive Latent Graph Representation Learning for Image-Text Matching
  Mengxiao Tian, Xinxiao Wu, Yunde Jia
  [paper]
- [2022 TMM]
  Unified Adaptive Relevance Distinguishable Attention Network for Image-Text Matching (UARDA)
  Kun Zhang, Zhendong Mao, Anan Liu, Yongdong Zhang
  [paper]
- [2022 TCSVT]
  Hierarchical Feature Aggregation Based on Transformer for Image-Text Matching (HAT)
  Xinfeng Dong, Huaxiang Zhang, Lei Zhu, Liqiang Nie, Li Liu
  [paper]
- [2020 TOMM]
  Dual-path Convolutional Image-Text Embeddings with Instance Loss
  Zhedong Zheng, Liang Zheng, Michael Garrett, Yi Yang, Mingliang Xu, YiDong Shen
  [paper] [code]
- [2020 TNNLS]
  Cross-Modal Attention With Semantic Consistence for Image–Text Matching (CASC)
  Xing Xu, Tan Wang, Yang Yang, Lin Zuo, Fumin Shen, Heng Tao Shen
  [paper]
- [2014 TACL]
  From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
  Peter Young, Alice Lai, Micah Hodosh, Julia Hockenmaier
  [paper]
- [2014 ECCV]
  Microsoft COCO: Common Objects in Context
  Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár & C. Lawrence Zitnick
  [paper]

| Model | Reference | Image Encoder | Text Encoder | Image-to-Text R@1 | Image-to-Text R@5 | Image-to-Text R@10 | Text-to-Image R@1 | Text-to-Image R@5 | Text-to-Image R@10 | RSUM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VSE++ | 2018 BMVC | ResNet-152 | GRU | 52.9 | 80.5 | 87.2 | 39.6 | 70.1 | 79.5 | 409.8 |
| SCAN | 2018 ECCV | BUTD | Bi-GRU | 67.4 | 90.3 | 95.8 | 48.6 | 77.7 | 85.2 | 465.0 |
| VSRN | 2019 ICCV | BUTD | GRU | 71.3 | 90.6 | 96.0 | 54.7 | 81.8 | 88.2 | 482.6 |
| GSMN | 2020 CVPR | BUTD | Bi-GRU | 76.4 | 94.3 | 97.3 | 57.4 | 82.3 | 89.0 | 496.8 |
| SGRAF | 2021 AAAI | BUTD | Bi-GRU | 77.8 | 94.1 | 97.4 | 58.5 | 83.0 | 88.8 | 499.6 |
| NAAF | 2022 CVPR | BUTD | Bi-GRU | 81.9 | 96.1 | 98.3 | 61.0 | 85.3 | 90.6 | 513.2 |
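
The RSUM column is the sum of the six recall values in a row (R@1, R@5, and R@10 in both retrieval directions). As a rough illustration of how these numbers are usually obtained, here is a minimal Python sketch. The helper names `recall_at_k` and `rsum` are hypothetical, and the sketch assumes a square similarity matrix with exactly one matching caption per image, which is a simplification and not the evaluation code of any paper listed above.

```python
def recall_at_k(sim, k):
    """Percentage of queries whose true match ranks within the top k.

    Assumption: sim is a square matrix (list of rows) and sim[i][i] is the
    score of the ground-truth pair for query i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank = number of candidates scoring strictly higher than the true match.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def rsum(image_to_text, text_to_image):
    """Sum of the six recall values, rounded to one decimal place."""
    return round(sum(image_to_text) + sum(text_to_image), 1)


# Reproduce the RSUM of the VSE++ row from its six recall values.
print(rsum((52.9, 80.5, 87.2), (39.6, 70.1, 79.5)))  # 409.8

# Toy 3x3 similarity matrix: rows are image queries, columns are captions.
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
sim_t = [list(col) for col in zip(*sim)]  # transpose for text-to-image

i2t = [recall_at_k(sim, k) for k in (1, 2)]
t2i = [recall_at_k(sim_t, k) for k in (1, 2)]
print(i2t, t2i)
```

Transposing the matrix swaps the roles of query and candidate, so the same function serves both directions.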