GazzolaLab/awesome-embodied-vision

 
 

Awesome Embodied Vision

A curated list of embodied vision resources.

Inspired by the awesome list thing and awesome-vln.

By Changan Chen ([email protected]), Department of Computer Science at the University of Texas at Austin, with help from Tushar Nagarajan, Santhosh Kumar Ramakrishnan, and Yinfeng Yu. If you see papers missing from the list, please send me an email or open a pull request (see the format below).

Contributing

When sending PRs, please place the new paper at the correct chronological position, using the following format:

* **Paper Title** <br>
*Author(s)* <br>
Conference, Year. [[Paper]](link) [[Code]](link) [[Website]](link)
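
The entry format above can also be generated programmatically. The following is a minimal sketch, not part of this repository: the helper names `format_entry` and `chronological_index` are hypothetical, the URLs are placeholders, and it assumes each section is ordered by ascending year, with a new paper placed after existing papers from the same year.

```python
import bisect


def format_entry(title, authors, venue, year, paper, code=None, website=None):
    """Render one paper in the three-line list format shown above."""
    links = [f"[[Paper]]({paper})"]
    if code:
        links.append(f"[[Code]]({code})")
    if website:
        links.append(f"[[Website]]({website})")
    return "\n".join([
        f"* **{title}** <br>",
        f"*{', '.join(authors)}* <br>",
        f"{venue}, {year}. {' '.join(links)}",
    ])


def chronological_index(existing_years, new_year):
    """Index at which to insert so that years stay in ascending order.

    Assumes existing_years is already sorted ascending (an assumption
    about the list layout); a paper from a year that is already present
    is placed after the existing entries for that year.
    """
    return bisect.bisect_right(existing_years, new_year)


# Placeholder values for illustration only.
entry = format_entry(
    "Example Paper Title",
    ["First Author", "Second Author"],
    "CVPR",
    2021,
    paper="https://example.com/paper",
    code="https://example.com/code",
)
print(entry)
print(chronological_index([2017, 2019, 2019, 2021], 2019))
```

Optional links are simply omitted when not supplied, which matches entries in the list that carry only a [Paper] link.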

Papers

PointGoal Navigation

  • Cognitive Mapping and Planning for Visual Navigation
    Saurabh Gupta, Varun Tolani, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik
    CVPR, 2017. [Paper]

  • Habitat: A Platform for Embodied AI Research
    Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra
    ICCV, 2019. [Paper] [Code] [Website]

  • SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
    Daniel Gordon, Abhishek Kadian, Devi Parikh, Judy Hoffman, Dhruv Batra
    ICCV, 2019. [Paper] [Code]

  • A Behavioral Approach to Visual Navigation with Graph Localization Networks
    Kevin Chen, Juan Pablo de Vicente, Gabriel Sepulveda, Fei Xia, Alvaro Soto, Marynel Vazquez, Silvio Savarese
    RSS, 2019. [Paper] [Code] [Website]

  • DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
    Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra
    ICLR, 2020. [Paper] [Code] [Website]

  • Learning to Explore using Active Neural SLAM
    Devendra Singh Chaplot, Dhiraj Gandhi, Saurabh Gupta, Abhinav Gupta, Ruslan Salakhutdinov
    ICLR, 2020. [Paper] [Code] [Website]

  • Auxiliary Tasks Speed Up Learning PointGoal Navigation
    Joel Ye, Dhruv Batra, Erik Wijmans, Abhishek Das
    CoRL, 2020. [Paper] [Code]

  • Occupancy Anticipation for Efficient Exploration and Navigation
    Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman
    ECCV, 2020. [Paper] [Code] [Website]

  • Embodied Visual Navigation with Automatic Curriculum Learning in Real Environments
    Steven D. Morad, Roberto Mecca, Rudra P.K. Poudel, Stephan Liwicki, Roberto Cipolla
    ICRA, 2021. [Paper]

  • Differentiable SLAM-Net: Learning Particle SLAM for Visual Navigation
    Peter Karkus, Shaojun Cai, David Hsu
    CVPR, 2021. [Paper]

  • The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation
    Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, Alexander Schwing
    ICCV, 2021. [Paper] [Code] [Website]

  • RobustNav: Towards Benchmarking Robustness in Embodied Navigation
    Prithvijit Chattopadhyay, Judy Hoffman, Roozbeh Mottaghi, Aniruddha Kembhavi
    ICCV, 2021. [Paper] [Code] [Website]

Audio-Visual Navigation

  • Audio-Visual Embodied Navigation
    Changan Chen*, Unnat Jain*, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
    ECCV, 2020. [Paper] [Website]

  • Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
    Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum
    ICRA, 2020. [Paper] [Website]

  • Learning to Set Waypoints for Audio-Visual Navigation
    Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh K. Ramakrishnan, Kristen Grauman
    ICLR, 2021. [Paper] [Website]

  • Semantic Audio-Visual Navigation
    Changan Chen, Ziad Al-Halah, Kristen Grauman
    CVPR, 2021. [Paper] [Code] [Website]

  • Move2Hear: Active Audio-Visual Source Separation
    Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
    ICCV, 2021. [Paper] [Website]

  • Active Audio-Visual Separation of Dynamic Sound Sources
    Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
    ECCV, 2022. [Paper] [Website]

  • Sound Adversarial Audio-Visual Navigation
    Yinfeng Yu, Wenbing Huang, Fuchun Sun, Changan Chen, Yikai Wang, Xiaohong Liu
    ICLR, 2022. [Paper] [Code] [Website]

  • SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
    Changan Chen*, Carl Schissler*, Sanchit Garg*, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman
    arXiv, 2022. [Paper] [Code] [Website]

  • Pay Self-Attention to Audio-Visual Navigation
    Yinfeng Yu, Lele Cao, Fuchun Sun, Xiaohong Liu, Liejun Wang
    BMVC, 2022. [Paper] [Website]

ObjectGoal Navigation

  • Cognitive Mapping and Planning for Visual Navigation
    Saurabh Gupta, Varun Tolani, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik
    CVPR, 2017. [Paper]

  • Visual Semantic Navigation using Scene Priors
    Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi
    ICLR, 2019. [Paper]

  • Visual Representations for Semantic Target Driven Navigation
    Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, Ayzaan Wahid, James Davidson
    ICRA, 2019. [Paper] [Code]

  • Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning
    Mitchell Wortsman, Kiana Ehsani, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
    CVPR, 2019. [Paper] [Code] [Website]

  • Bayesian Relational Memory for Semantic Visual Navigation
    Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian
    ICCV, 2019. [Paper] [Code]

  • Situational Fusion of Visual Representation for Visual Navigation
    William B. Shen, Danfei Xu, Yuke Zhu, Leonidas J. Guibas, Li Fei-Fei, Silvio Savarese
    ICCV, 2019. [Paper]

  • Object Goal Navigation using Goal-Oriented Semantic Exploration
    Devendra Singh Chaplot, Dhiraj Gandhi, Abhinav Gupta*, Ruslan Salakhutdinov*
    NeurIPS, 2020. [Paper] [Website]

  • Learning Object Relation Graph and Tentative Policy for Visual Navigation
    Heming Du, Xin Yu, Liang Zheng
    ECCV, 2020. [Paper]

  • Semantic Visual Navigation by Watching YouTube Videos
    Matthew Chang, Arjun Gupta, Saurabh Gupta
    arXiv, 2020. [Paper] [Website]

  • ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects
    Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans
    arXiv, 2020. [Paper]

  • MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation
    Saim Wani*, Shivansh Patel*, Unnat Jain*, Angel X. Chang, Manolis Savva
    NeurIPS, 2020. [Paper] [Code] [Website]

  • Learning hierarchical relationships for object-goal navigation
    Yiding Qiu, Anwesan Pal, Henrik I. Christensen
    CoRL, 2020. [Paper]

  • VTNet: Visual Transformer Network for Object Goal Navigation
    Heming Du, Xin Yu, Liang Zheng
    ICLR, 2021. [Paper]

  • Visual Navigation With Spatial Attention
    Bar Mayo, Tamir Hazan, Ayellet Tal
    CVPR, 2021. [Paper]

  • Auxiliary Tasks and Exploration Enable ObjectGoal Navigation
    Joel Ye, Dhruv Batra, Abhishek Das, Erik Wijmans
    ICCV, 2021. [Paper] [Code] [Website]

  • Hierarchical Object-to-Zone Graph for Object Navigation
    Sixian Zhang, Xinhang Song, Yubing Bai, Weijie Li, Yakui Chu, Shuqiang Jiang
    ICCV, 2021. [Paper] [Code] [Video]

  • THDA: Treasure Hunt Data Augmentation for Semantic Navigation
    Oleksandr Maksymets, Vincent Cartillier, Aaron Gokaslan, Erik Wijmans, Wojciech Galuba, Stefan Lee, Dhruv Batra
    ICCV, 2021. [Paper]

  • 🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
    Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi
    arXiv, 2022. [Paper] [Website]

ImageGoal Navigation

  • Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning
    Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi
    ICRA, 2017. [Paper] [Website]

  • Semi-Parametric Topological Memory for Navigation
    Nikolay Savinov*, Alexey Dosovitskiy*, Vladlen Koltun
    ICLR, 2018. [Paper] [Code] [Website]

  • Neural Topological SLAM for Visual Navigation
    Devendra Singh Chaplot, Ruslan Salakhutdinov, Abhinav Gupta, Saurabh Gupta
    CVPR, 2020. [Paper] [Website]

  • Visual Graph Memory with Unsupervised Representation for Visual Navigation
    Obin Kwon, Nuri Kim, Yunho Choi, Hwiyeon Yoo, Jeongho Park, Songhwai Oh
    ICCV, 2021. [Paper] [Code] [Website]

  • No RL, No Simulation: Learning to Navigate without Navigating
    Meera Hahn, Devendra Chaplot, Shubham Tulsiani, Mustafa Mukadam, James M. Rehg, Abhinav Gupta
    NeurIPS, 2021. [Paper]

  • Topological Semantic Graph Memory for Image-Goal Navigation
    Nuri Kim, Obin Kwon, Hwiyeon Yoo, Yunho Choi, Jeongho Park, Songhwai Oh
    CoRL, 2022. [Paper] [Code] [Website]

Vision-Language Navigation

  • Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
    Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel
    CVPR, 2018. [Paper] [Code] [Website]

  • Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation
    Xin Wang, Wenhan Xiong, Hongmin Wang, William Yang Wang
    ECCV, 2018. [Paper]

  • Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction
    Dipendra Misra, Andrew Bennett, Valts Blukis, Eyvind Niklasson, Max Shatkhin, Yoav Artzi
    EMNLP, 2018. [Paper]

  • Speaker-Follower Models for Vision-and-Language Navigation
    Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell
    NeurIPS, 2018. [Paper] [Code] [Website]

  • Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
    Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang
    CVPR, 2019. [Paper]

  • Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
    Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong
    ICLR, 2019. [Paper] [Code] [Website]

  • The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation
    Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira
    CVPR, 2019. [Paper] [Code] [Website]

  • TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
    Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi
    CVPR, 2019. [Paper] [Code]

  • Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation
    Liyiming Ke, Xiujun Li, Yonatan Bisk, Ari Holtzman, Zhe Gan, Jingjing Liu, Jianfeng Gao, Yejin Choi, Siddhartha Srinivasa
    CVPR, 2019. [Paper] [Code] [Video]

  • Vision-based Navigation with Language-based Assistance via Imitation Learning with Indirect Intervention
    Khanh Nguyen, Debadeepta Dey, Chris Brockett, Bill Dolan
    CVPR, 2019. [Paper] [Code] [Video]

  • Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning
    Khanh Nguyen, Hal Daumé III
    EMNLP, 2019. [Paper] [Code] [Video]

  • Chasing Ghosts: Instruction Following as Bayesian State Tracking
    Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee
    NeurIPS, 2019. [Paper] [Code] [Video]

  • Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
    Federico Landi, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara
    BMVC, 2019. [Paper] [Code]

  • Transferable Representation Learning in Vision-and-Language Navigation
    Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie
    ICCV, 2019. [Paper]

  • Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation
    Juncheng Li, Xin Wang, Siliang Tang, Haizhou Shi, Fei Wu, Yueting Zhuang, William Yang Wang
    CVPR, 2020. [Paper]

  • Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
    Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang
    CVPR, 2020. [Paper]

  • Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation
    Federico Landi, Lorenzo Baraldi, Marcella Cornia, Massimiliano Corsini, Rita Cucchiara
    arXiv, 2019. [Paper] [Code]

  • Just Ask: An Interactive Learning Framework for Vision and Language Navigation
    Ta-Chung Chi, Mihail Eric, Seokhwan Kim, Minmin Shen, Dilek Hakkani-tur
    AAAI, 2020. [Paper]

  • Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
    Weituo Hao, Chunyuan Li, Xiujun Li, Lawrence Carin, Jianfeng Gao
    CVPR, 2020. [Paper] [Code]

  • Environment-agnostic Multitask Learning for Natural Language Grounded Navigation
    Xin Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi
    ECCV, 2020. [Paper]

  • Multi-View Learning for Vision-and-Language Navigation
    Qiaolin Xia, Xiujun Li, Chunyuan Li, Yonatan Bisk, Zhifang Sui, Jianfeng Gao, Yejin Choi, Noah A. Smith
    arXiv, 2020. [Paper]

  • Vision-Dialog Navigation by Exploring Cross-modal Memory
    Yi Zhu, Fengda Zhu, Zhaohuan Zhan, Bingqian Lin, Jianbin Jiao, Xiaojun Chang, Xiaodan Liang
    CVPR, 2020. [Paper] [Code]

  • Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation
    Felix Yu, Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky
    arXiv, 2020. [Paper]

  • Sub-Instruction Aware Vision-and-Language Navigation
    Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould
    arXiv, 2020. [Paper]

  • Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
    Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee
    ECCV, 2020. [Paper] [Code] [Website]

  • Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling
    Tsu-Jui Fu, Xin Eric Wang, Matthew Peterson, Scott Grafton, Miguel Eckstein, William Yang Wang
    ECCV, 2020. [Paper] [Code] [Website]

  • Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
    Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra
    ECCV, 2020. [Paper]

  • Soft Expert Reward Learning for Vision-and-Language Navigation
    Hu Wang, Qi Wu, Chunhua Shen
    ECCV, 2020. [Paper]

  • Active Visual Information Gathering for Vision-Language Navigation
    Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang, Jianbing Shen
    ECCV, 2020. [Paper] [Code]

  • Language and Visual Entity Relationship Graph for Agent Navigation
    Yicong Hong, Cristian Rodriguez, Yuankai Qi, Qi Wu, Stephen Gould
    NeurIPS, 2020. [Paper] [Code]

  • Counterfactual Vision-and-Language Navigation: Unravelling the Unseen
    Amin Parvaneh, Ehsan Abbasnejad, Damien Teney, Javen Qinfeng Shi, Anton van den Hengel
    NeurIPS, 2020. [Paper]

  • Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation
    Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky
    NeurIPS, 2020. [Paper]

  • Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning
    Weixia Zhang, Chao Ma, Qi Wu, Xiaokang Yang
    TCSVT, 2020. [Paper]

  • Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule
    Shuhei Kurita, Kyunghyun Cho
    ICLR, 2021. [Paper]

  • Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation
    Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira
    ICRA, 2021. [Paper] [Code] [Website] [Video]

  • VLN BERT: A Recurrent Vision-and-Language BERT for Navigation
    Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
    CVPR, 2021. [Paper] [Code]

  • Structured Scene Memory for Vision-Language Navigation
    Hanqing Wang, Wenguan Wang, Wei Liang, Caiming Xiong, Jianbing Shen
    CVPR, 2021. [Paper] [Code]

  • Topological Planning With Transformers for Vision-and-Language Navigation
    Kevin Chen, Junshen K. Chen, Jo Chuang, Marynel Vázquez, Silvio Savarese
    CVPR, 2021. [Paper]

  • SOON: Scenario Oriented Object Navigation With Graph-Based Exploration
    Fengda Zhu, Xiwen Liang, Yi Zhu, Qizhi Yu, Xiaojun Chang, Xiaodan Liang
    CVPR, 2021. [Paper]

  • Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression
    Chen Gao, Jinyu Chen, Si Liu, Luting Wang, Qiong Zhang, Qi Wu
    CVPR, 2021. [Paper]

  • Scene-Intuitive Agent for Remote Embodied Visual Grounding
    Xiangru Lin, Guanbin Li, Yizhou Yu
    CVPR, 2021. [Paper]

  • Neighbor-view Enhanced Model for Vision and Language Navigation
    Dong An, Yuankai Qi, Yan Huang, Qi Wu, Liang Wang, Tieniu Tan
    ACM MM, 2021. [Paper] [Code]

  • The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
    Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton van den Hengel, Qi Wu
    ICCV, 2021. [Paper] [Code]

  • Pathdreamer: A World Model for Indoor Navigation
    Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
    ICCV, 2021. [Paper] [Code] [Website]

  • Episodic Transformer for Vision-and-Language Navigation
    Alexander Pashevich, Cordelia Schmid, Chen Sun
    ICCV, 2021. [Paper] [Code] [Website]

  • Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation
    Yi Zhu*, Yue Weng*, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Yutong Lu, Jianbin Jiao
    ICCV, 2021. [Paper]

  • Vision-Language Navigation with Random Environmental Mixup
    Chong Liu*, Fengda Zhu*, Xiaojun Chang, Xiaodan Liang, Zongyuan Ge, Yi-Dong Shen
    ICCV, 2021. [Paper] [Code]

  • Waypoint Models for Instruction-guided Navigation in Continuous Environments
    Jacob Krantz, Aaron Gokaslan, Dhruv Batra, Stefan Lee, Oleksandr Maksymets
    ICCV, 2021. [Paper] [Code] [Website]

  • Airbert: In-domain Pretraining for Vision-and-Language Navigation
    Pierre-Louis Guhur, Makarand Tapaswi, Shizhe Chen, Ivan Laptev, Cordelia Schmid
    ICCV, 2021. [Paper] [Code] [Website]

  • Curriculum Learning for Vision-and-Language Navigation
    Jiwen Zhang, Zhongyu Wei, Jianqing Fan, Jiajie Peng
    NeurIPS, 2021. [Paper] [Code]

  • SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
    Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra
    NeurIPS, 2021. [Paper]

  • History Aware Multimodal Transformer for Vision-and-Language Navigation
    Shizhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev
    NeurIPS, 2021. [Paper] [Website] [Code]

  • SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments
    Muhammad Zubair Irshad, Niluthpol Chowdhury Mithun, Zachary Seymour, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar
    ICPR, 2022. [Paper] [Website] [Video]

Multiagent Navigation

  • Two Body Problem: Collaborative Visual Task Completion
    Unnat Jain*, Luca Weihs*, Eric Kolve, Mohammad Rastegari, Svetlana Lazebnik, Ali Farhadi, Alexander Schwing, Aniruddha Kembhavi
    CVPR, 2019. [Paper] [Website]

  • A Cordial Sync: Going Beyond Marginal Policies For Multi-Agent Embodied Tasks
    Unnat Jain*, Luca Weihs*, Eric Kolve, Ali Farhadi, Svetlana Lazebnik, Aniruddha Kembhavi, Alexander Schwing
    ECCV, 2020. [Paper] [Code] [Website]

  • Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents
    Shivansh Patel*, Saim Wani*, Unnat Jain*, Alexander G. Schwing, Svetlana Lazebnik, Manolis Savva, Angel X. Chang
    ICCV, 2021. [Paper]

  • GRIDTOPIX: Training Embodied Agents with Minimal Supervision
    Unnat Jain, Iou-Jen Liu, Svetlana Lazebnik, Aniruddha Kembhavi, Luca Weihs, Alexander Schwing
    ICCV, 2021. [Paper] [Website]

  • Sound Adversarial Audio-Visual Navigation
    Yinfeng Yu, Wenbing Huang, Fuchun Sun, Changan Chen, Yikai Wang, Xiaohong Liu
    ICLR, 2022. [Paper] [Code] [Website]

Active Visual Tracking

  • End-to-end Active Object Tracking via Reinforcement Learning
    Wenhan Luo*, Peng Sun*, Fangwei Zhong, Wei Liu, Tong Zhang, Yizhou Wang
    ICML, 2018. [Paper] [Website]

  • End-to-end Active Object Tracking and Its Real-world Deployment via Reinforcement Learning
    Wenhan Luo*, Peng Sun*, Fangwei Zhong*, Wei Liu, Tong Zhang, Yizhou Wang
    IEEE TPAMI, 2019. [Paper] [Website]

  • AD-VAT: An Asymmetric Dueling mechanism for learning Visual Active Tracking
    Fangwei Zhong, Wenhan Luo, Peng Sun, Tingyun Yan, Yizhou Wang
    ICLR, 2019. [Paper] [Code]

  • AD-VAT+: An Asymmetric Dueling Mechanism for Learning and Understanding Visual Active Tracking
    Fangwei Zhong, Wenhan Luo, Peng Sun, Tingyun Yan, Yizhou Wang
    IEEE TPAMI, 2019. [Paper] [Code]

  • Pose-Assisted Multi-Camera Collaboration for Active Object Tracking
    Jing Li*, Jing Xu*, Fangwei Zhong*, Xiangyu Kong, Yu Qiao, Yizhou Wang
    AAAI, 2020. [Paper] [Code]

  • Towards Distraction-Robust Active Visual Tracking
    Fangwei Zhong, Wenhan Luo, Peng Sun, Tingyun Yan, Yizhou Wang
    ICML, 2021. [Paper] [Code] [Website]

Visual Exploration

  • Curiosity-driven Exploration by Self-supervised Prediction
    Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell
    ICML, 2017. [Paper] [Code] [Website]

  • Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks
    Dinesh Jayaraman, Kristen Grauman
    CVPR, 2018. [Paper]

  • Sidekick Policy Learning for Active Visual Exploration
    Santhosh K. Ramakrishnan, Kristen Grauman
    ECCV, 2018. [Paper] [Code] [Website]

  • Learning Exploration Policies for Navigation
    Tao Chen, Saurabh Gupta, Abhinav Gupta
    ICLR, 2019. [Paper] [Code] [Website]

  • Episodic Curiosity through Reachability
    Nikolay Savinov, Anton Raichuk, Damien Vincent, Raphael Marinier, Marc Pollefeys, Timothy Lillicrap, Sylvain Gelly
    ICLR, 2019. [Paper] [Code] [Website]

  • Emergence of Exploratory Look-Around Behaviors through Active Observation Completion
    Santhosh K. Ramakrishnan*, Dinesh Jayaraman*, Kristen Grauman
    Science Robotics, 2019. [Paper] [Code] [Website]

  • Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks
    Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese
    CVPR, 2019. [Paper] [Website]

  • Explore and Explain: Self-supervised Navigation and Recounting
    Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara
    ICPR, 2020. [Paper]

  • Learning to Explore using Active Neural SLAM
    Devendra Singh Chaplot, Dhiraj Gandhi, Saurabh Gupta, Abhinav Gupta, Ruslan Salakhutdinov
    ICLR, 2020. [Paper] [Code] [Website]

  • Semantic Curiosity for Active Visual Learning
    Devendra Singh Chaplot, Helen Jiang, Saurabh Gupta, Abhinav Gupta
    ECCV, 2020. [Paper] [Website]

  • See, Hear, Explore: Curiosity via Audio-Visual Association
    Victoria Dean, Shubham Tulsiani, Abhinav Gupta
    NeurIPS, 2020. [Paper] [Website]

  • Occupancy Anticipation for Efficient Exploration and Navigation
    Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman
    ECCV, 2020. [Paper] [Code] [Website]

  • Focus on Impact: Indoor Exploration With Intrinsic Motivation
    Roberto Bigazzi, Federico Landi, Silvia Cascianelli, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
    IEEE RA-L + ICRA, 2022. [Paper] [Code]

  • Symmetry-aware Neural Architecture for Embodied Visual Exploration
    Shuang Liu, Takayuki Okatani
    CVPR, 2022. [Paper] [Code] [Website]

Embodied Question Answering

  • Embodied Question Answering
    Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra
    CVPR, 2018. [Paper] [Code] [Website]

  • Multi-Target Embodied Question Answering
    Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra
    CVPR, 2019. [Paper]

  • Embodied Question Answering in Photorealistic Environments with Point Cloud Perception
    Erik Wijmans*, Samyak Datta*, Oleksandr Maksymets*, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra
    CVPR, 2019. [Paper]

  • Episodic Memory Question Answering
    Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh
    CVPR, 2022. [Paper] [Website]

Visual Interactions

  • Visual Semantic Planning using Deep Successor Representations
    Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi
    ICCV, 2017. [Paper]

  • IQA: Visual Question Answering in Interactive Environments
    Daniel Gordon, Aniruddha Kembhavi, Mohammad Rastegari, Joseph Redmon, Dieter Fox, Ali Farhadi
    CVPR, 2018. [Paper] [Code] [Website]

  • ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
    Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox
    CVPR, 2020. [Paper] [Code] [Website]

  • Learning About Objects by Learning to Interact with Them
    Martin Lohmann, Jordi Salvador, Aniruddha Kembhavi, Roozbeh Mottaghi
    NeurIPS, 2020. [Paper]

  • Learning Affordance Landscapes for Interaction Exploration in 3D Environments
    Tushar Nagarajan, Kristen Grauman
    NeurIPS, 2020. [Paper]

  • ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
    Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht
    ICLR, 2021. [Paper] [Code] [Website]

  • Learning Generalizable Visual Representations via Interactive Gameplay
    Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi
    ICLR, 2021. [Paper]

  • Pushing It Out of the Way: Interactive Visual Navigation
    Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi
    CVPR, 2021. [Paper] [Code] [Website]

  • ManipulaTHOR: A Framework for Visual Object Manipulation
    Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi
    CVPR, 2021. [Paper] [Code] [Website]

  • 🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
    Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi
    arXiv, 2022. [Paper] [Website]

Rearrangement

  • Rearrangement: A Challenge for Embodied AI
    Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, Manolis Savva, Hao Su
    arXiv, 2020. [Paper]

  • Visual Room Rearrangement
    Luca Weihs, Matt Deitke, Aniruddha Kembhavi, Roozbeh Mottaghi
    CVPR, 2021. [Paper] [Code] [Website]

  • Habitat 2.0: Training Home Assistants to Rearrange their Habitat
    Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra
    NeurIPS, 2021. [Paper] [Code]

  • 🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
    Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi
    arXiv, 2022. [Paper] [Website]

Sim-to-real Transfer

  • Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
    Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, Pieter Abbeel
    IROS, 2017. [Paper]

  • Sim-to-Real Transfer for Vision-and-Language Navigation
    Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee
    CoRL, 2020. [Paper]

  • RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real
    Kanishka Rao, Chris Harris, Alex Irpan, Sergey Levine, Julian Ibarz, Mohi Khansari
    CVPR, 2020. [Paper]

  • Bi-directional Domain Adaptation for Sim2Real Transfer of Embodied Navigation Agents
    Joanne Truong, Sonia Chernova, Dhruv Batra
    RA-L, 2021. [Paper]

Datasets

  • A Dataset for Developing and Benchmarking Active Vision
    Phil Ammirato, Patrick Poirson, Eunbyung Park, Jana Kosecka, Alexander C. Berg
    ICRA, 2017. [Paper] [Code] [Website]

  • AI2-THOR: An Interactive 3D Environment for Visual AI
    Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Daniel Gordon, Yuke Zhu, Abhinav Gupta, Ali Farhadi
    arXiv, 2017. [Paper] [Code] [Website]

  • Matterport3D: Learning from RGB-D Data in Indoor Environments
    Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang
    3DV, 2017. [Paper] [Code] [Website]

  • Gibson Env: Real-World Perception for Embodied Agents
    Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese
    CVPR, 2018. [Paper] [Code] [Website]

  • The Replica Dataset: A Digital Replica of Indoor Spaces
    Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, Hauke M. Strasdat, Renzo De Nardi, Michael Goesele, Steven Lovegrove, Richard Newcombe
    arXiv, 2019. [Paper] [Code]

  • Actionet: An Interactive End-to-End Platform for Task-Based Data Collection and Augmentation in 3D Environments
    Jiafei Duan, Samson Yu, Hui Li Tan, Cheston Tan
    ICIP, 2020. [Paper] [Code]

  • Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI
    Santhosh K. Ramakrishnan, Aaron Gokaslan, Erik Wijmans, Oleksandr Maksymets, Alex Clegg, John Turner, Eric Undersander, Wojciech Galuba, Andrew Westbury, Angel X. Chang, Manolis Savva, Yili Zhao, Dhruv Batra
    NeurIPS, 2021. [Paper] [Website]

  • 🏘️ ProcTHOR-10K: 10K Interactive Household Environments for Embodied AI
    Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi
    arXiv, 2022. [Paper] [Website]

Simulators

  • AI2-THOR: An Interactive 3D Environment for Visual AI
    Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Daniel Gordon, Yuke Zhu, Abhinav Gupta, Ali Farhadi
    arXiv, 2017. [Paper] [Code] [Website]

  • UnrealCV: Virtual Worlds for Computer Vision
    Weichao Qiu, Fangwei Zhong, Yi Zhang, Siyuan Qiao, Zihao Xiao, Tae Soo Kim, Yizhou Wang, Alan Yuille
    ACM MM Open Source Software Competition, 2017. [Paper] [Code] [Website]

  • Building Generalizable Agents with a Realistic and Rich 3D Environment (House3D)
    Yi Wu, Yuxin Wu, Georgia Gkioxari, Yuandong Tian
    arXiv, 2018. [Paper] [Code]

  • CHALET: Cornell House Agent Learning Environment
    Claudia Yan, Dipendra Misra, Andrew Bennett, Aaron Walsman, Yonatan Bisk, Yoav Artzi
    arXiv, 2018. [Paper] [Code]

  • RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
    Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi
    CVPR, 2020. [Paper] [Website]

  • Gibson Env: Real-World Perception for Embodied Agents
    Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese
    CVPR, 2018. [Paper] [Code] [Website]

  • Habitat: A Platform for Embodied AI Research
    Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra
    ICCV, 2019. [Paper] [Code] [Website]

  • VirtualHome: Simulating Household Activities via Programs
    Xavier Puig*, Kevin Ra*, Marko Boben*, Jiaman Li, Tingwu Wang, Sanja Fidler, Antonio Torralba
    CVPR, 2018. [Paper] [Code] [Website]

  • ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation
    Chuang Gan, Jeremy Schwartz, Seth Alter, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Damian Mrowca, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund, David Cox, James J. DiCarlo, Josh McDermott, Joshua B. Tenenbaum, Daniel L.K. Yamins
    arXiv, 2020. [Paper] [Code] [Website]

  • ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
    Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht
    ICLR, 2021. [Paper] [Code] [Website]

  • ManipulaTHOR: A Framework for Visual Object Manipulation
    Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi
    CVPR, 2021. [Paper] [Code] [Website]

  • 🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
    Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi
    arXiv, 2022. [Paper] [Website]

MISC

  • Visual Learning and Embodied Agents in Simulation Environments Workshop
    ECCV, 2018. [Website]

  • Embodied-AI Workshop
    CVPR, 2020/2021. [Website]

  • Gibson Sim2Real Challenge
    CVPR, 2020. [Website]

  • Embodied Vision, Actions & Language Workshop
    ECCV, 2020. [Website]

  • Closing the Reality Gap in Sim2Real Transfer for Robotics
    RSS, 2020. [Website]

  • On Evaluation of Embodied Navigation Agents
    Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir
    arXiv, 2018. [Paper]

  • PyRobot: An Open-source Robotics Framework for Research and Benchmarking
    Adithya Murali*, Tao Chen*, Kalyan Vasudev Alwala*, Dhiraj Gandhi*, Lerrel Pinto, Saurabh Gupta, Abhinav Gupta
    arXiv, 2019. [Paper] [Code] [Website]

  • A Survey of Embodied AI: From Simulators to Research Tasks
    Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, Cheston Tan
    arXiv, 2021. [Paper]

  • AllenAct: A Framework for Embodied AI Research
    Luca Weihs, Jordi Salvador, Klemen Kotar, Unnat Jain, Kuo-Hao Zeng, Roozbeh Mottaghi, Aniruddha Kembhavi
    arXiv, 2020. [Paper] [Website]

  • CSAIL Embodied Intelligence Seminar
    [Website]
