diff --git a/README.md b/README.md
index 1ab57ae..40bedc6 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,2238 @@
-# Awesome-Reasoning-Foundation-Models
\ No newline at end of file
+# Awesome-Reasoning-Foundation-Models [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
+
+![overview](assets/0_reasoning.jpg)
+
+A curated list of awesome large AI models, or foundation models, for reasoning.
+We organize the current foundation models into three categories: *language foundation models*, *vision foundation models*, and *multimodal foundation models*.
+We further describe how these models are applied to reasoning tasks, including *commonsense*, *mathematical*, *logical*, *causal*, *visual*, *audio*, *multimodal*, and *embodied* reasoning.
+Reasoning techniques are also summarized.
+
+We welcome contributions to this repository to add more resources. Please submit a pull request if you want to contribute!
+
+
+
+
+## Table of Contents
+
+- [0 Survey](#0-survey)
+- [1 Relevant Surveys and Links](#1-relevant-surveys-and-links)
+- [2 Foundation Models](#2-foundation-models)
+  - [2.1 Language Foundation Models](#21-language-foundation-models)
+  - [2.2 Vision Foundation Models](#22-vision-foundation-models)
+  - [2.3 Multimodal Foundation Models](#23-multimodal-foundation-models)
+  - [2.4 Reasoning Applications](#24-reasoning-applications)
+- [3 Reasoning Tasks](#3-reasoning-tasks)
+  - [3.1 Commonsense Reasoning](#31-commonsense-reasoning)
+  - [3.2 Mathematical Reasoning](#32-mathematical-reasoning)
+  - [3.3 Logical Reasoning](#33-logical-reasoning)
+  - [3.4 Causal Reasoning](#34-causal-reasoning)
+  - [3.5 Visual Reasoning](#35-visual-reasoning)
+  - [3.6 Audio Reasoning](#36-audio-reasoning)
+  - [3.7 Multimodal Reasoning](#37-multimodal-reasoning)
+  - [3.8 Embodied Reasoning](#38-embodied-reasoning)
+  - [3.9 Other Tasks and Applications](#39-other-tasks-and-applications)
+- [4 Reasoning Techniques](#4-reasoning-techniques)
+  - [4.1 Pre-Training](#41-pre-training)
+  - [4.2 Fine-Tuning](#42-fine-tuning)
+  - [4.3 Alignment Training](#43-alignment-training)
+  - [4.4 Mixture of Experts (MoE)](#44-mixture-of-experts-moe)
+  - [4.5 In-Context Learning](#45-in-context-learning)
+  - [4.6 Autonomous Agent](#46-autonomous-agent)
+
+
+## 0 Survey
+
+![overview](assets/1_overview.jpg)
+
+This repository is primarily based on the following paper:
+
+>[**Reasoning with Foundation Models: Concepts, Methodologies, and Outlook**]()<br>
+> [Jiankai Sun](https://web.stanford.edu/~jksun/),
+[Chuanyang Zheng](https://chuanyang-zheng.github.io/),
+[Enze Xie](https://xieenze.github.io/),
+Zhengying Liu,
+Ruihang Chu,
+Jianing Qiu,
+Jiaqi Xu,
+Mingyu Ding,
+Hongyang Li,
+Mengzhe Geng,
+Yue Wu,
+Wenhai Wang,
+Junsong Chen,
+Xiaozhe Ren,
+Jie Fu,
+Junxian He,
+Wu Yuan,
+Qi Liu,
+Xihui Liu,
+Yu Li,
+Hao Dong,
+Yu Cheng,
+Ming Zhang,
+Pheng-Ann Heng,
+Jifeng Dai,
+Ping Luo,
+Jingdong Wang,
+Ji-Rong Wen,
+Xipeng Qiu,
+Yike Guo,
+Hui Xiong,
+Qun Liu, and
+[Zhenguo Li](https://scholar.google.com/citations?user=XboZC1AAAAAJ&hl=en)
+
+If you find this repository helpful, please consider citing:
+
+```bibtex
+@article{,
+  title={Reasoning with Foundation Models: Concepts, Methodologies, and Outlook},
+  author={},
+  journal={arXiv preprint arXiv:},
+  year={2023}
+}
+```
+
+
+## 1 Relevant Surveys and Links
+
+- The Rise and Potential of Large Language Model Based Agents: A Survey
+\-
+[[arXiv](https://arxiv.org/abs/2309.07864)]
+[[Link](https://github.com/WooooDyy/LLM-Agent-Paper-List)]
+
+- Multimodal Foundation Models: From Specialists to General-Purpose Assistants
+\-
+[[arXiv](https://arxiv.org/abs/2309.10020)]
+
+- A Survey on Multimodal Large Language Models
+\-
+[[arXiv](https://arxiv.org/abs/2306.13549)]
+[[Link](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models)]
+
+- Interactive Natural Language Processing
+\-
+[[arXiv](https://arxiv.org/abs/2305.13246)]
+[[Link](https://github.com/InteractiveNLP-Team/awesome-InteractiveNLP-papers)]
+
+- A Survey of Large Language Models
+\-
+[[arXiv](https://arxiv.org/abs/2303.18223)]
+[[Link](https://github.com/RUCAIBox/LLMSurvey)]
+
+- Self-Supervised Multimodal Learning: A Survey
+\-
+[[arXiv](https://arxiv.org/abs/2304.01008)]
+[[Link](https://github.com/ys-zong/awesome-self-supervised-multimodal-learning)]
+
+- Large AI Models in Health Informatics: Applications, Challenges, and the Future
+\-
+[[arXiv](https://arxiv.org/abs/2303.11568)]
+[[Paper](https://ieeexplore.ieee.org/document/10261199)]
+[[Link](https://github.com/Jianing-Qiu/Awesome-Healthcare-Foundation-Models)]
+
+- Towards Reasoning in Large Language Models: A Survey
+\-
+[[arXiv](https://arxiv.org/abs/2212.10403)]
+[[Paper](https://aclanthology.org/2023.findings-acl.67.pdf)]
+[[Link](https://github.com/jeffhj/LM-reasoning)]
+
+- Reasoning with Language Model Prompting: A Survey
+\-
+[[arXiv](https://arxiv.org/abs/2212.09597)]
+[[Paper](https://aclanthology.org/2023.acl-long.294.pdf)]
+[[Link](https://github.com/zjunlp/Prompt4ReasoningPapers)]
+
+- Awesome Multimodal Reasoning
+\-
+[[Link](https://github.com/atfortes/Awesome-Multimodal-Reasoning)]
+
+
+## 2 Foundation Models
+
+![foundation_models](assets/22_foundation_models.jpg)
+
+### 2.1 Language Foundation Models
+
+- `2023/07` | `Llama 2` | [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)
+\-
+[[Paper](https://arxiv.org/pdf/2307.09288.pdf)]
+[[Code](https://github.com/facebookresearch/llama)]
+[[Blog](https://ai.meta.com/llama/)]
+
+- `2023/05` | `PaLM 2` | [PaLM 2 Technical Report](https://arxiv.org/abs/2305.10403)
+\-
+
+- `2023/03` | `PanGu-Σ` | [PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing](https://arxiv.org/abs/2303.10845)
+\-
+[[Paper](https://arxiv.org/pdf/2303.10845.pdf)]
+
+- `2023/03` | `GPT-4` | [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774)
+\-
+[[Paper](https://arxiv.org/pdf/2303.08774.pdf)]
+[[Blog](https://openai.com/research/gpt-4)]
+
+- `2023/02` | `LLaMA` | [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
+\-
+[[Paper](https://arxiv.org/pdf/2302.13971.pdf)]
+[[Code](https://github.com/facebookresearch/llama)]
+[[Blog](https://ai.meta.com/blog/large-language-model-llama-meta-ai/)]
+
+- `2022/11` | `ChatGPT` | ChatGPT: Optimizing Language Models for Dialogue
+\-
+[[Blog](https://openai.com/blog/chatgpt)]
+
+- `2022/04` | `PaLM` | [PaLM: Scaling Language Modeling with Pathways](https://arxiv.org/abs/2204.02311)
+\-
+[[Paper](https://arxiv.org/pdf/2204.02311.pdf)]
+[[Blog](https://blog.research.google/2022/04/pathways-language-model-palm-scaling-to.html)]
+
+- `2021/09` | `FLAN` | [Finetuned Language Models Are Zero-Shot Learners](https://arxiv.org/abs/2109.01652)
+\-
+
+- `2021/07` | `Codex` | [Evaluating Large Language Models Trained on Code](https://arxiv.org/abs/2107.03374)
+\-
+
+- `2021/04` | `PanGu-α` | [PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation](https://arxiv.org/abs/2104.12369)
+\-
+[[Paper](https://arxiv.org/pdf/2104.12369.pdf)]
+[[Code](https://github.com/huawei-noah/Pretrained-Language-Model)]
+
+- `2020/05` | `GPT-3` | [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
+\-
+[[Paper](https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf)]
+[[Code](https://github.com/openai/gpt-3)]
+
+- `2019/08` | `Sentence-BERT` | [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)
+\-
+
+- `2019/07` | `RoBERTa` | [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)
+\-
+
+- `2018/10` | `BERT` | [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
+\-
+[[Paper](https://aclanthology.org/N19-1423.pdf)]
+[[Code](https://github.com/google-research/bert)]
+[[Blog](https://blog.research.google/2018/11/open-sourcing-bert-state-of-art-pre.html)]
+
+
+### 2.2 Vision Foundation Models
+
+- `2023/05` | `SAA+` | [Segment Any Anomaly without Training via Hybrid Prompt Regularization](https://arxiv.org/abs/2305.10724)
+\-
+[[Paper](https://arxiv.org/pdf/2305.10724.pdf)]
+[[Code](https://github.com/caoyunkang/Segment-Any-Anomaly)]
+
+- `2023/05` | `Explain Any Concept` | [Explain Any Concept: Segment Anything Meets Concept-Based Explanation](https://arxiv.org/abs/2305.10289)
+\-
+[[Paper](https://arxiv.org/pdf/2305.10289.pdf)]
+[[Code](https://github.com/Jerry00917/samshap)]
+
+- `2023/05` | `SAM-Track` | [Segment and Track Anything](https://arxiv.org/abs/2305.06558)
+\-
+[[Paper](https://arxiv.org/pdf/2305.06558.pdf)]
+[[Code](https://github.com/z-x-yang/Segment-and-Track-Anything)]
+
+- `2023/05` | `SAMRS` | [SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model](https://arxiv.org/abs/2305.02034)
+\-
+[[Paper](https://arxiv.org/pdf/2305.02034.pdf)]
+[[Code](https://github.com/ViTAE-Transformer/SAMRS)]
+
+- `2023/04` | `Edit Everything` | [Edit Everything: A Text-Guided Generative System for Images Editing](https://arxiv.org/abs/2304.14006)
+\-
+[[Paper](https://arxiv.org/pdf/2304.14006.pdf)]
+[[Code](https://github.com/DefengXie/Edit_Everything)]
+
+- `2023/04` | `Inpaint Anything` | [Inpaint Anything: Segment Anything Meets Image Inpainting](https://arxiv.org/abs/2304.06790)
+\-
+[[Paper](https://arxiv.org/pdf/2304.06790.pdf)]
+[[Code](https://github.com/geekyutao/Inpaint-Anything)]
+
+- `2023/04` | `SAM` | [Segment 
Anything](https://arxiv.org/abs/2304.02643) +\- +[[Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Kirillov_Segment_Anything_ICCV_2023_paper.pdf)] +[[Code](https://github.com/facebookresearch/segment-anything)] +[[Blog](https://segment-anything.com/)] + +- `2023/03` | `VideoMAE V2` | [VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking](https://arxiv.org/abs/2303.16727) +\- +[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_VideoMAE_V2_Scaling_Video_Masked_Autoencoders_With_Dual_Masking_CVPR_2023_paper.pdf)] +[[Code](https://github.com/OpenGVLab/VideoMAEv2)] + +- `2023/03` | `Grounding DINO` | [Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection](https://arxiv.org/abs/2303.05499) +\- +[[Paper](https://arxiv.org/pdf/2303.05499.pdf)] +[[Code](https://github.com/IDEA-Research/GroundingDINO)] + +- `2022/03` | `VideoMAE` | [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) +\- +[[Paper](https://proceedings.neurips.cc/paper_files/paper/2022/hash/416f9cb3276121c42eebb86352a4354a-Abstract-Conference.html)] +[[Code](https://github.com/MCG-NJU/VideoMAE)] + +- `2021/12` | `Stable Diffusion` | [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) +\- +[[Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf)] +[[Code](https://github.com/CompVis/latent-diffusion)] + +- `2021/09` | `LaMa` | [Resolution-robust Large Mask Inpainting with Fourier Convolutions](https://arxiv.org/abs/2109.07161) +\- +[[Paper](https://openaccess.thecvf.com/content/WACV2022/papers/Suvorov_Resolution-Robust_Large_Mask_Inpainting_With_Fourier_Convolutions_WACV_2022_paper.pdf)] +[[Code](https://github.com/advimman/lama)] + +- `2021/03` | `Swin` | [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) +\- +[[Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Liu_Swin_Transformer_Hierarchical_Vision_Transformer_Using_Shifted_Windows_ICCV_2021_paper.pdf)] +[[Code](https://github.com/microsoft/Swin-Transformer)] + +- `2020/10` | `ViT` | [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) +\- +[[Paper](https://openreview.net/pdf?id=YicbFdNTTy)] + + +### 2.3 Multimodal Foundation Models + +- `2023/05` | `Caption Anything` | [Caption Anything: Interactive Image Description with Diverse Multimodal Controls](https://arxiv.org/abs/2305.02677) +\- +[[Paper](https://arxiv.org/pdf/2305.02677.pdf)] +[[Code](https://github.com/ttengwang/Caption-Anything)] + +- `2023/05` | `SAMText` | [Scalable Mask Annotation for Video Text Spotting](https://arxiv.org/abs/2305.01443) +\- +[[Paper](https://arxiv.org/pdf/2305.01443.pdf)] +[[Code](https://github.com/ViTAE-Transformer/SAMText)] + +- `2023/04` | `Text2Seg` | [Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models](https://arxiv.org/abs/2304.10597) +\- +[[Paper](https://arxiv.org/pdf/2304.10597.pdf)] + +- `2023/04` | `MiniGPT-4` | [MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models](https://arxiv.org/abs/2304.10592) +\- + +- `2023/04` | `LLaVA` | [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) +\- + +- `2023/04` | `CLIP Surgery` | [CLIP Surgery for Better Explainability with Enhancement in 
Open-Vocabulary Tasks](https://arxiv.org/abs/2304.05653) +\- +[[Paper](https://arxiv.org/pdf/2304.05653.pdf)] +[[Code](https://github.com/xmed-lab/CLIP_Surgery)] + +- `2023/03` | `UniDiffuser` | [One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale](https://arxiv.org/abs/2303.06555) +\- + +- `2023/01` | `GALIP` | [GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis](https://arxiv.org/abs/2301.12959) +\- +[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Tao_GALIP_Generative_Adversarial_CLIPs_for_Text-to-Image_Synthesis_CVPR_2023_paper.pdf)] +[[Code](https://github.com/tobran/GALIP)] + +- `2022/12` | `Img2Prompt` | [From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models](https://arxiv.org/abs/2212.10846) +\- + +- `2022/01` | `BLIP` | [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) +\- + +- `2021/09` | `CoOp` | [Learning to Prompt for Vision-Language Models](https://arxiv.org/abs/2109.01134) +\- +[[Paper](https://link.springer.com/article/10.1007/s11263-022-01653-1)] +[[Code](https://github.com/KaiyangZhou/CoOp)] + +- `2021/02` | `CLIP` | [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) +\- +[[Paper](https://proceedings.mlr.press/v139/radford21a/radford21a.pdf)] +[[Code](https://github.com/openai/CLIP)] +[[Blog](https://openai.com/research/clip)] + + +### 2.4 Reasoning Applications + +- `2022/06` | `Minerva` | [Solving Quantitative Reasoning Problems with Language Models](https://arxiv.org/abs/2206.14858) +\- +[[Paper](https://openreview.net/pdf?id=IFXTZERXdM7)] +[[Blog](https://blog.research.google/2022/06/minerva-solving-quantitative-reasoning.html)] + +- `2022/06` | `BIG-bench` | [Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models](https://arxiv.org/abs/2206.04615) +\- +[[Paper](https://openreview.net/pdf?id=uyTL5Bvosj)] +[[Code](https://github.com/google/BIG-bench)] + +- `2022/05` | `Zero-shot-CoT` | [Large Language Models are Zero-Shot Reasoners](https://arxiv.org/abs/2205.11916) +\- +[[Paper](https://openreview.net/pdf?id=e2TBb5y0yFf)] +[[Code](https://github.com/kojima-takeshi188/zero_shot_cot)] + +- `2022/03` | `STaR` | [STaR: Bootstrapping Reasoning With Reasoning](https://arxiv.org/abs/2203.14465) +\- +[[Paper](https://openreview.net/pdf?id=_3ELRdg2sgI)] +[[Code](https://github.com/ezelikman/STaR)] + +- `2021/07` | `MWP-BERT` | [MWP-BERT: Numeracy-Augmented Pre-training for Math Word Problem Solving](https://arxiv.org/abs/2107.13435) +\- +[[Paper](https://aclanthology.org/2022.findings-naacl.74.pdf)] +[[Code](https://github.com/LZhenwen/MWP-BERT)] + +- `2017/05` | `AQUA-RAT` | [Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems](https://arxiv.org/abs/1705.04146) +\- +[[Paper](https://aclanthology.org/P17-1015.pdf)] +[[Code](https://github.com/google-deepmind/AQuA)] + +## 3 Reasoning Tasks + + +### 3.1 Commonsense Reasoning + +- `2023/05` | `LLM-MCTS` | [Large Language Models as Commonsense Knowledge for Large-Scale Task Planning](https://arxiv.org/abs/2305.14078) +\- +[[Paper](https://openreview.net/pdf?id=Wjp1AYB8lH)] +[[Code](https://github.com/1989Ryan/llm-mcts)] +[[Project](https://llm-mcts.github.io)] + +- `2023/05` | Bridging the Gap between Pre-Training and Fine-Tuning for Commonsense Generation +\- +[[Paper](https://aclanthology.org/2023.findings-eacl.28.pdf)] 
+[[Code](https://github.com/LHRYANG/CommonGen)] + +- `2022/11` | `DANCE` | [Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles](https://arxiv.org/abs/2211.16504) +\- +[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Ye_Improving_Commonsense_in_Vision-Language_Models_via_Knowledge_Graph_Riddles_CVPR_2023_paper.pdf)] +[[Code](https://github.com/pleaseconnectwifi/DANCE)] +[[Project](https://shuquanye.com/DANCE_website)] + +- `2022/10` | `CoCoGen` | [Language Models of Code are Few-Shot Commonsense Learners](https://arxiv.org/abs/2210.07128) +\- +[[Paper](https://aclanthology.org/2022.emnlp-main.90.pdf)] +[[Code](https://github.com/reasoning-machines/CoCoGen)] + +- `2021/10` | [A Systematic Investigation of Commonsense Knowledge in Large Language Models](https://arxiv.org/abs/2111.00607) +\- +[[Paper](https://aclanthology.org/2022.emnlp-main.812.pdf)] + + +- `2021/05` | [Go Beyond Plain Fine-tuning: Improving Pretrained Models for Social Commonsense](https://arxiv.org/abs/2105.05913) +\- +[[Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9383453)] + +#### 3.1.1 Commonsense Question and Answering (QA) + +- `2019/06` | `CoS-E` | [Explain Yourself! Leveraging Language Models for Commonsense Reasoning](https://arxiv.org/abs/1906.02361) +\- +[[Paper](https://aclanthology.org/P19-1487.pdf)] +[[Code](https://github.com/salesforce/cos-e)] + +- `2018/11` | `CQA` | [CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge](https://arxiv.org/abs/1811.00937) +\- +[[Paper](https://aclanthology.org/N19-1421.pdf)] +[[Code](https://github.com/jonathanherzig/commonsenseqa)] +[[Project](https://www.tau-nlp.sites.tau.ac.il/commonsenseqa)] + +- `2016/12` | `ConceptNet` | [ConceptNet 5.5: An Open Multilingual Graph of General Knowledge](https://arxiv.org/abs/1612.03975) +\- +[[Paper](https://ojs.aaai.org/index.php/AAAI/article/view/11164)] +[[Project](https://conceptnet.io)] + +#### 3.1.2 Physical Commonsense Reasoning + +- `2023/10` | `NEWTON` | [NEWTON: Are Large Language Models Capable of Physical Reasoning?](https://arxiv.org/abs/2310.07018) +\- +[[Paper](https://arxiv.org/pdf/2310.07018.pdf)] +[[Code](https://github.com/NewtonReasoning/Newton)] +[[Project](https://newtonreasoning.github.io)] + +- `2022/03` | `PACS` | [PACS: A Dataset for Physical Audiovisual CommonSense Reasoning](https://arxiv.org/abs/2203.11130) +\- +[[Paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136970286.pdf)] +[[Code](https://github.com/samuelyu2002/PACS)] + +- `2021/10` | `VRDP` | [Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language](https://arxiv.org/abs/2110.15358) +\- +[[Paper](https://openreview.net/pdf?id=lk1ORT35tbi)] +[[Code](https://github.com/dingmyu/VRDP)] + +- `2020/05` | `ESPRIT` | [ESPRIT: Explaining Solutions to Physical Reasoning Tasks](https://arxiv.org/abs/2005.00730) +\- +[[Paper](https://aclanthology.org/2020.acl-main.706.pdf)] +[[Code](https://github.com/salesforce/esprit)] + +- `2019/11` | `PIQA` | [PIQA: Reasoning about Physical Commonsense in Natural Language](https://arxiv.org/abs/1911.11641) +\- +[[Paper](https://ojs.aaai.org/index.php/AAAI/article/view/6239)] +[[Project](https://leaderboard.allenai.org/physicaliqa/submissions/public)] + +#### 3.1.3 Spatial Commonsense Reasoning + +- `2022/03` | [Things not Written in Text: Exploring Spatial Commonsense from Visual Signals](https://arxiv.org/abs/2203.08075) +\- +[[Paper](https://aclanthology.org/2022.acl-long.168.pdf)] 
+[[Code](https://github.com/xxxiaol/spatial-commonsense)] + +- `2021/06` | `PROST` | [PROST: Physical Reasoning of Objects through Space and Time](https://arxiv.org/abs/2106.03634) +\- +[[Paper](https://aclanthology.org/2021.findings-acl.404.pdf)] +[[Code](https://github.com/nala-cub/prost)] + +- `2019/02` | `GQA` | [GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering](https://arxiv.org/abs/1902.09506) +\- +[[Paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Hudson_GQA_A_New_Dataset_for_Real-World_Visual_Reasoning_and_Compositional_CVPR_2019_paper.pdf)] +[[Project](https://cs.stanford.edu/people/dorarad/gqa/index.html)] + +#### Benchmarks, Datasets, and Metrics + +- `2023/06` | `CConS` | [Probing Physical Reasoning with Counter-Commonsense Context](https://arxiv.org/abs/2306.02258) +\- + +- `2023/05` | `SummEdits` | [LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond](https://arxiv.org/abs/2305.14540) +\- +[[Paper](https://arxiv.org/pdf/2305.14540.pdf)] +[[Code](https://github.com/salesforce/factualNLG)] + +- `2021/03` | `RAINBOW` | [UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark](https://arxiv.org/abs/2103.13009) +\- + +- `2020/11` | `ProtoQA` | ProtoQA: A Question Answering Dataset for Prototypical Common-Sense Reasoning +\- +[[Paper](https://aclanthology.org/2020.emnlp-main.85.pdf)] + +- `2020/10` | `DrFact` | [Differentiable Open-Ended Commonsense Reasoning](https://arxiv.org/abs/2010.14439) + +- `2019/11` | `CommonGen` | [CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning](https://arxiv.org/abs/1911.03705) + +- `2019/08` | `Cosmos QA` | [Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning](https://arxiv.org/abs/1909.00277) + +- `2019/08` | `αNLI` | [Abductive Commonsense Reasoning](https://arxiv.org/abs/1908.05739) +\- + +- `2019/08` | `PHYRE` | [PHYRE: A New Benchmark for Physical Reasoning](https://arxiv.org/abs/1908.05656) +\- + +- `2019/07` | `WinoGrande` | [WinoGrande: An Adversarial Winograd Schema Challenge at Scale](https://arxiv.org/abs/1907.10641) +\- + +- `2019/05` | `MathQA` | [MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms](https://arxiv.org/abs/1905.13319) +\- + +- `2019/05` | `HellaSwag` | [HellaSwag: Can a Machine Really Finish Your Sentence?](https://arxiv.org/abs/1905.07830) +\- + +- `2019/04` | `Social IQa` | [SocialIQA: Commonsense Reasoning about Social Interactions](https://arxiv.org/abs/1904.09728) +\- +[[Paper](https://aclanthology.org/D19-1454.pdf)] + +- `2018/08` | `SWAG` | [SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference](https://arxiv.org/abs/1808.05326) +\- + +- `2002/07` | `BLEU` | BLEU: a Method for Automatic Evaluation of Machine Translation +\- +[[Paper](https://aclanthology.org/P02-1040.pdf)] + + +### 3.2 Mathematical Reasoning + +- `2022/11` | Tokenization in the Theory of Knowledge +\- +[[Paper](https://www.mdpi.com/2673-8392/3/1/24)] + +- `2022/06` | `MultiHiertt` | [MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data](https://arxiv.org/abs/2206.01347) + +- `2021/04` | `MultiModalQA` | [MultiModalQA: Complex Question Answering over Text, Tables and Images](https://arxiv.org/abs/2104.06039) + +- `2017/05` | [Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems](https://arxiv.org/abs/1705.04146) + +- `2014/04` | [Deep Learning in Neural 
Networks: An Overview](https://arxiv.org/abs/1404.7828)
+\-
+[[Paper](https://www.sciencedirect.com/science/article/pii/S0893608014002135)]
+
+- `2004` | Wittgenstein on philosophy of logic and mathematics
+\-
+[[Paper](https://www.pdcnet.org/gfpj/content/gfpj_2004_0025_0002_0227_0288)]
+
+- `1989` | `CLP` | Connectionist Learning Procedures
+\-
+[[Paper](https://www.sciencedirect.com/science/article/pii/0004370289900490)]
+
+#### 3.2.1 Arithmetic Reasoning
+
+- `2022/09` | `PromptPG` | [Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning](https://arxiv.org/abs/2209.14610)
+
+- `2022/01` | `CoT` | [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
+\-
+
+- `2021/03` | `SVAMP` | [Are NLP Models really able to Solve Simple Math Word Problems?](https://arxiv.org/abs/2103.07191)
+\-
+[[Paper](https://aclanthology.org/2021.naacl-main.168.pdf)]
+[[Code](https://github.com/arkilpatel/SVAMP)]
+
+- `2021/03` | `MATH` | [Measuring Mathematical Problem Solving With the MATH Dataset](https://arxiv.org/abs/2103.03874)
+\-
+
+- `2016/08` | [How well do Computers Solve Math Word Problems? Large-Scale Dataset Construction and Evaluation](https://aclanthology.org/P16-1084/)
+\-
+[[Paper](https://aclanthology.org/P16-1084.pdf)]
+
+- `2015/09` | [Learn to Solve Algebra Word Problems Using Quadratic Programming](https://aclanthology.org/D15-1096/)
+\-
+[[Paper](https://aclanthology.org/D15-1096.pdf)]
+
+- `2014/06` | `Alg514` | [Learning to Automatically Solve Algebra Word Problems](https://aclanthology.org/P14-1026/)
+\-
+[[Paper](https://aclanthology.org/P14-1026.pdf)]
+
+#### 3.2.2 Geometry Reasoning
+
+- `2022/12` | `UniGeo` / `Geoformer` | [UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression](https://arxiv.org/abs/2212.02746)
+
+- `2021/05` | `GeoQA` / `NGS` | [GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning](https://arxiv.org/abs/2105.14517)
+
+- `2021/05` | `Geometry3K` / `Inter-GPS` | [Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning](https://arxiv.org/abs/2105.04165)
+
+- `2015/09` | `GeoS` | [Solving Geometry Problems: Combining Text and Diagram Interpretation](https://aclanthology.org/D15-1171/)
+\-
+[[Paper](https://aclanthology.org/D15-1171.pdf)]
+
+#### 3.2.3 Theorem Proving
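+
+Most works below train or prompt models to emit proof steps that a proof assistant such as Lean, Coq, or Isabelle can check. As a reference point, here is a toy Lean 4 proof of the kind such provers must generate (a minimal sketch; `Nat.add_comm` is a standard library lemma):
+
+```lean
+-- The goal `2 + 2 = 4` holds by computation, so `rfl` closes it.
+example : 2 + 2 = 4 := rfl
+
+-- A named theorem discharged by citing a library lemma: producing such
+-- steps is the core task of the neural provers listed below.
+theorem add_comm_nat (a b : Nat) : a + b = b + a := Nat.add_comm a b
+```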
+
+- `2023/10` | `LEGO-Prover` | [LEGO-Prover: Neural Theorem Proving with Growing Libraries](https://arxiv.org/abs/2310.00656)
+\-
+
+- `2023/09` | `Lyra` | [Lyra: Orchestrating Dual Correction in Automated Theorem Proving](https://arxiv.org/abs/2309.15806)
+
+- `2023/06` | `DT-Solver` | [DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-level Value Function](https://aclanthology.org/2023.acl-long.706/)
+\-
+[[Paper](https://aclanthology.org/2023.acl-long.706.pdf)]
+
+- `2023/05` | [Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving](https://arxiv.org/abs/2305.16366)
+
+- `2023/03` | `Magnushammer` | [Magnushammer: A Transformer-based Approach to Premise Selection](https://arxiv.org/abs/2303.04488)
+
+- `2022/10` | `DSP` | [Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs](https://arxiv.org/abs/2210.12283)
+\-
+
+- `2022/05` | [Learning to Find Proofs and Theorems by Learning to Refine Search Strategies: The Case of Loop Invariant Synthesis](https://arxiv.org/abs/2205.14229)
+
+- `2022/05` | [Autoformalization with Large Language Models](https://arxiv.org/abs/2205.12615)
+\-
+[[Paper](https://proceedings.neurips.cc/paper_files/paper/2022/file/d0c6bc641a56bebee9d985b937307367-Paper-Conference.pdf)]
+
+- `2022/05` | `HTPS` | [HyperTree Proof Search for Neural Theorem Proving](https://arxiv.org/abs/2205.11491)
+
+- `2022/05` | `Thor` | [Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers](https://arxiv.org/abs/2205.10893)
+\-
+
+- `2022/02` | [Formal Mathematics Statement Curriculum Learning](https://arxiv.org/abs/2202.01344)
+\-
+
+- `2021/07` | `Lean 4` | [The Lean 4 Theorem Prover and Programming Language](https://link.springer.com/chapter/10.1007/978-3-030-79876-5_37)
+\-
+
+- `2021/02` | `TacticZero` | [TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning](https://arxiv.org/abs/2102.09756)
+\-
+
+- `2021/02` | `PACT` | [Proof Artifact Co-training for Theorem Proving with Language Models](https://arxiv.org/abs/2102.06203)
+\-
+
+- `2020/09` | `GPT-f` | [Generative Language Modeling for Automated Theorem Proving](https://arxiv.org/abs/2009.03393)
+\-
+
+- `2019/07` | [Formal Verification of Hardware Components in Critical Systems](https://www.hindawi.com/journals/wcmc/2020/7346763/)
+\-
+[[Paper](https://downloads.hindawi.com/journals/wcmc/2020/7346763.pdf)]
+
+- `2019/06` | `Metamath` | A Computer Language for Mathematical Proofs
+\-
+[[Paper](http://de.metamath.org/downloads/metamath.pdf)]
+
+- `2019/05` | `CoqGym` | [Learning to Prove Theorems via Interacting with Proof Assistants](https://arxiv.org/abs/1905.09381)
+
+- `2018/12` | `AlphaZero` | [A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play](https://www.science.org/doi/10.1126/science.aar6404)
+\-
+[[Paper](https://www.science.org/doi/pdf/10.1126/science.aar6404)]
+
+- `2018/04` | `TacticToe` | [TacticToe: Learning to Prove with Tactics](https://arxiv.org/abs/1804.00596)
+
+- `2015/08` | `Lean` | The Lean Theorem Prover (system description)
+\-
+[[Paper](https://lean-lang.org/papers/system.pdf)]
+
+- `2010/07` | Three Years of Experience with Sledgehammer, a Practical Link between Automatic and Interactive Theorem Provers
+\-
+[[Paper](https://www.cl.cam.ac.uk/~lp15/papers/Automation/paar.pdf)]
+
+- `2010/04` | Formal Methods at Intel - An Overview
+\-
+[[Slides](https://shemesh.larc.nasa.gov/NFM2010/talks/harrison.pdf)]
+
+- `2005/07` | Combining Simulation and Formal Verification for Integrated Circuit Design Validation
+\-
+[[Paper](https://s2.smu.edu/~mitch/ftp_dir/pubs/wmsci05.pdf)]
+
+- `2003` | Extracting a Formally Verified, Fully Executable Compiler from a Proof Assistant
+\-
+[[Paper](https://www.sciencedirect.com/science/article/pii/S1571066105825988/pdf?md5=10b884badea7fe0e46c38b9419fbcca6&pid=1-s2.0-S1571066105825988-main.pdf&_valck=1)]
+
+- `1996` | `Coq` | The Coq Proof Assistant-Reference Manual
+\-
+[[Project](https://coq.inria.fr/documentation)]
+
+- `1994` | `Isabelle` | Isabelle: A Generic Theorem Prover
+\-
+[[Paper](https://link.springer.com/content/pdf/10.1007/bfb0030558.pdf)]
+
+#### 3.2.4 Scientific Reasoning
+
+- `2023/07` | `SciBench` | [SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models](https://arxiv.org/abs/2307.10635)
+\-
+
+- `2022/09` | `ScienceQA` | [Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering](https://arxiv.org/abs/2209.09513)
+
+- `2022/03` | `ScienceWorld` | [ScienceWorld: Is your Agent Smarter than a 5th Grader?](https://arxiv.org/abs/2203.07540)
+
+- `2012` | Current Topics in Children's Learning and Cognition
+\-
+[[Book](https://www.intechopen.com/books/654)]
+
+#### Benchmarks, Datasets, and Metrics
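+
+Most of the math word problem datasets below report a solve rate: exact match between the gold answer and the final number extracted from the model's output. A minimal Python sketch of that metric (the extraction regex is a common heuristic, not any benchmark's official scoring script):
+
+```python
+import re
+
+def extract_answer(completion: str):
+    """Pull the last number out of a completion, GSM8K-style."""
+    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
+    return numbers[-1] if numbers else None
+
+def solve_rate(completions, golds):
+    """Fraction of problems whose extracted answer matches the gold string."""
+    hits = sum(extract_answer(c) == g for c, g in zip(completions, golds))
+    return hits / len(golds)
+
+# Hypothetical completions paired with canonical gold answers.
+print(solve_rate(["... so the total is 42.", "The answer is 7"], ["42", "8"]))  # 0.5
+```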
+
+- `2023/08` | `Math23K-F` / `MAWPS-F` / `FOMAS` | [Guiding Mathematical Reasoning via Mastering Commonsense Formula Knowledge](https://dl.acm.org/doi/abs/10.1145/3580305.3599375)
+\-
+[[Paper](https://dl.acm.org/doi/pdf/10.1145/3580305.3599375)]
+
+- `2023/07` | `ARB` | [ARB: Advanced Reasoning Benchmark for Large Language Models](https://arxiv.org/abs/2307.13692)
+\-
+
+- `2023/05` | `SwiftSage` | [SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks](https://arxiv.org/abs/2305.17390)
+\-
+
+- `2023/05` | `TheoremQA` | [TheoremQA: A Theorem-driven Question Answering dataset](https://arxiv.org/abs/2305.12524)
+\-
+
+- `2022/10` | `MGSM` | [Language Models are Multilingual Chain-of-Thought Reasoners](https://arxiv.org/abs/2210.03057)
+\-
+[[Paper](https://openreview.net/pdf?id=fR3wGCk-IXp)]
+[[Code](https://github.com/google-research/url-nlp)]
+
+- `2021/10` | `GSM8K` | [Training Verifiers to Solve Math Word Problems](https://arxiv.org/abs/2110.14168)
+\-
+[[Paper](https://arxiv.org/pdf/2110.14168.pdf)]
+[[Code](https://github.com/openai/grade-school-math)]
+[[Blog](https://openai.com/research/solving-math-word-problems)]
+
+- `2021/10` | `IconQA` | [IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning](https://arxiv.org/abs/2110.13214)
+\-
+
+- `2021/09` | `FinQA` | [FinQA: A Dataset of Numerical Reasoning over Financial Data](https://arxiv.org/abs/2109.00122)
+\-
+
+- `2021/08` | `MBPP` / `MathQA-Python` | [Program Synthesis with Large Language Models](https://arxiv.org/abs/2108.07732)
+
+- `2021/08` | `HiTab` / `EA` | [HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation](https://arxiv.org/abs/2108.06712)
+
+- `2021/07` | `HumanEval` / `Codex` | [Evaluating Large Language Models Trained on Code](https://arxiv.org/abs/2107.03374)
+\-
+
+- `2021/06` | `ASDiv` / `CLD` | [A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers](https://arxiv.org/abs/2106.15772)
+\-
+
+- `2021/06` | `AIT-QA` | [AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry](https://arxiv.org/abs/2106.12944)
+\-
+
+- `2021/05` | `APPS` | [Measuring Coding Challenge Competence With APPS](https://arxiv.org/abs/2105.09938)
+\-
+
+- `2021/05` | `TAT-QA` | [TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance](https://arxiv.org/abs/2105.07624)
+
+- `2021/03` | `SVAMP` | [Are NLP Models really able to Solve Simple Math Word Problems?](https://arxiv.org/abs/2103.07191)
+\-
+
+- `2021/01` | `TSQA` / `MAP` / `MRR` | [TSQA: Tabular Scenario Based Question Answering](https://arxiv.org/abs/2101.11429)
+
+- `2020/10` | `HMWP` | [Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems](https://arxiv.org/abs/2010.06823)
+\-
+
+- `2020/04` | `HybridQA` | [HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data](https://arxiv.org/abs/2004.07347)
+
+- `2019/03` | `DROP` | [DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs](https://arxiv.org/abs/1903.00161)
+\-
+
+- `2019` | `NaturalQuestions` | [Natural Questions: 
A Benchmark for Question Answering Research](https://aclanthology.org/Q19-1026/) +\- +[[Paper](https://aclanthology.org/Q19-1026.pdf)] + +- `2018/09` | `HotpotQA` | [HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering](https://arxiv.org/abs/1809.09600) +\- + +- `2018/09` | `Spider` | [Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task](https://arxiv.org/abs/1809.08887) +\- + +- `2018/03` | `ComplexWebQuestions` | [The Web as a Knowledge-base for Answering Complex Questions](https://arxiv.org/abs/1803.06643) +\- + +- `2017/12` | `MetaQA` | [Variational Reasoning for Question Answering with Knowledge Graph](https://arxiv.org/abs/1709.04071) +\- + +- `2017/09` | `GEOS++` | [From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems](https://aclanthology.org/D17-1081/) +\- +[[Paper](https://aclanthology.org/D17-1081.pdf)] + +- `2017/09` | `Math23k` | [Deep Neural Solver for Math Word Problems](https://aclanthology.org/D17-1088/) +\- +[[Paper](https://aclanthology.org/D17-1088.pdf)] + +- `2017/08` | `WikiSQL` / `Seq2SQL` | [Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning](https://arxiv.org/abs/1709.00103) +\- + +- `2017/08` | [Learning to Solve Geometry Problems from Natural Language Demonstrations in Textbooks](https://aclanthology.org/S17-1029/) +\- +[[Paper](https://aclanthology.org/S17-1029.pdf)] + +- `2017/05` | `TriviaQA` | [TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension](https://arxiv.org/abs/1705.03551) +\- + +- `2017/05` | `GeoShader` | Synthesis of Solutions for Shaded Area Geometry Problems +\- +[[Paper](https://cdn.aaai.org/ocs/15416/15416-68619-1-PB.pdf)] + +- `2016/09` | `DRAW-1K` | [Annotating Derivations: A New Evaluation Strategy and Dataset for Algebra Word Problems](https://arxiv.org/abs/1609.07197) +\- + +- `2016/08` | `WebQSP` | [The Value of Semantic Parse Labeling for Knowledge Base Question Answering](https://aclanthology.org/P16-2033/) +\- +[[Paper](https://aclanthology.org/P16-2033.pdf)] + +- `2016/06` | `SQuAD` | [SQuAD: 100,000+ Questions for Machine Comprehension of Text](https://arxiv.org/abs/1606.05250) +\- + +- `2016/06` | `WikiMovies` | [Key-Value Memory Networks for Directly Reading Documents](https://arxiv.org/abs/1606.03126) +\- + +- `2016/06` | `MAWPS` | [MAWPS: A Math Word Problem Repository](https://aclanthology.org/N16-1136/) +\- +[[Paper](https://aclanthology.org/N16-1136.pdf)] + +- `2015/09` | `Dolphin1878` | [Automatically Solving Number Word Problems by Semantic Parsing and Reasoning](https://aclanthology.org/D15-1135/) +\- +[[Paper](https://aclanthology.org/D15-1135.pdf)] + +- `2015/08` | `WikiTableQA` | [Compositional Semantic Parsing on Semi-Structured Tables](https://arxiv.org/abs/1508.00305) +\- + +- `2015` | `SingleEQ` | [Parsing Algebraic Word Problems into Equations](https://aclanthology.org/Q15-1042/) +\- +[[Paper](https://aclanthology.org/Q15-1042.pdf)] + +- `2015` | `DRAW` | DRAW: A Challenging and Diverse Algebra Word Problem Set +\- +[[Paper](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tech_rep.pdf)] + +- `2014/10` | `Verb395` | [Learning to Solve Arithmetic Word Problems with Verb Categorization](https://aclanthology.org/D14-1058/) +\- +[[Paper](https://aclanthology.org/D14-1058.pdf)] + +- `2013/10` | `WebQuestions` | [Semantic Parsing on Freebase from Question-Answer 
Pairs](https://aclanthology.org/D13-1160/) +\- +[[Paper](https://aclanthology.org/D13-1160.pdf)] + +- `2013/08` | `Free917` | [Large-scale Semantic Parsing via Schema Matching and Lexicon Extension](https://aclanthology.org/P13-1042/) +\- +[[Paper](https://aclanthology.org/P13-1042.pdf)] + +- `2002/04` | `NMI` | [Cluster Ensembles - A Knowledge Reuse Framework for Combining Multiple Partitions](https://dl.acm.org/doi/10.1162/153244303321897735) +\- +[[Paper](https://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf)] + +- `1990` | `ATIS` | [The ATIS Spoken Language Systems Pilot Corpus](https://aclanthology.org/H90-1021/) +\- +[[Paper](https://aclanthology.org/H90-1021.pdf)] + + +### 3.3 Logical Reasoning + +- `2023/10` | `LogiGLUE` | [Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models](https://arxiv.org/abs/2310.00836) +\- + +- `2023/05` | `Logic-LM` | [Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning](https://arxiv.org/abs/2305.12295) +\- + +- `2023/03` | `LEAP` | [Explicit Planning Helps Language Models in Logical Reasoning](https://arxiv.org/abs/2303.15714) +\- + +- `2023/03` | [Sparks of Artificial General Intelligence: Early experiments with GPT-4](https://arxiv.org/abs/2303.12712) +\- + +- `2022/10` | `Entailer` | [Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning](https://arxiv.org/abs/2210.12217) +\- + +- `2022/06` | `NeSyL` | [Weakly Supervised Neural Symbolic Learning for Cognitive Tasks](https://ojs.aaai.org/index.php/AAAI/article/view/20533) +\- +[[Paper](https://ojs.aaai.org/index.php/AAAI/article/view/20533/20292)] + +- `2022/05` | `NeuPSL` | [NeuPSL: Neural Probabilistic Soft Logic](https://arxiv.org/abs/2205.14268) +\- + +- `2022/05` | `NLProofS` | [Generating Natural Language Proofs with Verifier-Guided Search](https://arxiv.org/abs/2205.12443) +\- + +- `2022/05` | `Least-to-Most Prompting` | [Least-to-Most Prompting Enables Complex Reasoning in Large Language Models](https://arxiv.org/abs/2205.10625) +\- + +- `2022/05` | `SI` | [Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning](https://arxiv.org/abs/2205.09712) +\- + +- `2022/03` | [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) +\- + +- `2021/11` | `NSPS` | [Neuro-Symbolic Program Search for Autonomous Driving Decision Module Design](https://proceedings.mlr.press/v155/sun21a.html) +\- +[[Paper](https://proceedings.mlr.press/v155/sun21a/sun21a.pdf)] + +- `2021/09` | `DeepProbLog` | [Neural probabilistic logic programming in DeepProbLog](https://www.sciencedirect.com/science/article/pii/S0004370221000552) +\- +[[Paper](https://www.sciencedirect.com/science/article/pii/S0004370221000552/pdfft?md5=1e6b82d50854f317478e487da9e75473&pid=1-s2.0-S0004370221000552-main.pdf)] + +- `2021/08` | `GABL` | [Abductive Learning with Ground Knowledge Base](https://www.ijcai.org/proceedings/2021/250) +\- +[[Paper](https://www.ijcai.org/proceedings/2021/0250.pdf)] + +- `2020/02` | `RuleTakers` | [Transformers as Soft Reasoners over Language](https://arxiv.org/abs/2002.05867) +\- + +- `2019/12` | `NMN-Drop` | [Neural Module Networks for Reasoning over Text](https://arxiv.org/abs/1912.04971) +\- + +- `2019/04` | `NS-CL` | [The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision](https://arxiv.org/abs/1904.12584) +\- + +- `2012` | Logical Reasoning and Learning 
+\-
+[[Paper](https://link.springer.com/referenceworkentry/10.1007/978-1-4419-1428-6_790)]
+
+#### 3.3.1 Propositional Logic
+
+- `2022/09` | Propositional Reasoning via Neural Transformer Language Models
+\-
+[[Paper](https://www.cs.cmu.edu/~oscarr/pdf/publications/2022_nesy.pdf)]
+
+#### 3.3.2 Predicate Logic
+
+- `2021/06` | `ILP` | [Inductive logic programming at 30](https://link.springer.com/article/10.1007/s10994-021-06089-1)
+\-
+[[Paper](https://link.springer.com/content/pdf/10.1007/s10994-021-06089-1.pdf)]
+
+- `2011` | Statistical Relational Learning
+\-
+[[Paper](https://link.springer.com/referenceworkentry/10.1007/978-0-387-30164-8_786)]
+
+#### Benchmarks, Datasets, and Metrics
+
+- `2022/10` | `PrOntoQA` | [Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought](https://arxiv.org/abs/2210.01240)
+\-
+
+- `2022/09` | `FOLIO` | [FOLIO: Natural Language Reasoning with First-Order Logic](https://arxiv.org/abs/2209.00840)
+\-
+
+- `2020/12` | `ProofWriter` | [ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language](https://arxiv.org/abs/2012.13048)
+\-
+
+
+### 3.4 Causal Reasoning
+
+- `2023/08` | [Causal Parrots: Large Language Models May Talk Causality But Are Not Causal](https://arxiv.org/abs/2308.13067)
+
+- `2023/07` | [Causal Discovery with Language Models as Imperfect Experts](https://arxiv.org/abs/2307.02390)
+\-
+
+- `2023/06` | [From Query Tools to Causal Architects: Harnessing Large Language Models for Advanced Causal Discovery from Data](https://arxiv.org/abs/2306.16902)
+\-
+
+- `2023/06` | `Corr2Cause` | [Can Large Language Models Infer Causation from Correlation?](https://arxiv.org/abs/2306.05836)
+\-
+
+- `2023/05` | `Code-LLMs` | [The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code](https://arxiv.org/abs/2305.19213)
+\-
+
+- `2023/05` | [Causal Reasoning and Large Language Models: Opening a New Frontier for Causality](https://arxiv.org/abs/2305.00050)
+\-
+
+- `2023/04` | [Understanding Causality with Large Language Models: Feasibility and Opportunities](https://arxiv.org/abs/2304.05524)
+\-
+
+- `2023/03` | [Can large language models build causal graphs?](https://arxiv.org/abs/2303.05279)
+\-
+
+- `2023/01` | [Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis](https://arxiv.org/abs/2301.13819)
+\-
+
+- `2022/09` | [Probing for Correlations of Causal Facts: Large Language Models and Causality](https://openreview.net/forum?id=UPwzqPOs4-)
+\-
+[[Paper](https://openreview.net/pdf?id=UPwzqPOs4-)]
+
+- `2022/07` | [Can Large Language Models Distinguish Cause from Effect?](https://openreview.net/forum?id=ucHh-ytUkOH)
+\-
+[[Paper](https://openreview.net/pdf?id=ucHh-ytUkOH)]
+
+- `2021/08` | [Learning Faithful Representations of Causal Graphs](https://aclanthology.org/2021.acl-long.69/)
+\-
+[[Paper](https://aclanthology.org/2021.acl-long.69.pdf)]
+
+- `2021/05` | `InferBERT` | [InferBERT: A Transformer-Based Causal Inference Framework for Enhancing Pharmacovigilance](https://www.frontiersin.org/articles/10.3389/frai.2021.659622/full)
+\-
+[[Paper](https://www.frontiersin.org/articles/10.3389/frai.2021.659622/pdf?isPublishedV2=False)]
+
+- `2021/02` | [Towards Causal Representation Learning](https://arxiv.org/abs/2102.11107)
+\-
+
+- `2020/05` | `CausaLM` | [CausaLM: Causal Model Explanation Through Counterfactual Language Models](https://arxiv.org/abs/2005.13407)
+\-
+
+- `2019/06` | [Neuropathic Pain Diagnosis Simulator for Causal Discovery Algorithm Evaluation](https://arxiv.org/abs/1906.01732)
+\-
+
+- `2017` | [Elements of Causal Inference: Foundations and Learning Algorithms](https://mitpress.mit.edu/9780262037310/elements-of-causal-inference/)
+\-
+[[Book](https://library.oapen.org/bitstream/id/056a11be-ce3a-44b9-8987-a6c68fce8d9b/11283.pdf)]
+
+- `2016` | Actual Causality
+\-
+[[Book](https://direct.mit.edu/books/oa-monograph/3451/Actual-Causality)]
+
+- `2013` | Causal Reasoning
+\-
+[[Paper](https://psycnet.apa.org/record/2012-26298-046)]
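+
+Several of the works above (e.g., `Corr2Cause`, Elements of Causal Inference) hinge on the gap between observing a correlation and predicting the effect of an intervention, which a few lines of simulation make concrete (a minimal sketch with an assumed linear model X -> Y; not from any cited paper):
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+n = 100_000
+
+# Structural causal model: X causes Y.
+x = rng.normal(size=n)
+y = 2.0 * x + rng.normal(size=n)
+
+print(np.corrcoef(x, y)[0, 1])   # ~0.89, and symmetric in X and Y
+
+# Intervening on the cause shifts the effect: under do(X := 3), Y ~ 6.
+y_do_x = 2.0 * np.full(n, 3.0) + rng.normal(size=n)
+print(y_do_x.mean())             # ~6.0
+
+# Intervening on the effect leaves the cause untouched: under do(Y := 3),
+# X keeps its original distribution with mean ~0.
+print(x.mean())                  # ~0.0
+```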
+
+#### 3.4.1 Counterfactual Reasoning
+
+- `2023/07` | [Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks](https://arxiv.org/abs/2307.02477)
+\-
+
+- `2023/05` | [Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios](https://arxiv.org/abs/2305.16572)
+\-
+
+- `2007` | The Rational Imagination: How People Create Alternatives to Reality
+\-
+[[Paper](https://scholar.archive.org/work/zjwdgk7r6vefxaole362qftqji/access/wayback/http://www.tara.tcd.ie/bitstream/handle/2262/39428/Precis%20of%20The%20Rational%20Imagination%20-%20How%20People%20Create%20Alternatives%20to%20Reality.pdf?sequence=1)]
+
+- `1986` | Norm theory: Comparing reality to its alternatives
+\-
+[[Paper](https://psycnet.apa.org/record/1986-21899-001)]
+
+#### Benchmarks, Datasets, and Metrics
+
+- `2021/12` | `CRASS` | [CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models](https://arxiv.org/abs/2112.11941)
+\-
+
+- `2021/08` | `Arctic sea ice` | [Benchmarking of Data-Driven Causality Discovery Approaches in the Interactions of Arctic Sea Ice and Atmosphere](https://www.frontiersin.org/articles/10.3389/fdata.2021.642182/full)
+\-
+[[Paper](https://www.frontiersin.org/articles/10.3389/fdata.2021.642182/pdf?isPublishedV2=False)]
+
+- `2014/12` | `CauseEffectPairs` | [Distinguishing cause from effect using observational data: methods and benchmarks](https://arxiv.org/abs/1412.3773)
+\-
+
+
+### 3.5 Visual Reasoning
+
+- `2022/11` | `G-VUE` | [Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation](https://arxiv.org/abs/2211.15402)
+\-
+
+- `2021/03` | `VLGrammar` | [VLGrammar: Grounded Grammar Induction of Vision and Language](https://arxiv.org/abs/2103.12975)
+\-
+
+- `2020/12` | [Attention over learned object embeddings enables complex visual reasoning](https://arxiv.org/abs/2012.08508)
+\-
+
+#### 3.5.1 3D Reasoning
+
+- `2023/08` | `PointLLM` | [PointLLM: Empowering Large Language Models to Understand Point Clouds](https://arxiv.org/abs/2308.16911)
+\-
+
+- `2023/08` | `3D-VisTA` | [3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment](https://arxiv.org/abs/2308.04352)
+\-
+
+- `2023/07` | `3D-LLM` | [3D-LLM: Injecting the 3D World into Large Language Models](https://arxiv.org/abs/2307.12981)
+\-
+
+- `2022/10` | `SQA3D` | [SQA3D: Situated Question Answering in 3D Scenes](https://arxiv.org/abs/2210.07474)
+\-
+
+#### Benchmarks, Datasets, and Metrics
+
+- `2021/12` | `PTR` | [PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning](https://arxiv.org/abs/2112.05136)
+\-
+
+- `2019/06` | `OK-VQA` | [OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge](https://arxiv.org/abs/1906.00067)
+\-
+
+- `2016/12` | `CLEVR` | [CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning](https://arxiv.org/abs/1612.06890) +\- + + +### 3.6 Audio Reasoning + +- `2022/05` | [Self-Supervised Speech Representation Learning: A Review](https://arxiv.org/abs/2205.10643) +\- + +#### 3.6.1 Speech + +- `2022/03` | `SUPERB-SG` | [SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities](https://arxiv.org/abs/2203.06849) +\- + +- `2022/02` | `Data2Vec` | [data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) +\- + +- `2021/10` | `WavLM` | [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) +\- + +- `2021/06` | `HuBERT` | [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) +\- + +- `2021/05` | `SUPERB` | [SUPERB: Speech processing Universal PERformance Benchmark](https://arxiv.org/abs/2105.01051) +\- + +- `2020/10` | `Speech SIMCLR` | [Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning](https://arxiv.org/abs/2010.13991) +\- + +- `2020/06` | `Wav2Vec 2.0` | [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) +\- + +- `2020/05` | `Conformer` | [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100) +\- + +- `2019/10` | `Mockingjay` | [Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders](https://arxiv.org/abs/1910.12638) +\- + +- `2019/04` | `APC` | [An Unsupervised Autoregressive Model for Speech Representation Learning](https://arxiv.org/abs/1904.03240) +\- + +- `2018/07` | `CPC` | [Representation Learning with Contrastive Predictive Coding](https://arxiv.org/abs/1807.03748) +\- + +- `2018/04` | `Speech-Transformer` | Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition +\- +[[Paper](https://ieeexplore.ieee.org/document/8462506)] + +- `2017/11` | `VQ-VAE` | [Neural Discrete Representation Learning](https://arxiv.org/abs/1711.00937) +\- + +- `2017/08` | [Large-Scale Domain Adaptation via Teacher-Student Learning](https://arxiv.org/abs/1708.05466) +\- + +#### Benchmarks, Datasets, and Metrics + +- `2022/03` | `SUPERB-SG` | [SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities](https://arxiv.org/abs/2203.06849) +\- + +- `2021/11` | `VoxPopuli` / `XLS-R` | [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) +\- + +- `2021/05` | `SUPERB` | [SUPERB: Speech processing Universal PERformance Benchmark](https://arxiv.org/abs/2105.01051) +\- + +- `2020/12` | `Multilingual LibriSpeech` | [MLS: A Large-Scale Multilingual Dataset for Speech Research](https://arxiv.org/abs/2012.03411) +\- + +- `2020/05` | `Didi Dictation` / `Didi Callcenter` | [A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition](https://arxiv.org/abs/2005.09862) +\- + +- `2019/12` | `Libri-Light` | [Libri-Light: A Benchmark for ASR with Limited or No Supervision](https://arxiv.org/abs/1912.07875) +\- + +- `2019/12` | `Common Voice` | [Common Voice: A Massively-Multilingual Speech Corpus](https://arxiv.org/abs/1912.06670) +\- + + +### 3.7 Multimodal Reasoning 
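+
+The alignment work below (e.g., BLIP-2, and CLIP in Section 2.3) rests on a contrastive objective that embeds images and texts in a shared space where matched pairs score highest. A self-contained PyTorch sketch of that CLIP-style loss (batch size, embedding width, and temperature are illustrative):
+
+```python
+import torch
+import torch.nn.functional as F
+
+def clip_style_loss(img_emb, txt_emb, temperature=0.07):
+    """Symmetric InfoNCE: matched image-text pairs lie on the diagonal."""
+    img = F.normalize(img_emb, dim=-1)
+    txt = F.normalize(txt_emb, dim=-1)
+    logits = img @ txt.T / temperature          # (B, B) similarity matrix
+    targets = torch.arange(logits.size(0))
+    return (F.cross_entropy(logits, targets) +
+            F.cross_entropy(logits.T, targets)) / 2
+
+# Stand-ins for the outputs of an image tower and a text tower.
+print(clip_style_loss(torch.randn(8, 512), torch.randn(8, 512)).item())
+```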
+
+#### 3.7.1 Alignment
+
+- `2023/01` | `BLIP-2` | [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597)
+\-
+
+#### 3.7.2 Generation
+
+- `2023/10` | `DALL·E 3` | Improving Image Generation with Better Captions
+\-
+[[Paper](https://cdn.openai.com/papers/dall-e-3.pdf)]
+[[Project](https://openai.com/dall-e-3)]
+
+- `2023/06` | `Kosmos-2` | [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824)
+\-
+
+- `2023/05` | `BiomedGPT` | [BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks](https://arxiv.org/abs/2305.17100)
+\-
+
+- `2023/03` | `Visual ChatGPT` | [Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models](https://arxiv.org/abs/2303.04671)
+\-
+
+- `2023/02` | `Kosmos-1` | [Language Is Not All You Need: Aligning Perception with Language Models](https://arxiv.org/abs/2302.14045)
+\-
+
+- `2022/07` | `Midjourney`
+\-
+[[Project](https://www.midjourney.com/home)]
+
+- `2022/04` | `Flamingo` | [Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198)
+\-
+
+- `2021/12` | `MAGMA` | [MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning](https://arxiv.org/abs/2112.05253)
+\-
+
+#### 3.7.3 Multimodal Understanding
+
+- `2023/09` | `Q-Bench` | [Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision](https://arxiv.org/abs/2309.14181)
+\-
+
+- `2023/05` | `DetGPT` | [DetGPT: Detect What You Need via Reasoning](https://arxiv.org/abs/2305.14167)
+\-
+
+- `2023/03` | `Vicuna` | Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
+\-
+[[Blog](https://lmsys.org/blog/2023-03-30-vicuna/)]
+
+- `2022/12` | `DePlot` | [DePlot: One-shot visual language reasoning by plot-to-table translation](https://arxiv.org/abs/2212.10505)
+\-
+
+- `2022/12` | `MatCha` | [MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering](https://arxiv.org/abs/2212.09662)
+\-
+
+#### Benchmarks, Datasets, and Metrics
+
+- `2023/06` | `LVLM-eHub` | [LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models](https://arxiv.org/abs/2306.09265)
+\-
+
+- `2023/06` | `LAMM` | [LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark](https://arxiv.org/abs/2306.06687)
+\-
+
+- `2023/05` | `AttackVLM` | [On Evaluating Adversarial Robustness of Large Vision-Language Models](https://arxiv.org/abs/2305.16934)
+\-
+
+- `2023/05` | `POPE` | [Evaluating Object Hallucination in Large Vision-Language Models](https://arxiv.org/abs/2305.10355)
+\-
+
+- `2023/05` | `MultimodalOCR` | [On the Hidden Mystery of OCR in Large Multimodal Models](https://arxiv.org/abs/2305.07895)
+\-
+
+- `2022/10` | `ObjMLM` | [Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training](https://arxiv.org/abs/2210.07688)
+\-
+
+- `2022/06` | `RAVEN` / `ARC` | [Evaluating Understanding on Conceptual Abstraction Benchmarks](https://arxiv.org/abs/2206.14187)
+\-
+
+- `2021/06` | `LARC` | [Communicating Natural Programs to Humans and Machines](https://arxiv.org/abs/2106.07824)
+\-
+
+- `2014/11` | `CIDEr` / `PASCAL-50S` / `ABSTRACT-50S` | [CIDEr: Consensus-based Image Description Evaluation](https://arxiv.org/abs/1411.5726)
+\-
+
+
+### 3.8 Embodied Reasoning
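+
+A recurring pattern in the systems below (SayCan, Inner Monologue, ReAct) is a perceive-plan-act loop in which a language model proposes the next skill and the environment's feedback is appended to the prompt. A schematic sketch of that loop (`llm` and the skills are stubs, not any system's real API):
+
+```python
+def llm(prompt: str) -> str:
+    """Stub standing in for a language model call."""
+    return "pick(apple)"
+
+SKILLS = {
+    "pick": lambda obj: f"picked {obj}",
+    "place": lambda obj: f"placed {obj}",
+}
+
+def run_agent(goal, observation, max_steps=3):
+    history = []
+    for _ in range(max_steps):
+        prompt = f"Goal: {goal}\nSeen: {observation}\nDone: {history}\nNext:"
+        action = llm(prompt)                        # e.g. "pick(apple)"
+        name, arg = action.rstrip(")").split("(")   # parse the skill call
+        observation = SKILLS[name](arg)             # execute and observe
+        history.append(f"{action} -> {observation}")
+    return history
+
+print(run_agent("put the apple away", "an apple on the table"))
+```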
+
+- `2023/11` | `OpenFlamingo` | [Vision-Language Foundation Models as Effective Robot Imitators](https://arxiv.org/abs/2311.01378)
+\-
+
+- `2023/07` | `RT-2` | [RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control](https://arxiv.org/abs/2307.15818)
+\-
+
+- `2023/05` | `RAP` | [Reasoning with Language Model is Planning with World Model](https://arxiv.org/abs/2305.14992)
+\-
+
+- `2022/12` | `RT-1` | [RT-1: Robotics Transformer for Real-World Control at Scale](https://arxiv.org/abs/2212.06817)
+
+- `2022/05` | `Gato` | [A Generalist Agent](https://arxiv.org/abs/2205.06175)
+\-
+
+- `2022/04` | `SMs` | [Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language](https://arxiv.org/abs/2204.00598)
+\-
+
+- `2022/02` | [Pre-Trained Language Models for Interactive Decision-Making](https://arxiv.org/abs/2202.01771)
+\-
+
+- `2022/01` | `Language-Planner` | [Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents](https://arxiv.org/abs/2201.07207)
+\-
+
+- `2021/11` | [Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning](https://arxiv.org/abs/2111.03189)
+\-
+
+- `2021/10` | [Skill Induction and Planning with Latent Language](https://arxiv.org/abs/2110.01517)
+\-
+
+- `2020/09` | [Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions](https://arxiv.org/abs/2009.14259)
+\-
+
+- `2016/01` | `AlphaGo` | Mastering the game of Go with deep neural networks and tree search
+\-
+[[Paper](https://www.nature.com/articles/nature16961)]
+
+- `2014/05` | Gesture in reasoning: An embodied perspective
+\-
+[[Paper](https://www.taylorfrancis.com/chapters/edit/10.4324/9781315775845-19/gesture-reasoning-martha-alibali-rebecca-boncoddo-autumn-hostetter)]
+
+#### 3.8.1 Introspective Reasoning
+
+- `2022/11` | `PAL` | [PAL: Program-aided Language Models](https://arxiv.org/abs/2211.10435)
+\-
+
+- `2022/09` | `ProgPrompt` | [ProgPrompt: Generating Situated Robot Task Plans using Large Language Models](https://arxiv.org/abs/2209.11302)
+\-
+
+- `2022/09` | `Code as Policies` | [Code as Policies: Language Model Programs for Embodied Control](https://arxiv.org/abs/2209.07753)
+\-
+
+- `2022/04` | `SayCan` | [Do As I Can, Not As I Say: Grounding Language in Robotic Affordances](https://arxiv.org/abs/2204.01691)
+\-
+
+- `2012` | Introspective Learning and Reasoning
+\-
+[[Paper](https://link.springer.com/referenceworkentry/10.1007/978-1-4419-1428-6_1802)]
+
+#### 3.8.2 Extrospective Reasoning
+
+- `2023/06` | `Statler` | [Statler: State-Maintaining Language Models for Embodied Reasoning](https://arxiv.org/abs/2306.17840)
+\-
+
+- `2023/02` | `Planner-Actor-Reporter` | [Collaborating with language models for embodied reasoning](https://arxiv.org/abs/2302.00763)
+\-
+
+- `2023/02` | `Toolformer` | [Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/abs/2302.04761)
+\-
+
+- `2022/12` | `LLM-Planner` | [LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models](https://arxiv.org/abs/2212.04088)
+\-
+
+- `2022/10` | `ReAct` | [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)
+\-
+
+- `2022/10` | `Self-Ask` | [Measuring and Narrowing the Compositionality Gap in Language Models](https://arxiv.org/abs/2210.03350)
+\-
+
+- `2022/07` | `Inner Monologue` | [Inner Monologue: Embodied Reasoning through Planning with Language Models](https://arxiv.org/abs/2207.05608)
+\-
+
+#### 3.8.3 Multi-agent Reasoning
+- `2023/07` | `Federated LLM` | [Federated Large Language Model: A Position Paper](https://arxiv.org/abs/2307.08925)
+\-
+
+- `2023/07` | [Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems](https://arxiv.org/abs/2307.06187)
+\-
+
+- `2023/07` | `Co-LLM-Agents` | [Building Cooperative Embodied Agents Modularly with Large Language Models](https://arxiv.org/abs/2307.02485)
+\-
+
+- `2023/05` | [Improving Factuality and Reasoning in Language Models through Multiagent Debate](https://arxiv.org/abs/2305.14325)
+\-
+
+- `2017/02` | `FIoT` | FIoT: An agent-based framework for self-adaptive and self-organizing applications based on the Internet of Things
+\-
+[[Paper](https://www.sciencedirect.com/science/article/pii/S0020025516313664)]
+
+- `2004` | A Practical Guide to the IBM Autonomic Computing Toolkit
+\-
+[[Book](https://books.google.com.hk/books/about/A_Practical_Guide_to_the_IBM_Autonomic_C.html?id=XHeoSgAACAAJ&redir_esc=y)]
+
+#### 3.8.4 Driving Reasoning
+
+- `2023/10` | [Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving](https://arxiv.org/abs/2310.16639)
+\-
+
+- `2023/10` | [Vision Language Models in Autonomous Driving and Intelligent Transportation Systems](https://arxiv.org/abs/2310.14414)
+\-
+
+- `2023/10` | `DriveGPT4` | [DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model](https://arxiv.org/abs/2310.01412)
+\-
+
+- `2023/09` | `MotionLM` | [MotionLM: Multi-Agent Motion Forecasting as Language Modeling](https://arxiv.org/abs/2309.16534)
+\-
+
+- `2023/06` | [End-to-end Autonomous Driving: Challenges and Frontiers](https://arxiv.org/abs/2306.16927)
+\-
+
+- `2023/04` | [Graph-based Topology Reasoning for Driving Scenes](https://arxiv.org/abs/2304.05277)
+\-
+
+- `2022/09` | [Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe](https://arxiv.org/abs/2209.05324)
+\-
+
+- `2021/11` | Artificial intelligence: A powerful paradigm for scientific research
+\-
+[[Paper](https://www.sciencedirect.com/science/article/pii/S2666675821001041)]
+
+#### Benchmarks, Datasets, and Metrics
+
+- `2023/09` | `NuPrompt` / `PromptTrack` | [Language Prompt for Autonomous Driving](https://arxiv.org/abs/2309.04379)
+\-
+
+- `2023/08` | `DriveLM` | Drive on Language: Unlocking the future where autonomous driving meets the unlimited potential of language
+\-
+[[Code](https://github.com/OpenDriveLab/DriveLM)]
+
+- `2023/07` | `LCTGen` | [Language Conditioned Traffic Generation](https://arxiv.org/abs/2307.07947)
+\-
+
+- `2023/05` | `NuScenes-QA` | [NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario](https://arxiv.org/abs/2305.14836)
+\-
+
+- `2022/06` | `BEHAVIOR-1K` | [BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation](https://proceedings.mlr.press/v205/li23a.html)
+\-
+
+- `2021/08` | `iGibson` | [iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks](https://arxiv.org/abs/2108.03272)
+\-
+
+- `2021/06` | `Habitat 2.0` | [Habitat 2.0: Training Home Assistants to Rearrange their Habitat](https://arxiv.org/abs/2106.14405)
+\-
+
+- `2020/04` | `RoboTHOR` | [RoboTHOR: An Open Simulation-to-Real Embodied AI Platform](https://arxiv.org/abs/2004.06799)
+\-
+
+- `2019/11` | `HAD` | [Grounding Human-to-Vehicle Advice for Self-driving Vehicles](https://arxiv.org/abs/1911.06978)
+\-
+
+- `2019/04` | `Habitat` | [Habitat: A Platform for Embodied AI Research](https://arxiv.org/abs/1904.01201)
+\-
+
+- `2018/08` | `Gibson` | [Gibson Env: Real-World Perception for Embodied Agents](https://arxiv.org/abs/1808.10654)
+\-
+
+- `2018/06` | `VirtualHome` | [VirtualHome: Simulating Household Activities via Programs](https://arxiv.org/abs/1806.07011)
+\-
+
+
+### 3.9 Other Tasks and Applications
+
+#### 3.9.1 Theory of Mind (ToM)
+
+- `2023/02` | `ToM` | [Theory of Mind Might Have Spontaneously Emerged in Large Language Models](https://arxiv.org/abs/2302.02083)
+\-
+
+#### 3.9.2 LLMs for Weather Prediction
+
+- `2023/07` | `Pangu-Weather` | Accurate medium-range global weather forecasting with 3D neural networks
+\-
+[[Paper](https://www.nature.com/articles/s41586-023-06185-3)]
+
+- `2022/09` | `MetNet-2` | Deep learning for twelve hour precipitation forecasts
+\-
+[[Paper](https://www.nature.com/articles/s41467-022-32483-x)]
+
+#### 3.9.3 Abstract Reasoning
+
+- `2023/05` | [Large Language Models Are Not Strong Abstract Reasoners](https://arxiv.org/abs/2305.19555)
+\-
+
+#### 3.9.4 Defeasible Reasoning
+
+- `2023/06` | `BoardgameQA` | [BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information](https://arxiv.org/abs/2306.07934)
+\-
+
+- `2021/10` | `CURIOUS` | [Think about it! Improving defeasible reasoning by first modeling the question scenario](https://arxiv.org/abs/2110.12349)
+\-
+
+- `2020/11` | `Defeasible NLI` / `δ-NLI` | [Thinking Like a Skeptic: Defeasible Inference in Natural Language](https://aclanthology.org/2020.findings-emnlp.418/)
+\-
+[[Paper](https://aclanthology.org/2020.findings-emnlp.418.pdf)]
+
+- `2020/04` | `KACC` | [KACC: A Multi-task Benchmark for Knowledge Abstraction, Concretization and Completion](https://arxiv.org/abs/2004.13631)
+\-
+
+- `2009/01` | A Recursive Semantics for Defeasible Reasoning
+\-
+[[Paper](https://link.springer.com/chapter/10.1007/978-0-387-98197-0_9)]
+
+#### 3.9.5 Medical Reasoning
+
+- `2023/10` | `GPT4V-Medical-Report` | [Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V](https://arxiv.org/abs/2310.19061)
+\-
+
+- `2023/10` | `VisionFM` | [VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence](https://arxiv.org/abs/2310.04992)
+\-
+
+- `2023/09` | [The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)](https://arxiv.org/abs/2309.17421)
+\-
+
+- `2023/09` | `RETFound` | A foundation model for generalizable disease detection from retinal images
+\-
+[[Paper](https://www.nature.com/articles/s41586-023-06555-x)]
+
+- `2023/08` | `ELIXR` | [ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders](https://arxiv.org/abs/2308.01317)
+\-
+
+- `2023/07` | `Med-PaLM M` | [Towards Generalist Biomedical AI](https://arxiv.org/abs/2307.14334)
+\-
+
+- `2023/06` | `LLaVA-Med` | [LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day](https://arxiv.org/abs/2306.00890)
+\-
+
+- `2023/05` | `Med-PaLM 2` | [Towards Expert-Level Medical Question Answering with Large Language Models](https://arxiv.org/abs/2305.09617)
+\-
+
+#### 3.9.6 Bioinformatics Reasoning
+
+- `2023/07` | `Prot2Text` | [Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers](https://arxiv.org/abs/2307.14367)
+\-
+
+- `2023/07` | `Uni-RNA` | [Uni-RNA: Universal Pre-Trained Models Revolutionize RNA Research](https://www.biorxiv.org/content/10.1101/2023.07.11.548588v1)
+\-
+
+- `2023/07` | `RFdiffusion` | De novo design of protein structure and function with RFdiffusion
+\-
+[[Paper](https://www.nature.com/articles/s41586-023-06415-8)]
+
+- `2023/06` | `HyenaDNA` | [HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution](https://arxiv.org/abs/2306.15794)
+\-
+
+- `2023/06` | `DrugGPT` | [DrugGPT: A GPT-based Strategy for Designing Potential Ligands Targeting Specific Proteins](https://www.biorxiv.org/content/10.1101/2023.06.29.543848v1)
+\-
+
+- `2023/04` | `GeneGPT` | [GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information](https://arxiv.org/abs/2304.09667)
+\-
+
+- `2023/04` | Drug discovery companies are customizing ChatGPT: here’s how
+\-
+[[News](https://www.nature.com/articles/s41587-023-01788-7)]
+
+- `2023/01` | `ProGen` | Large language models generate functional protein sequences across diverse families
+\-
+[[Paper](https://www.nature.com/articles/s41587-022-01618-2)]
+
+- `2022/06` | `ProGen2` | [ProGen2: Exploring the Boundaries of Protein Language Models](https://arxiv.org/abs/2206.13517)
+\-
+
+- `2021/07` | `AlphaFold` | Highly accurate protein structure prediction with AlphaFold
+\-
+[[Paper](https://www.nature.com/articles/s41586-021-03819-2)]
+
+#### 3.9.7 Long-Chain Reasoning
+
+- `2022/12` | `Fine-tune-CoT` | [Large Language Models Are Reasoning Teachers](https://arxiv.org/abs/2212.10071)
+\-
+
+- `2021/09` | `PlaTe` | [PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks](https://arxiv.org/abs/2109.04869)
+\-
+
+## 4 Reasoning Techniques
+
+
+### 4.1 Pre-Training
+
+#### 4.1.1 Data
+
+##### a. Data - Text
+
+- `2023/07` | `peS2o` | peS2o (Pretraining Efficiently on S2ORC) Dataset
+\-
+[[Code](https://github.com/allenai/pes2o)]
+
+- `2023/05` | `ROOTS` / `BLOOM` | [The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset](https://arxiv.org/abs/2303.03915)
+\-
+
+- `2023/04` | `RedPajama` | RedPajama: an Open Dataset for Training Large Language Models
+\-
+[[Code](https://github.com/togethercomputer/RedPajama-Data)]
+
+- `2020/12` | `The Pile` | [The Pile: An 800GB Dataset of Diverse Text for Language Modeling](https://arxiv.org/abs/2101.00027)
+\-
+
+- `2020/04` | `Reddit` | [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637)
+\-
+
+- `2020/04` | `CLUE` | [CLUE: A Chinese Language Understanding Evaluation Benchmark](https://arxiv.org/abs/2004.05986)
+\-
+
+- `2019/10` | `C4` | [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683)
+\-
+
+- `2013/10` | `Gutenberg` | [Complexity of Word Collocation Networks: A Preliminary Structural Analysis](https://arxiv.org/abs/1310.5111)
+\-
+
+##### b. Data - Image
+
+- `2023/06` | `I2E` / `MOFI` | [MOFI: Learning Image Representations from Noisy Entity Annotated Images](https://arxiv.org/abs/2306.07952)
+\-
+
+- `2022/01` | `SWAG` | [Revisiting Weakly Supervised Pre-Training of Visual Perception Models](https://arxiv.org/abs/2201.08371)
+\-
+
+- `2021/04` | `ImageNet-21K` | [ImageNet-21K Pretraining for the Masses](https://arxiv.org/abs/2104.10972)
+\-
+
+- `2017/07` | `JFT` | [Revisiting Unreasonable Effectiveness of Data in Deep Learning Era](https://arxiv.org/abs/1707.02968)
+\-
+
+- `2014/09` | `ImageNet` | [ImageNet Large Scale Visual Recognition Challenge](https://arxiv.org/abs/1409.0575)
+\-
+
+##### c. Data - Multimodality
+
+- `2023/09` | `Point-Bind` | [Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following](https://arxiv.org/abs/2309.00615)
+\-
+
+- `2023/05` | `ImageBind` | [ImageBind: One Embedding Space To Bind Them All](https://arxiv.org/abs/2305.05665)
+\-
+
+- `2023/04` | `DataComp` | [DataComp: In search of the next generation of multimodal datasets](https://arxiv.org/abs/2304.14108)
+\-
+
+- `2022/10` | `LAION-5B` | [LAION-5B: An open large-scale dataset for training next generation image-text models](https://arxiv.org/abs/2210.08402)
+\-
+
+- `2022/08` | `Shutterstock` | [Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP](https://arxiv.org/abs/2208.05516)
+\-
+
+- `2022/08` | `COYO-700M` | COYO-700M: Image-Text Pair Dataset
+\-
+[[Code](https://github.com/kakaobrain/coyo-dataset)]
+
+- `2022/04` | `M3W` | [Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198)
+\-
+
+- `2021/11` | `RedCaps` | [RedCaps: web-curated image-text data created by the people, for the people](https://arxiv.org/abs/2111.11431)
+\-
+
+- `2021/11` | `LAION-400M` | [LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs](https://arxiv.org/abs/2111.02114)
+\-
+
+- `2021/03` | `WIT` | [WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning](https://arxiv.org/abs/2103.01913)
+\-
+
+- `2011/12` | `Im2Text` / `SBU` | [Im2Text: Describing Images Using 1 Million Captioned Photographs](https://papers.nips.cc/paper_files/paper/2011/hash/5dd9db5e033da9c6fb5ba83c7a7ebea9-Abstract.html)
+\-
+
+#### 4.1.2 Network Architecture
+
+- `2023/04` | [Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder](https://arxiv.org/abs/2304.04052)
+\-
+
+##### a. Encoder-Decoder
+
+- `2019/10` | `BART` | [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)
+\-
+
+- `2019/10` | `T5` | [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683)
+\-
+
+- `2018/10` | `BERT` | [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
+\-
+[[Paper](https://aclanthology.org/N19-1423.pdf)]
+[[Code](https://github.com/google-research/bert)]
+[[Blog](https://blog.research.google/2018/11/open-sourcing-bert-state-of-art-pre.html)]
+
+- `2017/06` | `Transformer` | [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
+\-
+
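+The encoder-decoder versus decoder-only split largely comes down to the attention mask: encoder layers attend bidirectionally, while decoder layers apply a causal mask so each token only attends to its predecessors. Below is a minimal NumPy sketch of the two mask patterns; it is purely illustrative and not tied to any implementation listed here.
+
+```python
+import numpy as np
+
+def bidirectional_mask(seq_len: int) -> np.ndarray:
+    """Encoder-style mask: every position may attend to every position."""
+    return np.ones((seq_len, seq_len), dtype=bool)
+
+def causal_mask(seq_len: int) -> np.ndarray:
+    """Decoder-style mask: position i attends only to positions <= i."""
+    return np.tril(np.ones((seq_len, seq_len), dtype=bool))
+
+# Disallowed positions are set to -inf before the attention softmax.
+scores = np.random.randn(4, 4)
+masked_scores = np.where(causal_mask(4), scores, -np.inf)
+```
+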
+##### b. Decoder-Only
+
+- `2023/07` | `Llama 2` | [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)
+\-
+[[Paper](https://arxiv.org/pdf/2307.09288.pdf)]
+[[Code](https://github.com/facebookresearch/llama)]
+[[Blog](https://ai.meta.com/llama/)]
+
+- `2023/02` | `LLaMA` | [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
+\-
+[[Paper](https://arxiv.org/pdf/2302.13971.pdf)]
+[[Code](https://github.com/facebookresearch/llama)]
+[[Blog](https://ai.meta.com/blog/large-language-model-llama-meta-ai/)]
+
+- `2022/11` | `BLOOM` | [BLOOM: A 176B-Parameter Open-Access Multilingual Language Model](https://arxiv.org/abs/2211.05100)
+\-
+
+- `2022/10` | `GLM` | [GLM-130B: An Open Bilingual Pre-trained Model](https://arxiv.org/abs/2210.02414)
+\-
+
+- `2022/05` | `OPT` | [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068)
+\-
+
+- `2021/12` | `Gopher` | [Scaling Language Models: Methods, Analysis & Insights from Training Gopher](https://arxiv.org/abs/2112.11446)
+\-
+
+- `2020/05` | `GPT-3` | [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
+\-
+[[Paper](https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf)]
+[[Code](https://github.com/openai/gpt-3)]
+
+- `2019/02` | `GPT-2` | Language Models are Unsupervised Multitask Learners
+\-
+[[Paper](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)]
+
+- `2018/06` | `GPT-1` | Improving Language Understanding by Generative Pre-Training
+\-
+[[Paper](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf)]
+
+##### c. CLIP Variants
+
+- `2023/05` | `LaCLIP` | [Improving CLIP Training with Language Rewrites](https://arxiv.org/abs/2305.20088)
+\-
+
+- `2023/04` | `DetCLIPv2` | [DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment](https://arxiv.org/abs/2304.04514)
+\-
+
+- `2022/12` | `FLIP` | [Scaling Language-Image Pre-training via Masking](https://arxiv.org/abs/2212.00794)
+\-
+
+- `2022/09` | `DetCLIP` | [DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection](https://arxiv.org/abs/2209.09407)
+\-
+
+- `2022/04` | `K-LITE` | [K-LITE: Learning Transferable Visual Models with External Knowledge](https://arxiv.org/abs/2204.09222)
+\-
+
+- `2021/11` | `FILIP` | [FILIP: Fine-grained Interactive Language-Image Pre-Training](https://arxiv.org/abs/2111.07783)
+\-
+
+- `2021/02` | `CLIP` | [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020)
+\-
+[[Paper](https://proceedings.mlr.press/v139/radford21a/radford21a.pdf)]
+[[Code](https://github.com/openai/CLIP)]
+[[Blog](https://openai.com/research/clip)]
+
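+Most of the CLIP variants above share one core objective: a symmetric contrastive (InfoNCE) loss that pulls matched image-text pairs together in a joint embedding space and pushes mismatched pairs apart. A simplified PyTorch sketch of that loss follows; it illustrates the idea rather than reproducing any particular paper's implementation.
+
+```python
+import torch
+import torch.nn.functional as F
+
+def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
+    """Symmetric InfoNCE over a batch of paired (image, text) embeddings.
+    Matching pairs share the same row index in the two (batch, dim) tensors."""
+    image_emb = F.normalize(image_emb, dim=-1)
+    text_emb = F.normalize(text_emb, dim=-1)
+    logits = image_emb @ text_emb.t() / temperature  # (batch, batch) similarities
+    targets = torch.arange(logits.size(0))           # diagonal entries are positives
+    loss_i2t = F.cross_entropy(logits, targets)      # image -> text direction
+    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
+    return (loss_i2t + loss_t2i) / 2
+```
+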
+##### d. Others
+
+- `2023/09` | `StreamingLLM` | [Efficient Streaming Language Models with Attention Sinks](https://arxiv.org/abs/2309.17453)
+\-
+
+- `2023/07` | `RetNet` | [Retentive Network: A Successor to Transformer for Large Language Models](https://arxiv.org/abs/2307.08621)
+\-
+
+- `2023/07` | `LongNet` | [LongNet: Scaling Transformers to 1,000,000,000 Tokens](https://arxiv.org/abs/2307.02486)
+\-
+
+- `2023/05` | `RWKV` | [RWKV: Reinventing RNNs for the Transformer Era](https://arxiv.org/abs/2305.13048)
+\-
+
+- `2023/02` | `Hyena` | [Hyena Hierarchy: Towards Larger Convolutional Language Models](https://arxiv.org/abs/2302.10866)
+\-
+
+- `2022/12` | `H3` | [Hungry Hungry Hippos: Towards Language Modeling with State Space Models](https://arxiv.org/abs/2212.14052)
+\-
+
+- `2022/06` | `GSS` | [Long Range Language Modeling via Gated State Spaces](https://arxiv.org/abs/2206.13947)
+\-
+
+- `2022/03` | `DSS` | [Diagonal State Spaces are as Effective as Structured State Spaces](https://arxiv.org/abs/2203.14343)
+\-
+
+- `2021/10` | `S4` | [Efficiently Modeling Long Sequences with Structured State Spaces](https://arxiv.org/abs/2111.00396)
+\-
+
+
+### 4.2 Fine-Tuning
+
+#### 4.2.1 Data
+
+- `2023/09` | `MetaMath` | [MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models](https://arxiv.org/abs/2309.12284)
+\-
+
+- `2023/09` | `MAmmoTH` | [MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning](https://arxiv.org/abs/2309.05653)
+\-
+
+- `2023/08` | `WizardMath` | [WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct](https://arxiv.org/abs/2308.09583)
+\-
+
+- `2023/08` | `RFT` | [Scaling Relationship on Learning Mathematical Reasoning with Large Language Models](https://arxiv.org/abs/2308.01825)
+\-
+
+- `2023/05` | `PRM800K` | [Let's Verify Step by Step](https://arxiv.org/abs/2305.20050)
+\-
+
+- `2023/05` | `Distilling Step-by-Step` | [Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes](https://arxiv.org/abs/2305.02301)
+\-
+
+- `2023/01` | [Specializing Smaller Language Models towards Multi-Step Reasoning](https://arxiv.org/abs/2301.12726)
+\-
+
+- `2022/12` | `Fine-tune-CoT` | [Large Language Models Are Reasoning Teachers](https://arxiv.org/abs/2212.10071)
+\-
+
+- `2022/12` | [Teaching Small Language Models to Reason](https://arxiv.org/abs/2212.08410)
+\-
+
+- `2022/10` | [Large Language Models Can Self-Improve](https://arxiv.org/abs/2210.11610)
+\-
+
+- `2022/10` | [Explanations from Large Language Models Make Small Reasoners Better](https://arxiv.org/abs/2210.06726)
+\-
+
+#### 4.2.2 Parameter-Efficient Fine-tuning
+
+##### a. Adapter Tuning
+
+- `2023/03` | `LLaMA-Adapter` | [LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention](https://arxiv.org/abs/2303.16199)
+\-
+
+- `2022/05` | `AdaMix` | [AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning](https://arxiv.org/abs/2205.12410)
+\-
+
+- `2021/10` | [Towards a Unified View of Parameter-Efficient Transfer Learning](https://arxiv.org/abs/2110.04366)
+\-
+
+- `2021/06` | `Compacter` | [Compacter: Efficient Low-Rank Hypercomplex Adapter Layers](https://arxiv.org/abs/2106.04647)
+\-
+
+- `2020/04` | `MAD-X` | [MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer](https://arxiv.org/abs/2005.00052)
+\-
+
+- `2019/02` | `Adapter` | [Parameter-Efficient Transfer Learning for NLP](https://arxiv.org/abs/1902.00751)
+\-
+
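+The adapter line of work above boils down to a small residual bottleneck MLP inserted into each Transformer layer and trained while the pretrained weights stay frozen. A minimal PyTorch sketch in the spirit of the 2019 adapter paper (the hyperparameters are illustrative):
+
+```python
+import torch.nn as nn
+
+class Adapter(nn.Module):
+    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.
+    Only these weights are trained; the base model remains frozen."""
+
+    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
+        super().__init__()
+        self.down = nn.Linear(hidden_dim, bottleneck_dim)
+        self.act = nn.GELU()
+        self.up = nn.Linear(bottleneck_dim, hidden_dim)
+
+    def forward(self, x):
+        # The residual connection keeps the layer close to identity at init.
+        return x + self.up(self.act(self.down(x)))
+```
+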
+##### b. Low-Rank Adaptation
+
+- `2023/05` | `QLoRA` | [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
+\-
+
+- `2023/03` | `AdaLoRA` | [Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2303.10512)
+\-
+
+- `2022/12` | `KronA` | [KronA: Parameter Efficient Tuning with Kronecker Adapter](https://arxiv.org/abs/2212.10650)
+\-
+
+- `2022/10` | `DyLoRA` | [DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation](https://arxiv.org/abs/2210.07558)
+\-
+
+- `2021/06` | `LoRA` | [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
+\-
+
+##### c. Prompt Tuning
+
+- `2021/10` | `P-Tuning v2` | [P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks](https://arxiv.org/abs/2110.07602)
+\-
+
+- `2021/04` | `Prompt Tuning` | [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/abs/2104.08691)
+\-
+
+- `2021/04` | `OptiPrompt` | [Factual Probing Is \[MASK\]: Learning vs. Learning to Recall](https://arxiv.org/abs/2104.05240)
+\-
+
+- `2021/03` | `P-Tuning` | [GPT Understands, Too](https://arxiv.org/abs/2103.10385)
+\-
+
+- `2021/01` | `Prefix-Tuning` | [Prefix-Tuning: Optimizing Continuous Prompts for Generation](https://arxiv.org/abs/2101.00190)
+\-
+
+##### d. Partial Parameter Tuning
+
+- `2023/04` | `DiffFit` | [DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2304.06648)
+\-
+
+- `2022/10` | `SSF` | [Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning](https://arxiv.org/abs/2210.08823)
+\-
+
+- `2021/09` | `Child-Tuning` | [Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning](https://arxiv.org/abs/2109.05687)
+\-
+
+- `2021/06` | `BitFit` | [BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models](https://arxiv.org/abs/2106.10199)
+\-
+
+##### e. Mixture-of-Modality Adaptation
+
+- `2023/10` | `LLaVA-1.5` | [Improved Baselines with Visual Instruction Tuning](https://arxiv.org/abs/2310.03744)
+\-
+
+- `2023/05` | `MMA` / `LaVIN` | [Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models](https://arxiv.org/abs/2305.15023)
+\-
+
+- `2023/04` | `LLaMA-Adapter V2` | [LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model](https://arxiv.org/abs/2304.15010)
+\-
+
+- `2023/04` | `LLaVA` | [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485)
+\-
+
+- `2023/02` | `RepAdapter` | [Towards Efficient Visual Adaption via Structural Re-parameterization](https://arxiv.org/abs/2302.08106)
+\-
+
+
+### 4.3 Alignment Training
+
+#### 4.3.1 Data
+
+##### a. Data - Human
+
+- `2023/06` | `Dolly` | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM
+\-
+[[Code](https://github.com/databrickslabs/dolly)]
+
+- `2023/04` | `LongForm` | [LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction](https://arxiv.org/abs/2304.08460)
+\-
+
+- `2023/04` | `COIG` | [Chinese Open Instruction Generalist: A Preliminary Release](https://arxiv.org/abs/2304.07987)
+\-
+
+- `2023/04` | `OpenAssistant Conversations` | [OpenAssistant Conversations -- Democratizing Large Language Model Alignment](https://arxiv.org/abs/2304.07327)
+\-
+
+- `2023/01` | `Flan 2022` | [The Flan Collection: Designing Data and Methods for Effective Instruction Tuning](https://arxiv.org/abs/2301.13688)
+\-
+
+- `2022/11` | `xP3` | [Crosslingual Generalization through Multitask Finetuning](https://arxiv.org/abs/2211.01786)
+\-
+
+- `2022/04` | `Super-NaturalInstructions` | [Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks](https://arxiv.org/abs/2204.07705)
+\-
+
+- `2021/11` | `ExT5` | [ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning](https://arxiv.org/abs/2111.10952)
+\-
+
+- `2021/10` | `MetaICL` | [MetaICL: Learning to Learn In Context](https://arxiv.org/abs/2110.15943)
+\-
+
+- `2021/10` | `P3` | [Multitask Prompted Training Enables Zero-Shot Task Generalization](https://arxiv.org/abs/2110.08207)
+\-
+
+- `2021/04` | `CrossFit` | [CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP](https://arxiv.org/abs/2104.08835)
+\-
+
+- `2021/04` | `NATURAL INSTRUCTIONS` | [Cross-Task Generalization via Natural Language Crowdsourcing Instructions](https://arxiv.org/abs/2104.08773)
+\-
+
+- `2020/05` | `UnifiedQA` | [UnifiedQA: Crossing Format Boundaries With a Single QA System](https://arxiv.org/abs/2005.00700)
+\-
+
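+Most of the corpora above reduce to (instruction, optional input, output) records. The Alpaca-style layout below is one common way to flatten such a record into a training prompt; the concrete record and the section headers are illustrative conventions, not a fixed standard.
+
+```python
+# One instruction-tuning record (field names vary across datasets).
+example = {
+    "instruction": "Classify the sentiment of the sentence.",
+    "input": "The movie was a pleasant surprise.",
+    "output": "positive",
+}
+
+# A common flattening convention for supervised fine-tuning:
+prompt = (
+    f"### Instruction:\n{example['instruction']}\n\n"
+    f"### Input:\n{example['input']}\n\n"
+    f"### Response:\n{example['output']}"
+)
+```
+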
+##### b. Data - Synthesis
+
+- `2023/08` | `Instruction Backtranslation` | [Self-Alignment with Instruction Backtranslation](https://arxiv.org/abs/2308.06259)
+\-
+
+- `2023/05` | `Dynosaur` | [Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation](https://arxiv.org/abs/2305.14327)
+\-
+
+- `2023/05` | `UltraChat` | [Enhancing Chat Language Models by Scaling High-quality Instructional Conversations](https://arxiv.org/abs/2305.14233)
+\-
+
+- `2023/05` | `CoT Collection` | [The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning](https://arxiv.org/abs/2305.14045)
+\-
+
+- `2023/05` | `CoEdIT` | [CoEdIT: Text Editing by Task-Specific Instruction Tuning](https://arxiv.org/abs/2305.09857)
+\-
+
+- `2023/04` | `LaMini-LM` | [LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions](https://arxiv.org/abs/2304.14402)
+\-
+
+- `2023/04` | `GPT-4-LLM` | [Instruction Tuning with GPT-4](https://arxiv.org/abs/2304.03277)
+\-
+
+- `2023/04` | `Koala` | Koala: A Dialogue Model for Academic Research
+\-
+[[Blog](https://bair.berkeley.edu/blog/2023/04/03/koala/)]
+
+- `2023/03` | `Alpaca` | Alpaca: A Strong, Replicable Instruction-Following Model
+\-
+[[Blog](https://crfm.stanford.edu/2023/03/13/alpaca.html)]
+
+- `2023/03` | `GPT4All` | GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo
+\-
+[[Code](https://github.com/nomic-ai/gpt4all)]
+
+- `2022/12` | `OPT-IML` / `OPT-IML Bench` | [OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization](https://arxiv.org/abs/2212.12017)
+\-
+
+- `2022/12` | `Self-Instruct` | [Self-Instruct: Aligning Language Models with Self-Generated Instructions](https://arxiv.org/abs/2212.10560)
+\-
+
+- `2022/12` | `Unnatural Instructions` | [Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor](https://arxiv.org/abs/2212.09689)
+\-
+
+#### 4.3.2 Training Pipeline
+
+##### a. Online Human Preference Training
+
+- `2023/06` | `APA` | [Fine-Tuning Language Models with Advantage-Induced Policy Alignment](https://arxiv.org/abs/2306.02231)
+\-
+
+- `2023/04` | `RAFT` | [RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment](https://arxiv.org/abs/2304.06767)
+\-
+
+- `2022/03` | `InstructGPT` / `RLHF` | [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
+\-
+
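+As a concrete picture of this pipeline, RAFT-style reward-ranked fine-tuning alternates between sampling candidate responses, scoring them with a reward model, and running a supervised update on the winners. The sketch below is schematic: `policy`, `reward_model`, and `finetune` are hypothetical stand-ins for a generator LLM, a trained reward model, and one supervised training step.
+
+```python
+def raft_round(prompts, policy, reward_model, finetune, k=8):
+    """One reward-ranked fine-tuning round (schematic)."""
+    best_samples = []
+    for prompt in prompts:
+        candidates = [policy.generate(prompt) for _ in range(k)]
+        scored = [(reward_model.score(prompt, c), c) for c in candidates]
+        _, best = max(scored, key=lambda pair: pair[0])  # keep top-reward response
+        best_samples.append((prompt, best))
+    finetune(policy, best_samples)  # ordinary supervised step on the winners
+    return policy
+```
+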
+##### b. Offline Human Preference Training
+
+- `2023/06` | `PRO` | [Preference Ranking Optimization for Human Alignment](https://arxiv.org/abs/2306.17492)
+\-
+
+- `2023/05` | `DPO` | [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290)
+\-
+
+- `2023/04` | `RRHF` | [RRHF: Rank Responses to Align Language Models with Human Feedback without tears](https://arxiv.org/abs/2304.05302)
+\-
+
+- `2022/09` | `SLiC` | [Calibrating Sequence likelihood Improves Conditional Language Generation](https://arxiv.org/abs/2210.00045)
+\-
+
+
+### 4.4 Mixture of Experts (MoE)
+
+- `2023/06` | [An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training](https://arxiv.org/abs/2306.17165)
+\-
+
+- `2023/03` | `MixedAE` | [Mixed Autoencoder for Self-supervised Visual Representation Learning](https://arxiv.org/abs/2303.17152)
+\-
+
+- `2022/12` | `Mod-Squad` | [Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners](https://arxiv.org/abs/2212.08066)
+\-
+
+- `2022/04` | `MoEBERT` | [MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation](https://arxiv.org/abs/2204.07675)
+\-
+
+- `2021/12` | `GLaM` | [GLaM: Efficient Scaling of Language Models with Mixture-of-Experts](https://arxiv.org/abs/2112.06905)
+\-
+
+- `2021/07` | `WideNet` | [Go Wider Instead of Deeper](https://arxiv.org/abs/2107.11817)
+\-
+
+- `2021/01` | `Switch Transformers` | [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961)
+\-
+
+- `2020/06` | `GShard` | [GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding](https://arxiv.org/abs/2006.16668)
+\-
+
+- `2017/01` | `Sparsely-Gated Mixture-of-Experts` | [Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer](https://arxiv.org/abs/1701.06538)
+\-
+
+- `1991/03` | Adaptive Mixtures of Local Experts
+\-
+[[Paper](https://ieeexplore.ieee.org/document/6797059)]
+
+
+### 4.5 In-Context Learning
+
+- `2022/10` | `FLAN-T5` | [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
+\-
+
+- `2020/05` | `GPT-3` | [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
+\-
+[[Paper](https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf)]
+[[Code](https://github.com/openai/gpt-3)]
+
+#### 4.5.1 Demonstration Example Selection
+
+##### a. Prior-Knowledge Approach
+
+- `2022/12` | [Diverse Demonstrations Improve In-context Compositional Generalization](https://arxiv.org/abs/2212.06800)
+\-
+
+- `2022/11` | [Complementary Explanations for Effective In-Context Learning](https://arxiv.org/abs/2211.13892)
+\-
+
+- `2022/10` | `Auto-CoT` | [Automatic Chain of Thought Prompting in Large Language Models](https://arxiv.org/abs/2210.03493)
+\-
+
+- `2022/10` | `Complex CoT` | [Complexity-Based Prompting for Multi-Step Reasoning](https://arxiv.org/abs/2210.00720)
+\-
+
+- `2022/10` | `EmpGPT-3` | [Does GPT-3 Generate Empathetic Dialogues? A Novel In-Context Example Selection Method and Automatic Evaluation Metric for Empathetic Dialogue Generation](https://aclanthology.org/2022.coling-1.56/)
+\-
+[[Code](https://github.com/passing2961/EmpGPT-3)]
+
+- `2022/09` | [Selective Annotation Makes Language Models Better Few-Shot Learners](https://arxiv.org/abs/2209.01975)
+\-
+
+- `2021/01` | [What Makes Good In-Context Examples for GPT-3?](https://arxiv.org/abs/2101.06804)
+\-
+
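+A simple instance of this family is similarity-based selection, as studied in "What Makes Good In-Context Examples for GPT-3?": embed the test query and a candidate pool with a sentence encoder, then keep the nearest neighbors as demonstrations. A rough sketch, assuming the embeddings come from some external encoder:
+
+```python
+import numpy as np
+
+def select_demonstrations(query_emb, pool_embs, pool_examples, k=4):
+    """Return the k pool examples most cosine-similar to the query."""
+    q = query_emb / np.linalg.norm(query_emb)
+    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
+    sims = p @ q                 # cosine similarity to each candidate
+    top = np.argsort(-sims)[:k]  # indices of the k nearest neighbors
+    return [pool_examples[i] for i in top]
+```
+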
+##### b. Retrieval Approach
+
+- `2023/10` | `DQ-LoRe` | [DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning](https://arxiv.org/abs/2310.02954)
+\-
+
+- `2023/07` | `LLM-R` | [Learning to Retrieve In-Context Examples for Large Language Models](https://arxiv.org/abs/2307.07164)
+\-
+
+- `2023/05` | `Dr.ICL` | [Dr.ICL: Demonstration-Retrieved In-context Learning](https://arxiv.org/abs/2305.14128)
+\-
+
+- `2023/02` | `LENS` | [Finding Support Examples for In-Context Learning](https://arxiv.org/abs/2302.13539)
+\-
+
+- `2023/02` | `CEIL` | [Compositional Exemplars for In-context Learning](https://arxiv.org/abs/2302.05698)
+\-
+
+- `2021/12` | [Learning To Retrieve Prompts for In-Context Learning](https://arxiv.org/abs/2112.08633)
+\-
+
+#### 4.5.2 Chain-of-Thought
+
+##### a. Zero-Shot CoT
+
+- `2023/05` | `Plan-and-Solve` | [Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models](https://arxiv.org/abs/2305.04091)
+\-
+
+- `2022/05` | `Zero-shot-CoT` | [Large Language Models are Zero-Shot Reasoners](https://arxiv.org/abs/2205.11916)
+\-
+[[Paper](https://openreview.net/pdf?id=e2TBb5y0yFf)]
+[[Code](https://github.com/kojima-takeshi188/zero_shot_cot)]
+
+##### b. Few-Shot CoT
+
+- `2023/07` | `SoT` | [Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding](https://arxiv.org/abs/2307.15337)
+\-
+
+- `2023/05` | `Code Prompting` | [Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models](https://arxiv.org/abs/2305.18507)
+\-
+
+- `2023/05` | `GoT` | [Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models](https://arxiv.org/abs/2305.16582)
+\-
+
+- `2023/05` | `ToT` | [Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601)
+\-
+
+- `2023/03` | `MathPrompter` | [MathPrompter: Mathematical Reasoning using Large Language Models](https://arxiv.org/abs/2303.05398)
+\-
+
+- `2022/11` | `PoT` | [Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks](https://arxiv.org/abs/2211.12588)
+\-
+
+- `2022/11` | `PAL` | [PAL: Program-aided Language Models](https://arxiv.org/abs/2211.10435)
+\-
+
+- `2022/10` | `Auto-CoT` | [Automatic Chain of Thought Prompting in Large Language Models](https://arxiv.org/abs/2210.03493)
+\-
+
+- `2022/10` | `Complex CoT` | [Complexity-Based Prompting for Multi-Step Reasoning](https://arxiv.org/abs/2210.00720)
+\-
+
+- `2022/05` | `Least-to-Most Prompting` | [Least-to-Most Prompting Enables Complex Reasoning in Large Language Models](https://arxiv.org/abs/2205.10625)
+\-
+
+- `2022/01` | [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
+\-
+
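+Few-shot CoT prepends worked examples whose answers spell out intermediate steps, so the model imitates the reasoning format before stating an answer. A toy prompt in that style: the exemplar is the well-known tennis-ball problem from the chain-of-thought paper, while the final question is invented for illustration.
+
+```python
+demonstration = (
+    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
+    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
+    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 "
+    "tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
+)
+question = "Q: A baker made 24 rolls and sold 15. How many rolls are left?\nA:"
+prompt = demonstration + question  # the model should continue with a rationale
+```
+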
+##### c. Multiple Paths Aggregation
+
+- `2023/05` | `RAP` | [Reasoning with Language Model is Planning with World Model](https://arxiv.org/abs/2305.14992)
+\-
+
+- `2023/05` | [Automatic Model Selection with Large Language Models for Reasoning](https://arxiv.org/abs/2305.14333)
+\-
+
+- `2023/05` | `AdaptiveConsistency` | [Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs](https://arxiv.org/abs/2305.11860)
+\-
+
+- `2023/05` | `ToT` | [Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601)
+\-
+
+- `2023/05` | `ToT` | [Large Language Model Guided Tree-of-Thought](https://arxiv.org/abs/2305.08291)
+\-
+
+- `2023/05` | [Self-Evaluation Guided Beam Search for Reasoning](https://arxiv.org/abs/2305.00633)
+\-
+
+- `2022/10` | `Complex CoT` | [Complexity-Based Prompting for Multi-Step Reasoning](https://arxiv.org/abs/2210.00720)
+\-
+
+- `2022/06` | `DIVERSE` | [Making Large Language Models Better Reasoners with Step-Aware Verifier](https://arxiv.org/abs/2206.02336)
+\-
+
+- `2022/03` | [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171)
+\-
+
+#### 4.5.3 Multi-Round Prompting
+
+##### a. Learned Refiners
+
+- `2023/02` | `LLM-Augmenter` | [Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback](https://arxiv.org/abs/2302.12813)
+\-
+
+- `2022/10` | `Self-Correction` | [Generating Sequences by Learning to Self-Correct](https://arxiv.org/abs/2211.00053)
+\-
+
+- `2022/08` | `PEER` | [PEER: A Collaborative Language Model](https://arxiv.org/abs/2208.11663)
+\-
+
+- `2022/04` | `R3` | [Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision](https://arxiv.org/abs/2204.03685)
+\-
+
+- `2021/10` | `CURIOUS` | [Think about it! Improving defeasible reasoning by first modeling the question scenario](https://arxiv.org/abs/2110.12349)
+\-
+
+- `2020/05` | `DrRepair` | [Graph-based, Self-Supervised Program Repair from Diagnostic Feedback](https://arxiv.org/abs/2005.10636)
+\-
+
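+Whether the critic is a trained refiner (above) or the prompted model itself (below), multi-round prompting follows the same outer loop: draft, collect feedback, revise, repeat. A schematic sketch, where `generate`, `critique`, and `revise` are hypothetical callables wrapping the underlying model calls:
+
+```python
+def refine(task, generate, critique, revise, max_rounds=3):
+    """Generic draft-feedback-revise loop (schematic)."""
+    draft = generate(task)
+    for _ in range(max_rounds):
+        feedback = critique(task, draft)
+        if feedback is None:  # critic is satisfied; stop early
+            break
+        draft = revise(task, draft, feedback)
+    return draft
+```
+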
+##### b. Prompted Refiners
+
+- `2023/06` | `InterCode` | [InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback](https://arxiv.org/abs/2306.14898)
+\-
+
+- `2023/06` | [Is Self-Repair a Silver Bullet for Code Generation?](https://arxiv.org/abs/2306.09896)
+\-
+
+- `2023/05` | [Improving Factuality and Reasoning in Language Models through Multiagent Debate](https://arxiv.org/abs/2305.14325)
+\-
+
+- `2023/05` | `CRITIC` | [CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing](https://arxiv.org/abs/2305.11738)
+\-
+
+- `2023/05` | `GPT-Bargaining` | [Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback](https://arxiv.org/abs/2305.10142)
+\-
+
+- `2023/05` | `Self-Edit` | [Self-Edit: Fault-Aware Code Editor for Code Generation](https://arxiv.org/abs/2305.04087)
+\-
+
+- `2023/04` | `PHP` | [Progressive-Hint Prompting Improves Reasoning in Large Language Models](https://arxiv.org/abs/2304.09797)
+\-
+
+- `2023/04` | `Self-collaboration` | [Self-collaboration Code Generation via ChatGPT](https://arxiv.org/abs/2304.07590)
+\-
+
+- `2023/04` | `Self-Debugging` | [Teaching Large Language Models to Self-Debug](https://arxiv.org/abs/2304.05128)
+\-
+
+- `2023/04` | `REFINER` | [REFINER: Reasoning Feedback on Intermediate Representation](https://arxiv.org/abs/2304.01904)
+\-
+
+- `2023/03` | `Self-Refine` | [Self-Refine: Iterative Refinement with Self-Feedback](https://arxiv.org/abs/2303.17651)
+\-
+
+
+### 4.6 Autonomous Agent
+
+- `2023/10` | `planning tokens` | [Guiding Language Model Reasoning with Planning Tokens](https://arxiv.org/abs/2310.05707)
+\-
+
+- `2023/09` | `AutoAgents` | [AutoAgents: A Framework for Automatic Agent Generation](https://arxiv.org/abs/2309.17288)
+\-
+
+- `2023/06` | `AssistGPT` | [AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn](https://arxiv.org/abs/2306.08640)
+\-
+
+- `2023/05` | `SwiftSage` | [SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks](https://arxiv.org/abs/2305.17390)
+\-
+
+- `2023/05` | `MultiTool-CoT` | [MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting](https://arxiv.org/abs/2305.16896)
+\-
+
+- `2023/05` | `Voyager` | [Voyager: An Open-Ended Embodied Agent with Large Language Models](https://arxiv.org/abs/2305.16291)
+\-
+
+- `2023/05` | `ChatCoT` | [ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models](https://arxiv.org/abs/2305.14323)
+\-
+
+- `2023/05` | `CREATOR` | [CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models](https://arxiv.org/abs/2305.14318)
+\-
+
+- `2023/05` | `TRICE` | [Making Language Models Better Tool Learners with Execution Feedback](https://arxiv.org/abs/2305.13068)
+\-
+
+- `2023/05` | `ToolkenGPT` | [ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings](https://arxiv.org/abs/2305.11554)
+\-
+
+- `2023/04` | `Chameleon` | [Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models](https://arxiv.org/abs/2304.09842)
+\-
+
+- `2023/04` | `OpenAGI` | [OpenAGI: When LLM Meets Domain Experts](https://arxiv.org/abs/2304.04370)
+\-
+
+- `2023/03` | `CAMEL` | [CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society](https://arxiv.org/abs/2303.17760)
+\-
+
+- `2023/03` | `HuggingGPT` | [HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face](https://arxiv.org/abs/2303.17580)
+\-
+
+- `2023/03` | `Reflexion` | [Reflexion: Language Agents with Verbal Reinforcement Learning](https://arxiv.org/abs/2303.11366)
+\-
+
+- `2023/03` | `ART` | [ART: Automatic multi-step reasoning and tool-use for large language models](https://arxiv.org/abs/2303.09014)
+\-
+
+- `2023/03` | `Auto-GPT` | Auto-GPT: An Autonomous GPT-4 Experiment
+\-
+[[Code](https://github.com/Significant-Gravitas/AutoGPT)]
+
+- `2023/02` | `Toolformer` | [Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/abs/2302.04761)
+\-
+
+- `2022/11` | `VISPROG` | [Visual Programming: Compositional visual reasoning without training](https://arxiv.org/abs/2211.11559)
+\-
+
+- `2022/10` | `ReAct` | [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)
+\-
diff --git a/assets/0_reasoning.jpg b/assets/0_reasoning.jpg
new file mode 100644
index 0000000..b3afa86
Binary files /dev/null and b/assets/0_reasoning.jpg differ
diff --git a/assets/1_overview.jpg b/assets/1_overview.jpg
new file mode 100644
index 0000000..94a327e
Binary files /dev/null and b/assets/1_overview.jpg differ
diff --git a/assets/22_foundation_models.jpg b/assets/22_foundation_models.jpg
new file mode 100644
index 0000000..bb4cded
Binary files /dev/null and b/assets/22_foundation_models.jpg differ