
A paper list about large language models and multimodal models (Diffusion, VLM). From foundations to applications. It is only used to record papers for my personal needs.


Lukcy-ML/LLM-and-Multimodal-Paper-List

 
 


LLM-and-VLM-Paper-List

A paper list about large language models and multi-modal models.
Note: This list only records papers for my personal needs. Feel free to open an issue if you think I missed some important or exciting work!

Table of Contents

Survey

  • HELM: Holistic Evaluation of Language Models. TMLR. paper
  • HEIM: Holistic Evaluation of Text-to-Image Models. NeurIPS'2023. paper
  • Eval Survey: A Survey on Evaluation of Large Language Models. Arxiv'2023. paper
  • Healthcare LM Survey: A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics. Arxiv'2023. paper, github
  • Multimodal LLM Survey: A Survey on Multimodal Large Language Model. Arxiv'2023. paper, github
  • VLM for vision Task Survey: Vision Language Models for Vision Tasks: A Survey. Arxiv'2023. paper, github
  • Efficient LLM Survey: Efficient Large Language Models: A Survey. Arxiv'2023. paper, github
  • Prompt Engineering Survey: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. Arxiv'2021. paper
  • Multimodal Safety Survey: Safety of Multimodal Large Language Models on Images and Text. Arxiv'2024. paper
  • Multimodal LLM Recent Survey: MM-LLMs: Recent Advances in MultiModal Large Language Models. Arxiv'2024. paper
  • Prompt Engineering in LLM Survey: A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. Arxiv'2024. paper
  • LLM Security and Privacy Survey: A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly. Arxiv'2024. paper
  • LLM Privacy Survey: Privacy in Large Language Models: Attacks, Defenses and Future Directions. Arxiv'2023. paper

Language Model

Foundation LM Models

  • Transformer: Attention Is All You Need. NIPS'2017. paper
  • GPT-1: Improving Language Understanding by Generative Pre-Training. 2018. paper
  • BERT: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL'2019. paper
  • GPT-2: Language Models are Unsupervised Multitask Learners. 2019. paper
  • RoBERTa: RoBERTa: A Robustly Optimized BERT Pretraining Approach. Arxiv'2019. paper
  • DistilBERT: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Arxiv'2019. paper
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR'2020. paper
  • GPT-3: Language Models are Few-Shot Learners. NeurIPS'2020. paper
  • GLaM: GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. ICML'2022. paper
  • PaLM: PaLM: Scaling Language Modeling with Pathways. Arxiv'2022. paper
  • BLOOM: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. Arxiv'2022. paper
  • BLOOMZ: Crosslingual Generalization through Multitask Finetuning. Arxiv'2023. paper
  • LLaMA: LLaMA: Open and Efficient Foundation Language Models. Arxiv'2023. paper
  • GPT-4: GPT-4 Technical Report. Arxiv'2023. paper
  • PaLM 2: PaLM 2 Technical Report. 2023. paper
  • LLaMA 2: Llama 2: Open Foundation and Fine-Tuned Chat Models. Arxiv'2023. paper
  • Mistral: Mistral 7B. Arxiv'2023. paper
  • Phi1: Project Link
  • Phi1.5: Project Link
  • Phi2: Project Link
  • Falcon: Project Link

RLHF

  • PPO: Proximal Policy Optimization Algorithms. Arxiv'2017. paper
  • DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS'2023. paper

Parameter Efficient Fine-tuning

  • LoRA: LoRA: Low-Rank Adaptation of Large Language Models. Arxiv'2021. paper
  • QLoRA: QLoRA: Efficient Finetuning of Quantized LLMs. NeurIPS'2023. paper

Healthcare LM

  • Med-PaLM: Large Language Models Encode Clinical Knowledge. Arxiv'2022. paper
  • MedAlpaca: MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data. Arxiv'2023. paper
  • Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models. Arxiv'2023. paper
  • HuatuoGPT: HuatuoGPT, towards Taming Language Model to Be a Doctor. EMNLP'2023 (findings). paper
  • GPT-4-Med: Capabilities of GPT-4 on Medical Challenge Problems. Arxiv'2023. paper

Watermarking LLM

Prompt Engineering in LLM

Hard Prompt

  • PET: Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. EACL'2021. paper
  • LM-BFF: Making Pre-trained Language Models Better Few-shot Learners. ACL'2021. paper

Soft Prompt

  • Prompt-Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning. EMNLP'2021. paper
  • Prefix-Tuning: Prefix-Tuning: Optimizing Continuous Prompts for Generation. ACL'2021. paper
  • P-tuning: P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. ACL'2022. paper
  • P-tuning v2: P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. Arxiv'2022. paper

Between Soft and Hard

  • Auto-Prompt: AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. EMNLP'2020. paper
  • FluentPrompt: Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too?. EMNLP'2023 (findings). paper
  • PEZ: Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery. Arxiv'2023. paper

Multi-modal Models

Foundation Multi-Modal Models

  • CLIP: Learning Transferable Visual Models From Natural Language Supervision. ICML'2021. paper
  • DeCLIP: Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm. ICLR'2022. paper
  • FILIP: FILIP: Fine-grained Interactive Language-Image Pre-Training. ICLR'2022. paper
  • Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models. CVPR'2022. paper
  • BLIP: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. ICML'2022. paper
  • BLIP-2: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. ICML'2023. paper
  • LLaMA-Adapter: LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. Arxiv'2023. paper
  • LLaVA: Visual Instruction Tuning. NeurIPS'2023. paper
  • InstructBLIP: InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. NeurIPS'2023. paper

Multi-modal Safety

  • SLD: Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models. CVPR'2023. paper
  • ESD: Erasing Concepts from Diffusion Models. ICCV'2023. paper

VLM Hallucinations

  • POPE: Evaluating Object Hallucination in Large Vision-Language Models. EMNLP'2023. paper
  • HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models. CVPR'2024. paper

VLM Privacy

Prompt Engineering in VLM


Agent

LLM-based Agent

  • Stanford Town: Generative Agents: Interactive Simulacra of Human Behavior. UIST'2023. paper

VLM-based Agent

  • OSWorld: OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. Arxiv'2024. paper

Useful-Resource
