LLM-and-VLM-Paper-List

A paper list about large language models and multi-modal models.
Note: This list only records papers relevant to my personal needs. Feel free to open an issue if you think I missed some important or exciting work!

Survey

HELM: Holistic Evaluation of Language Models. TMLR. paper
HEIM: Holistic Evaluation of Text-to-Image Models. NeurIPS'2023. paper
Eval Survey: A Survey on Evaluation of Large Language Models. Arxiv'2023. paper
Healthcare LM Survey: A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics. Arxiv'2023. paper, github
Multimodal LLM Survey: A Survey on Multimodal Large Language Models. Arxiv'2023. paper, github
VLM for Vision Tasks Survey: Vision-Language Models for Vision Tasks: A Survey. Arxiv'2023. paper, github
Efficient LLM Survey: Efficient Large Language Models: A Survey. Arxiv'2023. paper, github
Prompt Engineering Survey: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. Arxiv'2021. paper
Multimodal Safety Survey: Safety of Multimodal Large Language Models on Images and Text. Arxiv'2024. paper
Multimodal LLM Recent Survey: MM-LLMs: Recent Advances in MultiModal Large Language Models. Arxiv'2024. paper
Prompt Engineering in LLM Survey: A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. Arxiv'2024. paper
LLM Security and Privacy Survey: A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly. Arxiv'2024. paper
LLM Privacy Survey: Privacy in Large Language Models: Attacks, Defenses and Future Directions. Arxiv'2023. paper

Language Model

Foundation LM Models

Transformer: Attention Is All You Need. NIPS'2017. paper
GPT-1: Improving Language Understanding by Generative Pre-Training. 2018. paper
BERT: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL'2019. paper
GPT-2: Language Models are Unsupervised Multitask Learners. 2019. paper
RoBERTa: RoBERTa: A Robustly Optimized BERT Pretraining Approach. Arxiv'2019. paper
DistilBERT: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Arxiv'2019. paper
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR'2020. paper
GPT-3: Language Models are Few-Shot Learners. NeurIPS'2020. paper
GLaM: GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. ICML'2022. paper
PaLM: PaLM: Scaling Language Modeling with Pathways. ArXiv'2022. paper
BLOOM: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. Arxiv'2022. paper
BLOOMZ: Crosslingual Generalization through Multitask Finetuning. Arxiv'2023. paper
LLaMA: LLaMA: Open and Efficient Foundation Language Models. Arxiv'2023. paper
GPT-4: GPT-4 Technical Report. Arxiv'2023. paper
PaLM 2: PaLM 2 Technical Report. 2023. paper
LLaMA 2: Llama 2: Open Foundation and Fine-Tuned Chat Models. Arxiv'2023. paper
Mistral: Mistral 7B. Arxiv'2023. paper
Phi1: Project Link
Phi1.5: Project Link
Phi2: Project Link
Falcon: Project Link

RLHF

PPO: Proximal Policy Optimization Algorithms. Arxiv'2017. paper
DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS'2023. paper

Parameter Efficient Fine-tuning

LoRA: LoRA: Low-Rank Adaptation of Large Language Models. Arxiv'2021. paper
Q-LoRA: QLoRA: Efficient Finetuning of Quantized LLMs. NeurIPS'2023. paper

Healthcare LM

Med-PaLM: Large Language Models Encode Clinical Knowledge. Arxiv'2022. paper
MedAlpaca: MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data. Arxiv'2023. paper
Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models. Arxiv'2023. paper
HuatuoGPT: HuatuoGPT, towards Taming Language Model to Be a Doctor. EMNLP'2023 (findings). paper
GPT-4-Med: Capabilities of GPT-4 on Medical Challenge Problems. Arxiv'2023. paper

Watermarking LLM

Prompt Engineering in LLM

Hard Prompt

PET: Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. EACL'2021. paper
LM-BFF: Making Pre-trained Language Models Better Few-shot Learners. ACL'2021. paper

Soft Prompt

Prompt-Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning. EMNLP'2021. paper
Prefix-Tuning: Prefix-Tuning: Optimizing Continuous Prompts for Generation. ACL'2021. paper
P-tuning: P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. ACL'2022. paper
P-tuning v2: P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. Arxiv'2022. paper

Between Soft and Hard

Auto-Prompt: AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. EMNLP'2020. paper
FluentPrompt: Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too?. EMNLP'2023 (findings). paper
PEZ: Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery. Arxiv'2023. paper

Multi-modal Models

Foundation Multi-Modal Models

CLIP: Learning Transferable Visual Models From Natural Language Supervision. ICML'2021. paper
DeCLIP: Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm. ICLR'2022. paper
FILIP: FILIP: Fine-grained Interactive Language-Image Pre-Training. ICLR'2022. paper
Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models. CVPR'2022. paper
BLIP: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. ICML'2022. paper
BLIP2: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. ICML'2023. paper
LLaMA-Adapter: LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. Arxiv'2023. paper
LLaVA: Visual Instruction Tuning. NeurIPS'2023. paper
Instruct BLIP: InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. NeurIPS'2023. paper

Multi-modal Safety

SLD: Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models. CVPR'2023. paper
ESD: Erasing Concepts from Diffusion Models. ICCV'2023. paper

VLM Hallucinations

POPE: Evaluating Object Hallucination in Large Vision-Language Models. EMNLP'2023. paper
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models. CVPR'2024. paper

VLM Privacy

Prompt Engineering in VLM

Agent

LLM-based Agent

Stanford Town: Generative Agents: Interactive Simulacra of Human Behavior. UIST'2023. paper

VLM-based Agent

OSWorld: OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. Arxiv'2024. paper

Useful-Resource

Hugging Face course. https://huggingface.co/learn
LLaMA Factory. https://github.com/hiyouga/LLaMA-Factory
DeepSpeed. https://github.com/microsoft/DeepSpeed
trlx. https://github.com/CarperAI/trlx
Prompt Engineering Update. https://github.com/thunlp/PromptPapers
