[COLING 2025]A curated paper list about LLMs for chemistry

OpenDFM/LLM4Chemistry


LLM4Chemistry

This repository collects papers on Large Language Models for Chemistry.

😎 You are welcome to recommend missing papers by opening Issues or Pull Requests.

Contents

Fine-tuning LLM for Chemistry

  • 2022.05 Foundation Models of Scientific Knowledge for Chemistry: Opportunities, Challenges and Lessons Learned. ACL Workshop
  • 2022.11 Galactica: A large language model for science. arXiv
  • 2022.11 Is GPT-3 all you need for machine learning for chemistry? NIPS2022 Workshop
  • 2023.08 Fine-tuning GPT-3 for machine learning electronic and functional properties of organic molecules. Chemical Science
  • 2023.08 HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science. EMNLP2023
  • 2023.10 MatChat: A Large Language Model and Application Service Platform for Materials Science. Chinese Physics B
  • 2024.01 ChemDFM: Dialogue Foundation Model for Chemistry. arXiv
  • 2024.01 Structured information extraction from scientific text with large language models. Nature Communications
  • 2024.02 Leveraging large language models for predictive chemistry. Nature Machine Intelligence
  • 2024.03 SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning. arXiv
  • 2024.03 Domain-Agnostic Molecular Generation with Chemical Feedback. ICLR2024
  • 2024.04 ChemLLM: A Chemical Large Language Model. arXiv
  • 2024.04 BatGPT-Chem: A Foundation Large Model For Chemical Engineering. ChemRxiv
  • 2024.04 Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models. ICLR2024
  • 2024.04 LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset. arXiv
  • 2024.05 nach0: Multimodal Natural and Chemical Languages Foundation Model. Chemical Science
  • 2024.06 Fine-tuning large language models for chemical text mining. Chemical Science
  • 2024.06 MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction. arXiv
  • 2024.06 SynAsk: Unleashing the Power of Large Language Models in Organic Synthesis. arXiv
  • 2024.06 PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes. arXiv
  • 2024.09 SciDFM: A Large Language Model with Mixture-of-Experts for Science. arXiv

Multi-Modal Chemistry LLM

  • 2023.03 Uni-Mol: A Universal 3D Molecular Representation Learning Framework. ICLR
  • 2023.05 DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs. arXiv
  • 2023.06 MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter. EMNLP2023
  • 2023.06 MolFM: A Multimodal Molecular Foundation Model. arXiv
  • 2023.08 BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine. arXiv
  • 2023.09 3D-MoLM: Towards 3D Molecule-Text Interpretation in Language Models. ICLR2024
  • 2023.11 InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery. arXiv
  • 2023.12 MoleculeGPT: Instruction Following Large Language Models for Molecular Property Prediction. NIPS Workshop
  • 2024.01 MolTC: Towards Molecular Relational Modeling In Language Models. ACL2024
  • 2024.01 ReactXT: Understanding Molecular “Reaction-ship” via Reaction-Contextualized Molecule-Text Pretraining. ACL2024
  • 2024.03 GIT-Mol: A multi-modal large language model for molecular science with graph, image, and text. arXiv
  • 2024.06 HIGHT: Hierarchical Graph Tokenization for Graph-Language Alignment. arXiv
  • 2024.06 3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization. arXiv
  • 2024.06 MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension. arXiv
  • 2024.07 MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations. Bioinformatics
  • 2024.08 UniMoT: Unified Molecule-Text Language Model with Discrete Token Representation. arXiv
  • 2024.08 ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area. arXiv
  • 2024.09 ChemDFM-X: Towards Large Multimodal Model for Chemistry. arXiv

LLM as A Chemistry Agent

  • 2023.09 Generative Retrieval-Augmented Ontologic Graph and Multiagent Strategies for Interpretive Large Language Model-Based Materials Design. ACS Engineering Au
  • 2023.10 Large language models for chemistry robotics. Autonomous Robots
  • 2023.10 Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design. EMNLP2023
  • 2023.11 Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis. arXiv
  • 2023.12 Autonomous chemical research with large language models. Nature
  • 2024.01 Structured Chemistry Reasoning with Large Language Models. ICML2024
  • 2024.01 ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback. ICML2024
  • 2024.02 An Autonomous Large Language Model Agent for Chemical Literature Data Mining. arXiv
  • 2024.03 From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery. AAAI2024
  • 2024.03 DRAK: Unlocking Molecular Insights with Domain-Specific Retrieval-Augmented Knowledge in LLMs. arXiv
  • 2024.04 Integrating Chemistry Knowledge in Large Language Models via Prompt Engineering. arXiv
  • 2024.04 Large Language Models are In-Context Molecule Learners. arXiv
  • 2024.04 A Self-feedback Knowledge Elicitation Approach for Chemical Reaction Predictions. arXiv
  • 2024.04 Large Language Models Open New Way of AI-Assisted Molecule Design for Chemists. ChemRxiv
  • 2024.05 Augmenting large language models with chemistry tools. Nature Machine Intelligence
  • 2024.05 ChatMOF: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models. Nature Communications
  • 2024.06 LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation. arXiv

LLM Chemistry Benchmark

  • 2017.09 Crowdsourcing multiple choice science questions. ACL Workshop
  • 2020.09 ChemistryQA: A Complex Question Answering Dataset from Chemistry. OpenReview
  • 2023.01 Assessment of chemistry knowledge in large language models that generate code. Digital Discovery
  • 2023.03 Do Large Language Models Understand Chemistry? A Conversation with ChatGPT. Journal of Chemical Information and Modeling
  • 2023.06 Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective. TKDE
  • 2023.07 Can Large Language Models Empower Molecular Property Prediction? arXiv
  • 2023.10 ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction. arXiv
  • 2023.10 GPT-MolBERTa: GPT Molecular Features Language Model for molecular property. arXiv
  • 2023.12 What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks. NeurIPS2023
  • 2023.12 SciMT-Safety: Control Risk for Potential Misuse of Artificial Intelligence in Science. arXiv
  • 2024.01 SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research. AAAI2024
  • 2024.01 SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis. arXiv
  • 2024.02 Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science. arXiv
  • 2024.02 Building a Dataset for Language+Molecules. arXiv
  • 2024.03 Benchmarking Large Language Models for Molecule Prediction Tasks. arXiv
  • 2024.03 MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension. arXiv
  • 2024.02 SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models. arXiv
  • 2024.04 Are large language models superhuman chemists? arXiv
  • 2024.06 SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models. arXiv
  • 2024.07 ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering. arXiv
  • 2024.09 VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning. arXiv
  • 2024.09 ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models. arXiv
  • 2024.10 Unraveling Molecular Structure: A Multimodal Spectroscopic Dataset for Chemistry. NIPS2024
  • 2024.10 MassSpecGym: A benchmark for the discovery and identification of molecules. NIPS2024
  • 2024.10 Can LLMs Solve Molecule Puzzles? A Multimodal Benchmark for Molecular Structure Elucidation. NIPS2024
  • 2024.10 ∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials. NIPS2024
  • 2024.12 TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation. arXiv

Related Works

  • 2023.04 A Systematic Survey of Chemical Pre-trained Models. IJCAI2023
  • 2023.09 Large Language Models in Molecular Discovery. NIPS2023 Workshop
  • 2024.01 Scientific Large Language Models: A Survey on Biological & Chemical Domains. arXiv
  • 2024.01 From Words to Molecules: A Survey of Large Language Models in Chemistry. IJCAI2024
  • 2024.03 Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule. arXiv
  • 2024.03 Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey. arXiv
  • 2024.06 A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery. arXiv
  • 2024.07 A Review of Large Language Models and Autonomous Agents in Chemistry. arXiv
