This repo collects papers on methods, benchmarks, and evaluation for code generation in multimodal scenarios:
- UI Code Generation: web front-end code generation, mobile app UI code generation, etc.
- Scientific Code Generation: plots, charts, formulas, etc.
- Slide Code Generation.
- Visually Rich Programming: programming problems with image examples.
- Logo Generation: image generation through SVG code generation.
- Program Repair: repair tasks under the above scenarios.
- UML Code Generation.
- General Benchmarks.
Click a paper title to jump to the corresponding paper link.
- Design2Code: How Far Are We From Automating Front-End Engineering? Chenglei Si, Yanzhe Zhang, Zhengyuan Yang, Ruibo Liu, Diyi Yang. arXiv 2024.
- Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset. Hugo Laurençon, Léo Tronchon, Victor Sanh. arXiv 2024.
- VISION2UI: A Real-World Dataset with Layout for Code Generation from UI Designs. Yi Gui, Zhen Li, Yao Wan, Yemin Shi, Hongyu Zhang, Yi Su, Shaoling Dong, Xing Zhou, Wenbin Jiang. arXiv 2024.
- NLDesign: A UI Design Tool for Natural Language Interfaces. Tianhao Zhang, Peiguo Fu, Jie Liu, Yihe Zhang, Xingmei Chen. ACM-TURC '24 (2024.6.30).
- Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach. Yuxuan Wan, Chaozheng Wang, Yi Dong, Wenxuan Wang, Shuqing Li, Yintong Huo, Michael R. Lyu. arXiv 2024.
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs. Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen. arXiv 2024.
- Prototype2Code: End-to-end Front-end Code Generation from UI Design Prototypes. Shuhong Xiao, Yunnong Chen, Jiazhi Li, Liuqing Chen, Lingyun Sun, Tingting Zhou. arXiv 2024.
- Bridging Design and Development with Automated Declarative UI Code Generation. Ting Zhou, Yanjie Zhao, Xinyi Hou, Xiaoyu Sun, Kai Chen, Haoyu Wang. arXiv 2024.
- Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping. Ryan Li, Yanzhe Zhang, Diyi Yang. arXiv 2024.
- Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation? Jingyu Xiao, Yuxuan Wan, Yintong Huo, Zhiyao Xu, Michael R. Lyu. arXiv 2024.
- UIClip: A Data-driven Model for Assessing User Interface Design. Jason Wu, Yi-Hao Peng, Xin Yue Li, Amanda Swearngin, Jeffrey P. Bigham, Jeffrey Nichols. UIST 2024.
- WAFFLE: Multi-Modal Model for Automated Front-End Development. Shanchao Liang, Nan Jiang, Shangshu Qian, Lin Tan. arXiv 2024.
- MRWeb: An Exploration of Generating Multi-Page Resource-Aware Web Code from UI Designs. Yuxuan Wan, Yi Dong, Jingyu Xiao, Yintong Huo, Wenxuan Wang, Michael R. Lyu. arXiv 2024.
- Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots. Chengyue Wu, Yixiao Ge, Qiushan Guo, Jiahao Wang, Zhixuan Liang, Zeyu Lu, Ying Shan, Ping Luo. arXiv 2024.
- MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization. Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun. arXiv 2024.
- ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation. Chufan Shi, Cheng Yang, Yaxin Liu, Bo Shui, Junjie Wang, Mohan Jing, Linran Xu, Xinyu Zhu, Siheng Li, Yuxiang Zhang, Gongye Liu, Xiaomei Nie, Deng Cai, Yujiu Yang. arXiv 2024.
- From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing. Jingxuan Wei, Cheng Tan, Qi Chen, Gaowei Wu, Siyuan Li, Zhangyang Gao, Linzhuang Sun, Bihui Yu, Ruifeng Guo. arXiv 2024.
- Is GPT-4V (ision) All You Need for Automating Academic Data Visualization? Exploring Vision-Language Models' Capability in Reproducing Academic Charts. Zhehao Zhang, Weicheng Ma, Soroush Vosoughi. EMNLP 2024 (Findings).
- Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code. Timur Galimzyanov, Sergey Titov, Yaroslav Golubev, Egor Bogomolov. arXiv 2024.
- ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation. Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Wanxiang Che, Zhiyuan Liu, Maosong Sun. arXiv 2025.1.
- MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems. Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Jing Ma. arXiv 2024.
- HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks. Fengji Zhang, Linquan Wu, Huiyu Bai, Guancheng Lin, Xiao Li, Xiao Yu, Yue Wang, Bei Chen, Jacky Keung. arXiv 2024.
- DynEx: Dynamic Code Synthesis with Structured Design Exploration for Accelerated Exploratory Programming. Jenny Ma, Karthik Sreedhar, Vivian Liu, Sitong Wang, Pedro Alejandro Perez, Riya Sahni, Lydia B. Chilton. arXiv 2024.
- ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges. Rao Fu, Ziyang Luo, Hongzhan Lin, Zhen Ye, Jing Ma. arXiv 2024.
- StarVector: Generating Scalable Vector Graphics Code from Images and Text. Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar, David Vazquez, Christopher Pal, Marco Pedersoli. arXiv 2023.
- LogoMotion: Visually Grounded Code Generation for Content-Aware Animation. Vivian Liu, Rubaiat Habib Kazi, Li-Yi Wei, Matthew Fisher, Timothy Langlois, Seth Walker, Lydia Chilton. arXiv 2024.
- Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models. Ronghuan Wu, Wanchao Su, Jing Liao. arXiv 2024.
- AutoPresent: Designing Structured Visuals from Scratch. Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, Trevor Darrell. arXiv 2025.
- PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides. Hao Zheng, Xinyan Guan, Hao Kong, Jia Zheng, Hongyu Lin, Yaojie Lu, Ben He, Xianpei Han, Le Sun. arXiv 2025.
- SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? John Yang, Carlos E. Jimenez, Alex L. Zhang, Kilian Lieret, Joyce Yang, Xindi Wu, Ori Press, Niklas Muennighoff, Gabriel Synnaeve, Karthik R. Narasimhan, Diyi Yang, Sida I. Wang, Ofir Press. arXiv 2024.
- DesignRepair: Dual-Stream Design Guideline-Aware Frontend Repair with Large Language Models. Mingyue Yuan, Jieshan Chen, Zhenchang Xing, Aaron Quigley, Yuyu Luo, Gelareh Mohammadi, Qinghua Lu, Liming Zhu. ICSE 2025.
- CodeV: Issue Resolving with Visual Data. Linhao Zhang, Daoguang Zan, Quanshun Yang, Zhirong Huang, Dong Chen, Bo Shen, Tianyu Liu, Yongshun Gong, Pengjie Huang, Xudong Lu, Guangtai Liang, Lizhen Cui, Qianxiang Wang. arXiv 2024.
- From Image to UML: First Results of Image-Based UML Diagram Generation using LLMs. Arie van Deursen, Eduard C. Groen. LLM4MDE 2024.
- Image2Struct: Benchmarking Structure Extraction for Vision-Language Models. Josselin Somerville Roberts, Tony Lee, Chi Heem Wong, Michihiro Yasunaga, Yifan Mai, Percy Liang. arXiv 2024.
- FullStack Bench: Evaluating LLMs as Full Stack Coders. Siyao Liu, He Zhu, Jerry Liu, Shulin Xin, Aoyan Li, Rui Long, Li Chen, Jack Yang, Jinxiang Xia, Z.Y. Peng, Shukai Liu, Zhaoxiang Zhang, Jing Mai, Ge Zhang, Wenhao Huang, Kai Shen, Liang Xiang. arXiv 2024.
- Empowering Agile-Based Generative Software Development through Human-AI Teamwork. Sai Zhang, Zhenchang Xing, Ronghui Guo, Fangzhou Xu, Lei Chen, Zhaoyuan Zhang, Xiaowang Zhang, Zhiyong Feng, Zhiqiang Zhuang. TOSEM 2024.
- Automated LaTeX Code Generation from Handwritten Mathematical Expressions. Jayaprakash Sundararaj, Akhil Vyas, Benjamin Gonzalez-Maldonado. arXiv 2024.
This is an active repository and your contributions are always welcome! Before you add papers or tools to the awesome list, please make sure that:
- First, decide which category the work belongs to.
- The paper or tool is related to Multimodal Large Language Models (MLLMs) for code generation.
- The paper is inserted in the correct position in chronological order (by publication or arXiv release time).
- The link to the paper points to the arXiv abstract page, not the PDF page, if the paper is posted on arXiv.
- If the paper has been accepted, use the correct publication venue instead of arXiv.
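For reference, a new entry might look like the sketch below; the title, authors, and arXiv ID are placeholders, not a real paper:

```markdown
- [Paper Title: A Benchmark for X](https://arxiv.org/abs/XXXX.XXXXX). First Author, Second Author. arXiv 2025.
```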