A variety of papers related to GUI Agents, including but not limited to:
- GUI Understanding
- Datasets
- Benchmarks
- New frameworks
- New models
- Vision, language, and multimodal foundation models (with explicit GUI support)
- Works in the general domain that are widely used by GUI agents (e.g., Set-of-Mark (SoM) prompting)
- December 7, 2024: Moved ownership to OSU-NLP-Group.
- awesome-llm-powered-agent
- Awesome-LLM-based-Web-Agent-and-Tools
- Awesome-GUI-Agent
- computer-control-agent-knowledge-base
- awesome-ui-agent (this repository is based on it to some extent)
TODO: Move the details to the bottom later. Also add a ToC.
- [title](paper link)
- List authors directly without a "key" identifier (e.g., author1, author2)
- 🏛️ Institutions: List institutions concisely, using abbreviations where common (e.g., OSU).
- 📅 Date: e.g., Oct 30, 2024
- 📑 Publisher: e.g., ICLR 2025
- 💻 Env: Indicate the research environment in brackets, such as [Web], [Mobile], or [Desktop]. Use [GUI] if the research spans multiple environments, and [General] if it targets the general domain.
- 🔑 Key: Label each keyword in brackets, e.g., [model], [framework], [dataset], [benchmark].
- 📖 TLDR: Brief summary of the paper.
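The entry format above is regular enough to render mechanically. The following is a minimal Python sketch, assuming each field becomes a sub-bullet indented four spaces under the title line; `PaperEntry` and `format_entry` are hypothetical helpers for illustration, not part of this repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PaperEntry:
    """Fields of one Paper Collection entry (hypothetical helper)."""
    title: str
    link: str
    authors: List[str]
    institutions: List[str]
    date: str        # e.g. "Oct 30, 2024"
    publisher: str   # e.g. "ICLR 2025"
    env: str         # e.g. "Web", "Mobile", "Desktop", "GUI", "General"
    keys: List[str] = field(default_factory=list)
    tldr: str = ""


def format_entry(e: PaperEntry) -> str:
    """Render an entry as a Markdown bullet with nested sub-bullets.

    Assumption: sub-bullets are indented by four spaces.
    """
    lines = [
        f"- [{e.title}]({e.link})",
        f"    - {', '.join(e.authors)}",
        f"    - 🏛️ Institutions: {', '.join(e.institutions)}",
        f"    - 📅 Date: {e.date}",
        f"    - 📑 Publisher: {e.publisher}",
        f"    - 💻 Env: [{e.env}]",
        # Each keyword is wrapped in brackets and the labels are comma-separated.
        f"    - 🔑 Key: {', '.join(f'[{k}]' for k in e.keys)}",
        f"    - 📖 TLDR: {e.tldr}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only; not a real paper.
    example = PaperEntry(
        title="Example Paper Title",
        link="https://example.com/paper",
        authors=["Author One", "Author Two"],
        institutions=["OSU"],
        date="Oct 30, 2024",
        publisher="ICLR 2025",
        env="Web",
        keys=["framework", "benchmark"],
        tldr="One-sentence summary of the paper.",
    )
    print(format_entry(example))
```

Keeping the fields in a dataclass makes the fixed field order explicit, so every rendered entry lists its sub-bullets in the same sequence.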
Regarding the 🔑 Key:
- model: Indicates a newly trained model.
- framework: If the paper proposes a new framework.
- dataset: If a new dataset is created and published.
- benchmark: If a new benchmark is established (add "dataset" if there's a new training set).
- primary innovation: List the main focus or innovation of the study.
- common abbreviations: Include commonly used abbreviations associated with the paper (e.g., model or framework names).
For missing information, use "Unknown."
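The 🔑 Key rules and the "Unknown" fallback can also be expressed as a short sketch. The function names and boolean flags below are hypothetical, and the label order (model, framework, dataset, benchmark, then any extra keywords) is an assumption rather than a repository requirement.

```python
from typing import Dict, List, Optional


def key_labels(
    trains_model: bool,
    proposes_framework: bool,
    releases_dataset: bool,
    establishes_benchmark: bool,
    benchmark_has_training_set: bool = False,
    extra: Optional[List[str]] = None,
) -> List[str]:
    """Choose 🔑 Key labels following the rules above (hypothetical helper)."""
    labels: List[str] = []
    if trains_model:
        labels.append("model")
    if proposes_framework:
        labels.append("framework")
    # A new benchmark that also ships a training set gets "dataset" as well.
    if releases_dataset or (establishes_benchmark and benchmark_has_training_set):
        labels.append("dataset")
    if establishes_benchmark:
        labels.append("benchmark")
    # Primary innovation and common abbreviations are appended without duplicates.
    for item in extra or []:
        if item not in labels:
            labels.append(item)
    return labels


def fill_unknown(fields: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Replace missing or blank field values with "Unknown"."""
    return {k: (v.strip() if v and v.strip() else "Unknown") for k, v in fields.items()}
```

For example, `key_labels(False, False, False, True, True)` yields `["dataset", "benchmark"]`, matching the rule that a benchmark with a new training set also carries the dataset label.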
You can contribute by providing either the paper title or a fully formatted entry for the Paper Collection. You’re also welcome to open a new PR with your submission.
For convenience, you can use this GPT to search for your paper and format the entry automatically (the prompts are in auto_prompt.txt).