Pinned
- xlang-ai/OSWorld (Public) - [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
- xlang-ai/Spider2-V (Public) - [NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
- xlang-ai/Spider2 (Public) - Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
- yiyihum/da-code (Public) - [EMNLP 2024] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models