Chapter 7 · 2023
Evaluating Language-Model Agents on Realistic Autonomous Tasks
Megan Kinniment, Lucas Jun Koba Sato, Haoxing Du
Abstract
We evaluate language-model agents on 12 realistic autonomous tasks that require multi-step reasoning and real-world tool use, and find that current models succeed on only a small fraction of them.
Topics
autonomous tasks, evaluation, real-world, multi-step reasoning
Relevance Scores
Long-Horizon Score: 87
Enterprise Score: 83
Completeness: 84
Paper Info
Year: 2023
Venue
Type
Chapter: Ch. 7
Authors: 3
Zone III Analysis
Related Papers
Reflexion: Language Agents with Verbal Reinforcement Le…
2023 · Ch.1
Tree of Thoughts: Deliberate Problem Solving with Large…
2023 · Ch.1
Generative Agents: Interactive Simulacra of Human Behav…
2023 · Ch.2
MemGPT: Towards LLMs as Operating Systems
2023 · Ch.2