Chapter 7 · 2023
AgentBench: Evaluating LLMs as Agents
Xiao Liu, Hao Yu, Hanchen Zhang
Abstract
We present AgentBench, a multi-dimensional evolving benchmark to evaluate LLMs as agents in various environments including operating systems, databases, knowledge graphs, digital games, and web browsing.
Topics
agent benchmarks, evaluation, LLM agents, multi-environment
Relevance Scores
Long-Horizon Score: 82
Enterprise Score: 78
Completeness: 88
Paper Info
Year: 2023
Venue
Type
Chapter: Ch. 7
Authors: 3
Zone III Analysis
Related Papers
Reflexion: Language Agents with Verbal Reinforcement Le…
2023 · Ch.1
Tree of Thoughts: Deliberate Problem Solving with Large…
2023 · Ch.1
Generative Agents: Interactive Simulacra of Human Behav…
2023 · Ch.2
MemGPT: Towards LLMs as Operating Systems
2023 · Ch.2