Chapter 3 · 2025
Beyond Task Completion: An Assessment Framework for Evaluating Agentic AI Systems
Sreemaee Akshathala, Bassam Adnan, Mahisha Ramesh
Abstract
This paper proposes an end-to-end Agent Assessment Framework for evaluating agentic AI systems beyond traditional task completion metrics. It addresses the challenges posed by the non-deterministic nature of LLM agents and multi-agent architectures, focusing on four evaluation pillars: LLMs, Memory, Tools, and Environment. The framework aims to capture runtime uncertainties and behavioral deviations overlooked by conventional methods.
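The abstract names four evaluation pillars and the problem of non-deterministic runs, but this page does not describe the paper's actual metrics. As a rough illustration only, the sketch below shows one way per-pillar scores could be recorded over repeated runs of the same task, so that variation between runs is visible alongside the task completion rate. The names `RunAssessment`, `summarize`, and `PILLARS`, the 0-to-1 score scale, and the use of standard deviation as a spread measure are all assumptions made for this example, not the authors' method.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical pillar keys, taken from the four pillars named in the abstract.
PILLARS = ("llm", "memory", "tools", "environment")


@dataclass
class RunAssessment:
    """One run of an agent on a task (hypothetical record layout)."""
    task_completed: bool
    pillar_scores: Dict[str, float]  # assumed scale: 0.0 (worst) to 1.0 (best)


def summarize(runs: List[RunAssessment]) -> dict:
    """Aggregate repeated runs into a completion rate plus per-pillar mean and spread."""
    summary = {
        "task_completion_rate": mean([1.0 if r.task_completed else 0.0 for r in runs])
    }
    for pillar in PILLARS:
        scores = [r.pillar_scores[pillar] for r in runs]
        # Population standard deviation as a simple stand-in for run-to-run variation.
        summary[pillar] = {"mean": mean(scores), "spread": pstdev(scores)}
    return summary


runs = [
    RunAssessment(True, {"llm": 1.0, "memory": 0.5, "tools": 1.0, "environment": 1.0}),
    RunAssessment(True, {"llm": 1.0, "memory": 1.0, "tools": 0.0, "environment": 1.0}),
]
report = summarize(runs)
```

In this made-up data both runs complete the task, yet the tools pillar varies widely between them, which is the kind of deviation a completion-only metric would not show.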
Topics
Agentic AI, assessment framework, evaluation, LLM agents, multi-agent systems
Relevance Scores
Long-Horizon Score: 85
Enterprise Score: 80
Completeness: 75
Paper Info
Year: 2025
Venue
Type
Chapter: Ch. 3
Authors: 3
Zone III Analysis
Related Papers
Tree of Thoughts: Deliberate Problem Solving with Large…
2023 · Ch.1
HuggingGPT: Solving AI Tasks with ChatGPT and its Frien…
2023 · Ch.4
Generative Agents: Interactive Simulacra of Human Behav…
2023 · Ch.2
MemGPT: Towards LLMs as Operating Systems
2023 · Ch.2