Chapter 6 · 2026
Agent Benchmarks Fail Public Sector Requirements
Jonathan Rystrøm, Chris Schmitz, Karolina Korgul
Abstract
This paper argues that existing benchmarks for LLM agents fail to meet the stringent legal, procedural, and structural requirements of the public sector. It defines criteria that public-sector-relevant benchmarks should meet, including being process-based, realistic, public-sector-specific, and metrics-driven. An analysis of over 1,300 benchmark papers finds that no single benchmark meets all of these criteria. The authors call for new research and for public-sector officials to apply these criteria.
Topics
LLM agents, benchmarking, public sector, evaluation criteria, governance
Relevance Scores
Long-Horizon Score: 85
Enterprise Score: 80
Completeness: 75
Paper Info
Year: 2026
Venue
Type
Chapter: Ch. 6
Authors: 3
Related Papers
Reflexion: Language Agents with Verbal Reinforcement Le…
2023 · Ch.1
AgentBench: Evaluating LLMs as Agents
2023 · Ch.1
A Survey on Large Language Model based Autonomous Agent…
2023 · Ch.1
Semantic Uncertainty: Linguistic Invariances for Uncert…
2023 · Ch.3