Chapter 6 · 2024
Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation
Yuxuan Liu, Tianchi Yang, Shaohan Huang
Abstract
We study whether LLMs can be trusted as evaluators and find systematic biases, including position bias, verbosity bias, and self-enhancement bias, that affect evaluation reliability.
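As a rough illustration of how position bias might be checked, the sketch below asks a judge to compare the same pair of answers in both orders and counts how often the verdict follows the answer rather than the slot. It is not the authors' method. The `Judge` signature, the `position_consistency` function, and both toy judges are hypothetical names introduced here, and a real setup would wrap an actual LLM call.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical judge interface (an assumption, not from the paper):
# takes (prompt, answer shown in slot A, answer shown in slot B)
# and returns "A" or "B" for the preferred slot.
Judge = Callable[[str, str, str], str]


def position_consistency(judge: Judge,
                         items: Sequence[Tuple[str, str, str]]) -> float:
    """Fraction of items where the judge prefers the same underlying
    answer regardless of which slot it is shown in."""
    if not items:
        raise ValueError("items must be non-empty")
    consistent = 0
    for prompt, ans1, ans2 in items:
        first = judge(prompt, ans1, ans2)   # ans1 shown in slot A
        second = judge(prompt, ans2, ans1)  # ans1 shown in slot B
        # ("A", "B") means ans1 won both times; ("B", "A") means ans2 did.
        if (first, second) in {("A", "B"), ("B", "A")}:
            consistent += 1
    return consistent / len(items)


def always_first(prompt: str, a: str, b: str) -> str:
    """Toy judge with maximal position bias: always prefers slot A."""
    return "A"


def longer_wins(prompt: str, a: str, b: str) -> str:
    """Toy judge with verbosity bias: prefers the longer answer."""
    return "A" if len(a) >= len(b) else "B"


items = [("q", "short", "a much longer answer")]
biased_score = position_consistency(always_first, items)
verbose_score = position_consistency(longer_wins, items)
```

A score near 1.0 only indicates that the judge is insensitive to answer order. The `longer_wins` toy judge is fully order-consistent yet still verbosity-biased, so each bias needs its own check.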
Topics
LLM evaluation, bias, meta-evaluation, reliability
Relevance Scores
Long-Horizon Score: 78
Enterprise Score: 82
Completeness: 83
Paper Info
Year: 2024
Venue
Type
Chapter: Ch. 6
Authors: 3
Zone III Analysis
Frameworks
Related Papers
Reflexion: Language Agents with Verbal Reinforcement Le…
2023 · Ch.1
AgentBench: Evaluating LLMs as Agents
2023 · Ch.1
Semantic Uncertainty: Linguistic Invariances for Uncert…
2023 · Ch.3
LLM-as-a-Judge: Large Language Models as Evaluators
2023 · Ch.5