benchmark · Chapter 1 · ICLR 2024 · 2023
GAIA: A Benchmark for General AI Assistants
Grégoire Mialon (Meta AI), Clémentine Fourrier (Hugging Face)
Abstract
We introduce GAIA, a benchmark for general AI assistants that tests real-world capabilities requiring multi-step reasoning, tool use, and information synthesis. GAIA questions require an average of 5.4 steps to solve.
Key Contributions
- Real-world multi-step benchmark
- Tool use evaluation
- Difficulty stratification
Topics
general AI benchmark · multi-step reasoning · tool use · real-world tasks
Relevance Scores
Long-Horizon Score: 90
Enterprise Score: 84
Completeness: 84
Paper Info
Year: 2023
Venue: ICLR 2024
Type: benchmark
Chapter: Ch. 1
Authors: 2
Related Papers
ReAct: Synergizing Reasoning and Acting in Language Mod…
2023 · Ch.1
Reflexion: Language Agents with Verbal Reinforcement Le…
2023 · Ch.1
Tree of Thoughts: Deliberate Problem Solving with Large…
2023 · Ch.1
Toolformer: Language Models Can Teach Themselves to Use…
2023 · Ch.1