benchmark · Chapter 1 · ICLR 2024 · 2023

GAIA: A Benchmark for General AI Assistants

Grégoire Mialon (Meta AI), Clémentine Fourrier (Hugging Face)

Abstract

We introduce GAIA, a benchmark for general AI assistants that tests real-world capabilities requiring multi-step reasoning, tool use, and information synthesis. GAIA questions require an average of 5.4 steps to solve.

Key Contributions

  • Real-world multi-step benchmark
  • Tool use evaluation
  • Difficulty stratification

Topics

  • general AI benchmark
  • multi-step reasoning
  • tool use
  • real-world tasks