GAIA Benchmark
GAIA is "a benchmark for General AI Assistants," published in 2023 (arXiv:2311.12983). It tests general-purpose AI agents on tasks that require reasoning, tool use, multi-modality handling, and web browsing, and it maintains a public leaderboard on Hugging Face that accepts community submissions. The benchmark has become a reference point for evaluating agentic systems that combine an LLM with tools and a browser.