Task: Reinforcement Learning
Gaia2 is a benchmark dataset for evaluating AI agent capabilities in simulated environments. The dataset contains 800 scenarios that test agent performance in environments where time flows continuously and events occur dynamically.