Task
General
APEX–Agents

APEX–Agents is a benchmark from Mercor for evaluating whether AI agents can execute long-horizon, cross-application professional services tasks. Tasks were created by investment banking analysts, management consultants, and corporate lawyers, and require agents to navigate realistic work environments with files and tools (e.g., docs, spreadsheets, PDFs, email, chat, calendar).

Tasks: 480 total (160 per job category)
Worlds: 33 total (10 banking, 11 consulting, 12…

See the full description on the dataset page: https://huggingface.co/datasets/mercor/apex-agents.