Measuring AI Ability to Complete Long Tasks: Opus 4.5 has 50% horizon of 4h49m
spicypete Sunday, December 21, 2025
Summary
The article describes METR's approach to measuring how well AI systems complete long tasks. Rather than reporting a score on a fixed benchmark, it estimates a model's 50% time horizon: the task length, measured by how long the task takes a human expert, at which the model succeeds about half the time. By this measure, Opus 4.5 has a 50% horizon of 4 hours 49 minutes.
metr.org