
Measuring AI Ability to Complete Long Tasks: Opus 4.5 has a 50% horizon of 4h49m

spicypete Sunday, December 21, 2025
Summary
The article discusses a new approach to measuring the ability of AI systems to complete long tasks, focusing on factors like task complexity, reasoning ability, and adaptability. The proposed framework aims to provide a more comprehensive evaluation of AI capabilities than traditional benchmarks do.
metr.org