Ask HN: Agent evaluations, what is everything I should know?
akira_067 Thursday, November 20, 2025
I'm currently building coding agents and wondering what the standard practice is for creating and running evals. I gather that tasks and their definitions differ dramatically across domains and instances, so I'm not hoping for a one-size-fits-all answer. Just... what actually works for you in practice?