
Evaluating Coding Agents with Terminal-Bench 2.0
vinhnx
snorkel.ai
Summary
by metafa.st
The article discusses Snorkel AI's development of Terminal-Bench, a benchmark for evaluating the capabilities of coding agents, and the company's role in building the next generation of benchmarks for advanced language models.