Evaluating Coding Agents with Terminal-Bench 2.0

vinhnx 264 days ago

snorkel.ai

2 points, 0 comments

Summary

The article discusses Snorkel AI's work on Terminal-Bench 2.0, a benchmark for evaluating the capabilities of coding agents, and the company's role in building the next generation of benchmarks for advanced language models.
