
Show HN: Watch LLMs play 21,000 hands of Poker

jazarwil Thursday, January 08, 2026

PokerBench is my attempt at a new LLM benchmark wherein frontier models play Texas Hold'em in an arena setting. It also features a simulator to view individual games and observe how the different models reason about poker strategy. Opus/Haiku, Gemini Pro/Flash, GPT-5.2/5 mini, and Grok 4.1 Fast Reasoning have all been included.

All code -> https://github.com/JoeAzar/pokerbench

Summary
The post introduces PokerBench, a benchmark in which frontier LLMs play Texas Hold'em against one another in an arena setting, covering 21,000 hands. The models included are Opus/Haiku, Gemini Pro/Flash, GPT-5.2/5 mini, and Grok 4.1 Fast Reasoning. A simulator lets readers replay individual games and see how each model reasons about poker strategy.
pokerbench.adfontes.io