Show HN: Watch LLMs play 21,000 hands of Poker
jazarwil Thursday, January 08, 2026
PokerBench is my attempt at a new LLM benchmark wherein frontier models play Texas Hold'em in an arena setting. It also features a simulator to view individual games and observe how the different models reason about poker strategy. Opus/Haiku, Gemini Pro/Flash, GPT-5.2/5 mini, and Grok 4.1 Fast Reasoning have all been included.
All code -> https://github.com/JoeAzar/pokerbench
Summary
The post introduces PokerBench, a benchmark in which frontier large language models play Texas Hold'em poker against one another in an arena setting. The run covers about 21,000 hands played by Opus/Haiku, Gemini Pro/Flash, GPT-5.2/5 mini, and Grok 4.1 Fast Reasoning, and the site includes a simulator for replaying individual games and seeing how each model reasons about poker strategy.
pokerbench.adfontes.io