BullshitBench: Models Answering Nonsense Questions
simianwords Monday, March 02, 2026
Summary
This article explores the 'bullshit benchmark,' a proposed method for evaluating large language models by testing whether they can respond coherently and sensibly to prompts that are deliberately challenging or nonsensical.
petergpt.github.io