
Show HN: Agent Tinman – Autonomous failure discovery for LLM systems

oliveskin Saturday, January 31, 2026

Hey HN,

I built Tinman because finding LLM failures in production is a pain in the ass. Traditional testing checks what you've already thought of. Tinman tries to find what you haven't.

It's an autonomous research agent that:

- Generates hypotheses about potential failure modes
- Designs and runs experiments to test them
- Classifies failures (reasoning errors, tool use, context issues, etc.)
- Proposes interventions and validates them via simulation

The core loop runs continuously. Each cycle informs the next.
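A minimal sketch of that loop, assuming toy stand-ins for the real components (the names `Hypothesis`, `Finding`, `generate_hypotheses`, and `run_experiment` are illustrative, not Tinman's actual API):

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    description: str
    category: str          # e.g. "reasoning", "tool_use", "context"

@dataclass
class Finding:
    hypothesis: Hypothesis
    confirmed: bool

def generate_hypotheses(prior: list[Finding]) -> list[Hypothesis]:
    # Toy stub: re-probe categories confirmed last cycle, plus one fresh guess.
    seeds = {f.hypothesis.category for f in prior if f.confirmed}
    fresh = [Hypothesis("model ignores tool errors", "tool_use")]
    return [Hypothesis(f"variant probe in {c}", c) for c in sorted(seeds)] + fresh

def run_experiment(h: Hypothesis) -> bool:
    # Toy stub: pretend tool-use probes reproduce a failure.
    return h.category == "tool_use"

def research_cycle(prior: list[Finding]) -> list[Finding]:
    """One pass: hypothesize -> experiment -> classify; output seeds the next cycle."""
    return [Finding(h, run_experiment(h)) for h in generate_hypotheses(prior)]

cycle1 = research_cycle([])      # first cycle starts from nothing
cycle2 = research_cycle(cycle1)  # second cycle is informed by the first
```

The point of the shape: each cycle's findings are the input to the next round of hypothesis generation, so confirmed failures get explored in more depth automatically.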

Why now: With tools like OpenClaw/ClawdBot giving agents real system access, the failure surface is way bigger than "bad chatbot response." Tinman has a gateway adapter that connects to OpenClaw's WebSocket stream for real-time analysis as requests flow through.

Three modes:

- LAB: unrestricted research against dev
- SHADOW: observe production, flag issues
- PRODUCTION: human approval required
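The gating those three modes imply can be sketched like this (the mode names come from the post; the `may_act` function and its logic are my illustrative guess, not the real implementation):

```python
from enum import Enum

class Mode(Enum):
    LAB = "lab"                # unrestricted research against dev
    SHADOW = "shadow"          # observe production, flag issues only
    PRODUCTION = "production"  # interventions need human sign-off

def may_act(mode: Mode, human_approved: bool = False) -> bool:
    """Whether the agent may execute an intervention in this mode."""
    if mode is Mode.LAB:
        return True
    if mode is Mode.SHADOW:
        return False           # observe and flag, never act
    return human_approved      # PRODUCTION: explicit approval required
```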

Tech:

- Python, async throughout
- Extensible GatewayAdapter ABC for any proxy/gateway
- Memory graph for tracking what was known when
- Works with OpenAI, Anthropic, Ollama, Groq, OpenRouter, Together
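To show the adapter idea, here is a self-contained sketch of an async GatewayAdapter and a toy subclass that replays a fixed list instead of a live WebSocket stream. The ABC shape and method names are assumptions for illustration, not the real GatewayAdapter interface:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator

class GatewayAdapter(ABC):
    """Assumed shape: adapters expose a stream of gateway events."""
    @abstractmethod
    def events(self) -> AsyncIterator[dict]:
        """Yield request/response events from the gateway."""

class ListAdapter(GatewayAdapter):
    """Toy adapter: replays a fixed list in place of a live WebSocket."""
    def __init__(self, items: list[dict]):
        self.items = items

    async def events(self) -> AsyncIterator[dict]:
        for item in self.items:
            yield item

async def analyze(adapter: GatewayAdapter) -> list[dict]:
    # Flag suspicious events as requests stream through the gateway.
    flagged = []
    async for ev in adapter.events():
        if ev.get("tool_error"):
            flagged.append(ev)
    return flagged

flagged = asyncio.run(analyze(ListAdapter([
    {"id": 1, "tool_error": False},
    {"id": 2, "tool_error": True},
])))
```

A real adapter would implement `events()` over the gateway's WebSocket feed; everything downstream only sees the async iterator, which is what makes the ABC pluggable across proxies.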

  pip install AgentTinman
  tinman init && tinman tui
GitHub: https://github.com/oliveskin/Agent-Tinman
Docs: https://oliveskin.github.io/Agent-Tinman/
OpenClaw adapter: https://github.com/oliveskin/tinman-openclaw-eval

Apache 2.0. No telemetry, no paid tier. Feedback and contributions welcome.
