Show HN: NadirClaw, an LLM router that cuts costs by matching each prompt to the right model
amirdor · Tuesday, February 17, 2026

I use Claude and Codex heavily for coding, and I kept burning through my quota halfway through the week. When I looked at my logs, most of my prompts were things like "summarize this," "reformat this JSON," or "write a docstring." Stuff that any small model handles fine.
So I built NadirClaw. It's a Python proxy that sits between your app and your LLM providers. It classifies each prompt in about 10ms and routes simple ones to Gemini Flash, Ollama, or whatever cheap/local model you want. Only the complex prompts hit your premium API.
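Conceptually, the routing step is tiny. Something like this, heavily simplified (the backend names and models below are placeholders, not the actual code):

    def classify(prompt: str) -> str:
        # Stand-in; the real heuristic is sketched at the end of the post.
        return "simple" if len(prompt.split()) < 300 else "complex"

    CHEAP = {"provider": "ollama", "model": "llama3.2"}        # local, effectively free
    PREMIUM = {"provider": "anthropic", "model": "claude-sonnet"}  # quota-limited

    def route(prompt: str) -> dict:
        """Pick which backend a request gets forwarded to."""
        return CHEAP if classify(prompt) == "simple" else PREMIUM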
It's OpenAI-compatible, so you just point your existing tools at it. Works with OpenClaw, Cursor, Claude Code, or anything that talks to the OpenAI API.
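For example, with the stock OpenAI Python SDK (the port and the "auto" model name here are assumptions; check the README for the real defaults):

    from openai import OpenAI

    # Point the SDK at the proxy instead of api.openai.com.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="auto",  # hypothetical: let the router decide
        messages=[{"role": "user", "content": "Write a docstring for this function."}],
    )
    print(resp.choices[0].message.content)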
In practice I went from burning through my Claude quota in 2 days to having it last the full week. Costs dropped by around 60%.
Install:

    curl -fsSL https://raw.githubusercontent.com/doramirdor/NadirClaw/main/... | sh
Still early. The classifier is simple (token count + pattern matching + optional embeddings), and I'm sure there are edge cases I'm missing. Roughly the shape of it (heavily simplified; the real thresholds and patterns differ):
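    import re

    # Illustrative patterns/thresholds only.
    SIMPLE_PATTERNS = [
        re.compile(r"\b(summariz\w*|reformat|rewrite|translate)\b", re.I),
        re.compile(r"write a docstring", re.I),
    ]

    def classify(prompt: str) -> str:
        """Label a prompt 'simple' or 'complex'."""
        tokens = prompt.split()  # crude stand-in for a real tokenizer
        if len(tokens) < 300 and any(p.search(prompt) for p in SIMPLE_PATTERNS):
            return "simple"
        # An optional embedding-similarity check can run here before
        # defaulting to the premium model.
        return "complex"

Curious what breaks first, and whether the routing logic makes sense to others.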