Show HN: Regrada – The CI gate for LLM behavior

matiasmolinolo Monday, March 16, 2026

I built Regrada to help me with prompt changes.

Working on LLM-based applications, I ran into two big pain points:

1. It's difficult to monitor how a prompt change might break behavior.

2. Testing SDKs are high friction to actually integrate and run.

Regrada solves this by intercepting LLM calls to build traces and baselines you can compare against, plus CI gates that keep behavior drift from reaching prod.
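To give a feel for the idea (this is a rough conceptual sketch, not Regrada's actual implementation or file format — all names here are made up), a baseline gate boils down to recording outputs per test case and failing CI when a new run diverges:

```python
def diff_against_baseline(baseline, new_outputs):
    """Return (case_id, expected, actual) for every case that drifted.

    baseline and new_outputs are {case_id: recorded_output} dicts;
    a missing case in the new run also counts as drift.
    """
    drifted = []
    for case_id, expected in baseline.items():
        actual = new_outputs.get(case_id)
        if actual != expected:
            drifted.append((case_id, expected, actual))
    return drifted


if __name__ == "__main__":
    baseline = {"greeting": "Hello!", "summary": "Short summary."}
    new_run = {"greeting": "Hello!", "summary": "A much longer summary..."}

    drift = diff_against_baseline(baseline, new_run)
    for case_id, expected, actual in drift:
        print(f"DRIFT {case_id}: {expected!r} -> {actual!r}")
    # In CI, a non-empty drift list would translate to a non-zero exit code.
```

The real tool does much more than exact-match comparison, but the CI-gate shape is the same: record, re-run, diff, fail on unexplained change.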

Some cool features we've built:

`--explain`: when a case fails after a model change, an LLM helps you detect why the behavior shifted. Not just "assertion failed" but "the model is now truncating before the conclusion clause." Saves a lot of digging to diagnose.

`regrada fuzz`: runs mutations on your inputs (typos, reorderings, edge cases) to find cases where your prompt is more brittle than you think. Caught a production issue for me before launch.
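The intuition behind that kind of fuzzing (again a hedged sketch, not Regrada's mutation engine) is to apply small perturbations like typos and word reorderings to each input, then re-check the prompt's behavior on every variant:

```python
import random


def mutate_typo(text, rng):
    """Swap two adjacent characters to simulate a typo."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def mutate_reorder(text, rng):
    """Swap two adjacent words to simulate a reordering."""
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


def fuzz(text, n=5, seed=0):
    """Generate n mutated variants of an input, deterministically per seed."""
    rng = random.Random(seed)
    mutations = [mutate_typo, mutate_reorder]
    return [rng.choice(mutations)(text, rng) for _ in range(n)]


print(fuzz("summarize this report in three bullet points"))
```

Each variant would then be run through the prompt and gated against the same assertions as the original input, which is how this style of fuzzing surfaces brittleness.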

You can run fully local or connect to the cloud runner.

Still pre-launch, actively looking for teams to try it.

Happy to answer questions about how the assertion layer works, the model-agnostic design, or anything else.

https://www.regrada.com/
