Show HN: Steadwing – Your Autonomous On-Call Engineer
abejith Friday, March 06, 2026
Hey HN! We’re Abejith and Dev, and we’re building Steadwing (https://www.steadwing.com), an autonomous on-call engineer that diagnoses production incidents and alerts, correlates evidence across your stack, and resolves them. You can try it at https://app.steadwing.com/signup (no credit card required; a demo mode is also available).
Every on-call engineer knows the pain. It’s 2am, PagerDuty fires, and you open your laptop and start the scramble: Datadog for metrics, GitHub for recent commits, Slack to see who’s awake, Elasticsearch for logs. Forty-five minutes later, you find it was a config change that reduced the connection pool size. The fix took 2 minutes; the diagnosis took almost an hour.
The hard part isn’t the fix; it’s the correlation. The signal is scattered across a dozen tools, and nobody has the full picture. My co-founder, Dev, and I met through Entrepreneurs First, and we both felt that incident response was fundamentally broken and could be significantly improved. Our long-term vision is software that heals itself.
So we built Steadwing. When an alert fires, it pulls context from logs, metrics, traces, and recent commits in parallel, correlates the signals, and delivers a structured RCA (root cause analysis) in under 5 minutes. Each RCA includes a plain-language root cause, evidence linked back to the source tools, a timeline, an impact assessment, and both short-term and long-term fixes.
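The post doesn’t say how the parallel fetch-and-correlate step is implemented. As a minimal sketch of the general idea only (not Steadwing’s code), assuming Python’s asyncio and hypothetical fetcher functions (`fetch_logs`, `fetch_metrics`, `fetch_traces`, `fetch_commits`) that return made-up events with example.com evidence links, the fan-out and timeline merge could look like this:

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    source: str    # which tool the signal came from
    at: datetime   # when it happened
    summary: str   # one-line description
    link: str      # evidence link back to the source tool (hypothetical URLs)


# Hypothetical fetchers standing in for real log, metric, trace, and VCS clients.
async def fetch_commits(alert_id: str) -> list[Event]:
    await asyncio.sleep(0)  # placeholder for a network call
    return [Event("commits", datetime(2026, 3, 6, 1, 40),
                  "config: reduce connection pool size", "https://example.com/commit/abc")]


async def fetch_metrics(alert_id: str) -> list[Event]:
    await asyncio.sleep(0)
    return [Event("metrics", datetime(2026, 3, 6, 1, 58),
                  "p99 latency spike", "https://example.com/dash/1")]


async def fetch_logs(alert_id: str) -> list[Event]:
    await asyncio.sleep(0)
    return [Event("logs", datetime(2026, 3, 6, 2, 1),
                  "timeout acquiring connection from pool", "https://example.com/logs/q")]


async def fetch_traces(alert_id: str) -> list[Event]:
    # Simulates one integration being unavailable during the incident.
    raise ConnectionError("tracing backend unreachable")


async def gather_context(alert_id: str) -> list[Event]:
    # Query every source concurrently rather than one tab at a time.
    results = await asyncio.gather(
        fetch_logs(alert_id), fetch_metrics(alert_id),
        fetch_traces(alert_id), fetch_commits(alert_id),
        return_exceptions=True,  # a failing source yields an exception object, not a crash
    )
    events: list[Event] = []
    for result in results:
        if isinstance(result, BaseException):
            continue  # skip unavailable sources so partial evidence is still correlated
        events.extend(result)
    # Merge everything into one timeline, oldest first.
    return sorted(events, key=lambda e: e.at)


if __name__ == "__main__":
    for event in asyncio.run(gather_context("alert-123")):
        print(event.at.isoformat(), event.source, event.summary, event.link)
```

Passing `return_exceptions=True` means one unreachable integration (simulated here by `fetch_traces`) doesn’t sink the whole analysis; the remaining evidence is still merged into an ordered timeline.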
It also handles noisy environments. Say a bad deploy causes cascading failures across 5 microservices and triggers 30+ alerts: Steadwing groups them into one incident and separates the actual root cause from the side effects. Beyond diagnosis, it suggests safe fixes ranked by risk and can carry out rollbacks, scaling adjustments, and config changes for you. You can also ask follow-up questions about any incident, or general infrastructure questions, conversationally.
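How Steadwing groups alerts and separates root causes from side effects isn’t described here. One common heuristic, shown below as a hypothetical sketch, clusters alerts that arrive close together in time and then walks a service dependency graph: an alerting service is a root-cause candidate only if nothing it depends on is also alerting. The `DEPENDS_ON` map, the 300-second window, and the alert data are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    service: str
    ts: float  # seconds since the first alert
    name: str


# Hypothetical dependency map: service -> services it calls. Assumed acyclic.
DEPENDS_ON = {
    "web": {"checkout", "search"},
    "checkout": {"payments", "db"},
    "search": {"db"},
    "payments": {"db"},
    "db": set(),
}


def group_into_incidents(alerts: list[Alert], window_s: float = 300) -> list[list[Alert]]:
    """Start a new incident whenever the gap since the previous alert exceeds window_s."""
    incidents: list[list[Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.ts):
        if incidents and alert.ts - incidents[-1][-1].ts <= window_s:
            incidents[-1].append(alert)
        else:
            incidents.append([alert])
    return incidents


def transitive_deps(service: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Every service reachable from `service` by following dependency edges."""
    seen: set[str] = set()
    stack = list(depends_on.get(service, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on.get(dep, ()))
    return seen


def classify(incident: list[Alert], depends_on: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Root-cause candidates are alerting services with no alerting dependency;
    every other alerting service is treated as a side effect."""
    alerting = {a.service for a in incident}
    roots = {s for s in alerting if not ((transitive_deps(s, depends_on) - {s}) & alerting)}
    return roots, alerting - roots


ALERTS = [
    Alert("db", 0, "connection pool exhausted"),
    Alert("payments", 30, "5xx rate high"),
    Alert("checkout", 45, "p99 latency high"),
    Alert("search", 50, "timeouts"),
    Alert("web", 60, "error budget burn"),
    Alert("search", 5000, "disk usage high"),  # unrelated, much later
]

if __name__ == "__main__":
    for incident in group_into_incidents(ALERTS):
        roots, effects = classify(incident, DEPENDS_ON)
        print(f"{len(incident)} alerts: root={sorted(roots)} side_effects={sorted(effects)}")
```

With the sample data, the first five alerts form one incident with `db` as the only root-cause candidate, and the later disk alert becomes a separate incident. A production system would also need to handle dependency cycles and alerts that don’t map cleanly to a service, which this sketch assumes away.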
All 20+ integrations (Datadog, PagerDuty, Slack, GitHub, Sentry, AWS, K8s, etc.) connect via OAuth or an API key: no agents, no code changes, and you’re live in seconds. We also built an MCP server so AI coding agents can interact with Steadwing from your dev environment. And we open-sourced OpenAlerts (https://github.com/steadwing/openalerts, https://openalerts.dev), a monitoring layer for agentic frameworks with real-time alert rules for LLM errors, infra failures, stuck sessions, and queue buildup, plus multi-channel notifications via Slack, Discord, and Telegram.
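OpenAlerts’ actual rule format isn’t shown in this post, so the following is not its API. It’s a hypothetical Python sketch of how threshold-style rules for three of the listed conditions (LLM errors, stuck sessions, queue buildup) might be evaluated against a snapshot of agent state; the `AgentSnapshot` fields, rule names, and thresholds are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentSnapshot:
    llm_calls: int
    llm_errors: int
    queue_depth: int
    idle_seconds: dict[str, float]  # session id -> seconds since its last step


@dataclass
class Rule:
    name: str
    check: Callable[[AgentSnapshot], list[str]]  # returns alert messages; empty when healthy


def llm_error_rate(threshold: float = 0.10) -> Rule:
    def check(s: AgentSnapshot) -> list[str]:
        if s.llm_calls and s.llm_errors / s.llm_calls > threshold:
            return [f"LLM error rate {s.llm_errors / s.llm_calls:.0%} exceeds {threshold:.0%}"]
        return []
    return Rule("llm_error_rate", check)


def stuck_sessions(max_idle_s: float = 120) -> Rule:
    def check(s: AgentSnapshot) -> list[str]:
        return [f"session {sid} idle for {idle:.0f}s"
                for sid, idle in s.idle_seconds.items() if idle > max_idle_s]
    return Rule("stuck_session", check)


def queue_buildup(max_depth: int = 100) -> Rule:
    def check(s: AgentSnapshot) -> list[str]:
        return [f"queue depth {s.queue_depth} exceeds {max_depth}"] if s.queue_depth > max_depth else []
    return Rule("queue_buildup", check)


def evaluate(rules: list[Rule], snapshot: AgentSnapshot) -> list[tuple[str, str]]:
    """Run every rule against one snapshot; each (rule, message) pair would then be
    forwarded to the configured notification channels (not modeled here)."""
    return [(rule.name, msg) for rule in rules for msg in rule.check(snapshot)]


if __name__ == "__main__":
    snapshot = AgentSnapshot(llm_calls=50, llm_errors=8, queue_depth=40,
                             idle_seconds={"a": 30.0, "b": 600.0})
    rules = [llm_error_rate(), stuck_sessions(), queue_buildup()]
    for name, msg in evaluate(rules, snapshot):
        print(name, msg)
```

Keeping each rule as a small function that returns zero or more messages makes it easy to add new conditions and to route every (rule, message) pair to Slack, Discord, or Telegram independently.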
We have a free tier and would love feedback, especially from folks who are on-call regularly.
Let us know what works, what’s missing, and what you’d want next :)