Show HN: yolo-cage – AI coding agents that can't exfiltrate secrets

borenstein Wednesday, January 21, 2026

I made this for myself, and it seemed like it might be useful to others. I'd love some feedback, both on the threat model and the tool itself. I hope you find it useful!

Backstory: I've been using many agents in parallel as I work on a somewhat ambitious financial analysis tool. I was juggling agents working on epics for the linear solver, the persistence layer, the front-end, and planning for the second-generation solver. I was losing my mind playing whack-a-mole with the permission prompts. YOLO mode felt so tempting. And yet.

Then it occurred to me: what if YOLO mode isn't so bad? Decision fatigue is a thing. If I could cap the blast radius of a confused agent, maybe I could just review once. Wouldn't that be safer?

So that day, while my kids were taking a nap, I decided to see if I could put YOLO-mode Claude inside a sandbox that blocks exfiltration and regulates git access. The result is yolo-cage.
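For anyone wondering what "blocks exfiltration" could mean in practice, here's a minimal sketch of one ingredient: scrubbing secret-bearing environment variables before launching the agent process. This is illustrative only, not yolo-cage's actual implementation (the pattern list and function name are hypothetical), and a real setup would also cut off outbound network access except for an allowlist.

```python
import fnmatch
import os

# Patterns that commonly mark secret-bearing environment variables.
# (Hypothetical list for illustration; a real tool would make this configurable.)
SECRET_PATTERNS = ["*_TOKEN", "*_KEY", "*_SECRET", "AWS_*", "GITHUB_*"]

def scrubbed_env(env=None):
    """Return a copy of the environment with secret-like variables removed."""
    env = dict(os.environ if env is None else env)
    return {
        k: v
        for k, v in env.items()
        if not any(fnmatch.fnmatch(k, pat) for pat in SECRET_PATTERNS)
    }

# The agent would then be started with something like
# subprocess.run(agent_cmd, env=scrubbed_env()), ideally inside a
# container whose network is restricted to an allowlist.
```

Environment scrubbing alone isn't sufficient (secrets on disk, like ~/.aws, need the same treatment), but it's a cheap first layer before the sandbox boundary does the heavy lifting.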

Also: the AI wrote its own containment system from inside the system's own prototype. Which is either very aligned or very meta, depending on how you look at it.
