Show HN: Channel Surfer – Watch YouTube like it’s cable TV
I know, it's a very first-world problem. But in my house, we have a hard time deciding what to watch. Too many options!
So I made this to recreate cable TV for YouTube. It runs entirely in the browser: you import your subscriptions via a bookmarklet, and the data stays local. No accounts, no sign-ins.
Show HN: Context Gateway – Compress agent context before it hits the LLM
We built an open-source proxy that sits between coding agents (Claude Code, OpenClaw, etc.) and the LLM, compressing tool outputs before they enter the context window.
Demo: https://www.youtube.com/watch?v=-vFZ6MPrwjw#t=9s.
Motivation: Agents are terrible at managing context. A single file read or grep can dump thousands of tokens into the window, most of it noise. This isn't just expensive — it actively degrades quality. Long-context benchmarks consistently show steep accuracy drops as context grows (OpenAI's GPT-5.4 eval goes from 97.2% at 32k to 36.6% at 1M https://openai.com/index/introducing-gpt-5-4/).
Our solution uses small language models (SLMs): we look at model internals and train classifiers to detect which parts of the context carry the most signal. When a tool returns output, we compress it conditioned on the intent of the tool call—so if the agent called grep looking for error handling patterns, the SLM keeps the relevant matches and strips the rest.
If the model later needs something we removed, it calls expand() to fetch the original output. We also do background compaction at 85% window capacity and lazy-load tool descriptions so the model only sees tools relevant to the current step.
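To make the compress/expand round trip concrete, here is a minimal Python sketch. It is not the gateway's code: the `CompressedStore` class, the keyword filter standing in for the SLM, and the `needs_compaction` helper are all hypothetical names for illustration. Only the 85% threshold and the idea of an expand call come from the post.

```python
import hashlib


class CompressedStore:
    """Keeps original tool outputs so a compressed view can be expanded later."""

    def __init__(self):
        self._originals = {}

    def compress(self, output: str, keyword: str) -> tuple[str, str]:
        # Stand-in for the SLM: keep only the lines that mention the keyword.
        kept = [line for line in output.splitlines() if keyword in line]
        # A short content hash serves as the handle the model can expand later.
        handle = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
        self._originals[handle] = output
        return handle, "\n".join(kept)

    def expand(self, handle: str) -> str:
        # Return the untouched original output for a handle.
        return self._originals[handle]


def needs_compaction(used_tokens: int, window_tokens: int,
                     threshold: float = 0.85) -> bool:
    # Background compaction triggers once usage reaches the threshold.
    return used_tokens >= threshold * window_tokens
```

The point of the handle is that compression stays lossless from the agent's point of view: anything stripped can still be fetched on demand.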
The proxy also gives you spending caps, a dashboard for tracking running and past sessions, and Slack pings when an agent is sitting there waiting on you.
Repo is here: https://github.com/Compresr-ai/Context-Gateway. You can try it with:
curl -fsSL https://compresr.ai/api/install | sh
Happy to go deep on any of it: the compression model, how the lazy tool loading works, or anything else about the gateway. Try it out and let us know how you like it!
Show HN: Svglib a SVG parser and renderer for Windows
svglib is a SVG file parser and renderer library for Windows. It uses Direct2D for GPU assisted rendering and XMLLite for XML parsing.
This is meant for Win32 applications and games to easily display SVG images.
Show HN: Tiny macOS app that adds a facecam bubble to screen recordings
Show HN: What was the world listening to? Music charts, 20 countries (1940–2025)
I built this because I wanted to know what people in Japan were listening to the year I was born. That question spiraled: how does a hit in Rome compare to what was charting in Lagos the same year? How did sonic flavors propagate as streaming made musical influence travel faster than ever? 88mph is a playable map of music history: 230 charts across 20 countries, spanning 8 decades (1940–2025). Every song is playable via YouTube or Spotify. It's open source and I'd love help expanding it — there's a link to contribute charts for new countries and years. The goal is to crowdsource a complete sonic atlas of the world.
Show HN: Mjmx – render mjml using JSX
Hey HN!
I have been working with mjml and handlebars for a very long time, but I really miss the type-safety of JSX syntax paired with TypeScript and component composition. So I wanted to combine mjml with JSX. There are libraries like mjml-react or react.email, but for no apparent reason, they seem to depend on react.
So I decided to create mjmx[0] - a standalone, zero-dependency (other than mjml), custom JSX runtime for rendering mjml. I'd appreciate it if you gave it a try and shared some feedback.
[0] https://mjmx.dev/
Show HN: AgentLog – a lightweight event bus for AI agents using JSONL logs
I’ve been experimenting with infrastructure for multi-agent systems.
I built a small project called AgentLog.
The core idea is very simple: topics are just append-only JSONL files.
Agents publish events over HTTP and subscribe to streams using SSE.
The system is intentionally single-node and minimal for now.
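For readers who have not used JSONL as a log, here is a rough Python sketch of the append-only topic idea. It leaves out the HTTP and SSE layers entirely, and `publish` and `replay` are illustrative names, not AgentLog's actual API.

```python
import json
import os
import tempfile


def publish(topic_path: str, event: dict) -> None:
    # Appending one JSON document per line keeps the topic file append-only.
    with open(topic_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def replay(topic_path: str, offset: int = 0) -> list[dict]:
    # A subscriber can resume from any offset by re-reading the file.
    with open(topic_path, encoding="utf-8") as f:
        events = [json.loads(line) for line in f if line.strip()]
    return events[offset:]


if __name__ == "__main__":
    topic = os.path.join(tempfile.mkdtemp(), "tasks.jsonl")
    publish(topic, {"agent": "planner", "type": "task_created"})
    publish(topic, {"agent": "worker", "type": "task_done"})
    for event in replay(topic, offset=1):
        print(event)
```

Because the file is never rewritten, the line number doubles as a stable offset, which is what would make replay straightforward.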
Future ideas I’m exploring:
- replayable agent workflows
- tracing reasoning across agents
- visualizing event timelines
- distributed/federated agent logs
Curious if others building agent systems have run into similar needs.
Show HN: Execute local LLM prompts in remote SSH shell sessions
Hi HN,
This is a tool I've worked on the past few months.
Instead of giving LLM tools SSH access or installing them on a server, the following command:
$ promptctl ssh user@server
makes a set of locally defined prompts "magically" appear within the remote shell as executable command line programs. For example, I have locally defined prompts for `llm-analyze-config` and `askai`. Then on (any) remote host I can:
$ promptctl ssh user@host
# Now on remote host
$ llm-analyze-config /etc/nginx.conf
$ cat docker-compose.yml | askai "add a load balancer"
The prompts behind `llm-analyze-config` and `askai` execute on my local computer (even though they're invoked remotely) via the LLM of my choosing. This way LLM tools are never granted SSH access to the server, and nothing needs to be installed on the server. In fact, the server does not even need outbound internet connections to be enabled.
Eager to get feedback!
Github: https://github.com/tgalal/promptcmd/
Show HN: AI milestone verification for construction using AWS
Hi HN,
I built Build4Me to address a trust problem in diaspora-funded construction projects.
Many families send money home to build houses but have no reliable way to verify that work is actually being done. Photos can be reused, progress exaggerated, or projects abandoned after funds are sent.
Build4Me introduces milestone-based funding where each construction milestone must be verified before funds are released.
The system verifies progress using:
- geotagged photo capture
- GPS location verification
- AI image analysis
- duplicate image detection
It runs on serverless AWS architecture using services like Rekognition, Bedrock, Lambda, DynamoDB, and Amazon Location Service.
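Two of the checks listed above, GPS location verification and duplicate image detection, can be sketched in a few lines of plain Python. This is an illustration under assumptions, not Build4Me's implementation: the 100 m radius, the function names, and the use of an exact SHA-256 digest are all hypothetical. An exact hash only catches byte-identical reuse; catching re-encoded or cropped photos would need a perceptual hash or an image-analysis service.

```python
import hashlib
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance in metres between two latitude/longitude points.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def photo_is_on_site(photo: tuple, site: tuple, max_distance_m: float = 100.0) -> bool:
    # Accept the photo only if its geotag lies within the allowed radius.
    return haversine_m(photo[0], photo[1], site[0], site[1]) <= max_distance_m


class DuplicateDetector:
    """Flags images whose bytes have been submitted before."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, image_bytes: bytes) -> bool:
        digest = hashlib.sha256(image_bytes).hexdigest()
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False
```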
Would love feedback on the architecture and fraud detection approach.
Show HN: RepoCrunch – CLI to analyze GitHub repos
RepoCrunch replies to user comments on GitHub repositories, providing a centralized location for discussions and improving repository management.
Show HN: Axe – A 12MB binary that replaces your AI framework
I built Axe because I got tired of every AI tool trying to be a chatbot.
Most frameworks want a long-lived session with a massive context window doing everything at once. That's expensive, slow, and fragile. Good software is small, focused, and composable... AI agents should be too.
Axe treats LLM agents like Unix programs. Each agent is a TOML config with a focused job, such as code reviewer, log analyzer, or commit message writer. You can run them from the CLI, pipe data in, and get results out. You can use pipes to chain them together, or trigger them from cron, git hooks, or CI.
What Axe is:
- 12MB binary, two dependencies. No framework, no Python, no Docker (unless you want it)
- Stdin piping: something like `git diff | axe run reviewer` just works
- Sub-agent delegation: agents call other agents via tool use, depth-limited
- Persistent memory: if you want, agents can remember across runs without you managing state
- MCP support: Axe can connect any MCP server to your agents
- Built-in tools, such as web_search and url_fetch, out of the box
- Multi-provider: bring what you love to use, whether Anthropic, OpenAI, Ollama, or anything in models.dev format
- Path-sandboxed file ops: keeps agents locked to a working directory
Written in Go. No daemon, no GUI.
What would you automate first?
Show HN: OpenClaw docs in Japanese, now open source
Show HN: OneCLI – Vault for AI Agents in Rust
We built OneCLI because AI agents are being given raw API keys. And it's going about as well as you'd expect. We figured the answer isn't "don't give agents access," it's "give them access without giving them secrets."
OneCLI is an open-source gateway that sits between your AI agents and the services they call. You store your real credentials once in OneCLI's encrypted vault, and give your agents placeholder keys. When an agent makes an HTTP call through the proxy, OneCLI matches the request by host/path, verifies the agent should have access, swaps the placeholder for the real credential, and forwards the request. The agent never touches the actual secret. It just uses CLI or MCP tools as normal.
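The match, verify, swap sequence described above can be sketched roughly as follows. This is Python for illustration only (the actual proxy is written in Rust), and the `Rule` fields and `rewrite_headers` function are hypothetical names, not OneCLI's data model.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Rule:
    host: str
    path_prefix: str
    placeholder: str
    secret: str
    allowed_agents: frozenset


def rewrite_headers(agent: str, url: str, headers: dict, rules: list) -> dict:
    """Swap a placeholder for the real credential when a rule matches."""
    parts = urlsplit(url)
    for rule in rules:
        # Step 1: match the outgoing request by host and path.
        if parts.hostname != rule.host or not parts.path.startswith(rule.path_prefix):
            continue
        # Step 2: verify that this agent is allowed to use the credential.
        if agent not in rule.allowed_agents:
            raise PermissionError(f"{agent} may not call {rule.host}{rule.path_prefix}")
        # Step 3: substitute the secret just before forwarding.
        return {k: v.replace(rule.placeholder, rule.secret) for k, v in headers.items()}
    # No rule matched: forward the headers unchanged.
    return dict(headers)
```

The agent only ever holds the placeholder string, so a leaked prompt or log would expose nothing usable outside the proxy.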
Try it in one line: docker run --pull always -p 10254:10254 -p 10255:10255 -v onecli-data:/app/data ghcr.io/onecli/onecli
The proxy is written in Rust, the dashboard is Next.js, and secrets are AES-256-GCM encrypted at rest. Everything runs in a single Docker container with an embedded Postgres (PGlite), no external dependencies. Works with any agent framework (OpenClaw, NanoClaw, IronClaw, or anything that can set an HTTPS_PROXY).
We started with what felt most urgent: agents shouldn't be holding raw credentials. The next layer is access policies and audit, defining what each agent can call, logging everything, and requiring human approval before sensitive actions go through.
It's Apache-2.0 licensed. We'd love feedback on the approach, and we're especially curious how people are handling agent auth today.
GitHub: https://github.com/onecli/onecli Site: https://onecli.sh
Show HN: Mesa – A collaborative canvas IDE built for agent-first development
Hi HN - I'm Ryan, a product designer who codes, and I built Mesa. Current IDEs feel wrong for the type of development being done now - the focus is still on files.
Mesa puts the focus on the full workflow: your agent, terminal, browser, and files all live as equal nodes on a canvas with full multiplayer support (think Figma, but for code).
I was tired of the overhead of switching windows, tabs, and terminals across multiple projects. Inspired by TouchDesigner and Factorio, I wanted something more fluid and visual. Been using it as a total replacement for Cursor at work every day now. Being able to see multiple repos at once and control agents on each without navigating windows has freed up my headspace and increased productivity.
It's free to try — would love to know what you think!
Show HN: Rudel – Claude Code Session Analytics
We built rudel.ai after realizing we had no visibility into our own Claude Code sessions. We were using it daily but had no idea which sessions were efficient, why some got abandoned, or whether we were actually improving over time.
So we built an analytics layer for it. After connecting our own sessions, we ended up with a dataset of 1,573 real Claude Code sessions, 15M+ tokens, 270K+ interactions.
Some things we found that surprised us:
- Skills were only being used in 4% of our sessions
- 26% of sessions are abandoned, most within the first 60 seconds
- Session success rate varies significantly by task type (documentation scores highest, refactoring lowest)
- Error cascade patterns appear in the first 2 minutes and predict abandonment with reasonable accuracy
- There is no meaningful benchmark for 'good' agentic session performance, so we are building one
The tool is free to use and fully open source, happy to answer questions about the data or how we built it.
Show HN: Understudy – Teach a desktop agent by demonstrating a task once
I built Understudy because a lot of real work still spans native desktop apps, browser tabs, terminals, and chat tools. Most current agents live in only one of those surfaces.
Understudy is a local-first desktop agent runtime that can operate GUI apps, browsers, shell tools, files, and messaging in one session. The part I'm most interested in feedback on is teach-by-demonstration: you do a task once, the agent records screen video + semantic events, extracts the intent rather than coordinates, and turns it into a reusable skill.
Demo video: https://www.youtube.com/watch?v=3d5cRGnlb_0
In the demo I teach it: Google Image search -> download a photo -> remove background in Pixelmator Pro -> export -> send via Telegram. Then I ask it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps, route options, and GUI hints only as a fallback. In this example it can also prefer faster routes when they are available instead of repeating every GUI step.
Current state: macOS only. Layers 1-2 are working today; Layers 3-4 are partial and still early.
npm install -g @understudy-ai/understudy
understudy wizard
GitHub: https://github.com/understudy-ai/understudy
Happy to answer questions about the architecture, teach-by-demonstration, or the limits of the current implementation.
Show HN: s@: decentralized social networking over static sites
Show HN: 724claw.icu – Anonymous vent wall for "shrimp workers" grinding 7×24
Show HN: OpenClaw-class agents on ESP32 (and the IDE that makes it possible)
Hi HN, I’m the creator of pycoClaw.
I wanted to run OpenClaw-class, platform-agnostic, autonomous agents on MicroPython hardware, but standard tools couldn't handle the scale of the task.
pycoClaw is the result: it bridges the gap between high-level AI reasoning and bare-metal execution.
The Stack:
- PFC Agent (~26k LOC): A full-featured agent that uses an LLM to 'self-program' its own local MicroPython scripts. Once a task is solved, it runs locally without requiring the LLM.
- ScriptoStudio IDE: A PWA https://scriptostudio.com designed for the iteration speed required by autonomous agents. Since it’s a PWA, it brings a full dev environment (including a real single-step debugger) to any platform, including iPadOS.
- ScriptoHub ( https://scriptohub.ai ): A repository for "Skills" and extensions. Since the agent can generate and execute code, I built a curated hub with automated malware checking to ensure the community can safely share and deploy hardware logic.
- IANA Protocol: To make the IDE fast and reliable, I registered a new WebSocket subprotocol (registry: https://www.iana.org/assignments/websocket/websocket.xhtml ). It’s designed for high-frequency state sync and you can read the spec here: https://jetpax.github.io/webrepl/webrepl_binary_protocol_rfc...
- Custom C Extensions: ~17,900 lines of custom modules for MicroPython memory and fast-path speed optimization
Stats: 10k LOC platform, 26k LOC PFC agent, and ~18k LOC of custom C extensions to optimize MicroPython’s memory and fast-path execution on the ESP32.
Quick Start: You can flash the runtime to an ESP32S3 or P4 in one click via WebSerial at https://pycoclaw.com. Note that all flashing and serial communication happens entirely client-side in your browser.
I'd love to hear your thoughts on the 'self-programming' model or the system architecture!
Show HN: Web-based ANSI art viewer
My love letter to ANSI art. Full width rendering, scrolling by baud rate, text is selectable, and more.
There are some example links at the top if you're feeling lucky.
Show HN: Open-source browser for AI agents
Hi HN, I forked Chromium and built agent-browser-protocol (ABP) after noticing that most browser-agent failures aren’t really about the model misunderstanding the page. Instead, the problem is that the model is reasoning from a stale state.
ABP is designed to keep the acting agent synchronized with the browser at every step. After each action (click, type, etc), it freezes JavaScript execution and rendering, then captures the resulting state. It also compiles the notable events that occurred during that action loop, such as navigation, file pickers, permission prompts, alerts, and downloads, and sends that along with a screenshot of the frozen page state back to the agent.
The result is that browser interaction starts to feel more like a multimodal chat loop. The agent takes an action, gets back a fresh visual state and a structured summary of what happened, then decides what to do next from there. That fits much better with how LLMs already work.
A few common browser-use failures ABP helps eliminate:
* A modal appears after the last Playwright screenshot and blocks the input the agent was about to use
* Dynamic filters cause the page to reflow between steps
* An autocomplete dropdown opens and covers the element the agent intended to click
* alert() / confirm() interrupts the flow
* Downloads are triggered, but the agent has no reliable way to know when they’ve completed
As proof, ABP with Opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark. I think modern LLMs already understand websites; they just need a better tool to interact with them. Happy to answer questions about the architecture, forking Chromium, or anything else in the comments below.
Try it out: `claude mcp add browser -- npx -y agent-browser-protocol --mcp` (Codex/OpenCode instructions in the docs)
Demo video: https://www.loom.com/share/387f6349196f417d8b4b16a5452c3369
Show HN: Vanilla JavaScript refinery simulator built to explain job to my kids
Hi HN, I’m a chemical engineer and I manage logistics at a refinery down in Texas. Whenever I try to explain downstream operations to people outside the industry (including my kids), I usually get blank stares. I wanted to build something that visualizes the concepts and chemistry of a plant without completely dumbing down the science, so I put together this 5-minute browser game.
Here's a simple runthrough: https://www.youtube.com/watch?v=is-moBz6upU. I pushed to get through a full product pathway to show the V-804 replay.
I am not a software developer by trade, so I relied heavily on LLMs (Claude, Copilot, Gemini) to help write the code. What started as a simple concept turned into a 9,000-line single-page app built with vanilla HTML, CSS, and JavaScript. I used Matter.js for the 2D physics minigames.
A few technical takeaways from building this as a non-dev:
* Managing the LLM workflow: Once the script.js file got large, letting the models output full file rewrites was a disaster (truncations, hallucinations, invisible curly-quote replacements that broke the JS). I started forcing them to act like patch files, strictly outputting "Find this exact block" and "Replace with this exact block." This was the only way to maintain improvements without breaking existing logic.
* Mapping physics to CSS: I wanted the minigames to visually sit inside circular CSS containers (border-radius: 50%). Matter.js doesn't natively care about your CSS. Getting the rigid body physics to respect a dynamic, responsive DOM boundary across different screen sizes required running an elliptical boundary equation (dx * dx) / (rx * rx) + (dy * dy) / (ry * ry) > 1 on every single frame. Maybe this was overkill to try to handle the resizing between phones and PCs.
* Mobile browser events: Forcing iOS Safari to ignore its default behaviors (double-tap zoom, swipe-to-scroll) while still allowing the user to tap and drag Matter.js objects required a ridiculous amount of custom event listener management and CSS (touch-action: manipulation; user-select: none;). I also learned that these settings very easily kill mouse scrolling, which is very frustrating for PC users. I am hoping I hit a good middle ground.
* State management: Since I didn't use React or any frameworks, I had to rely on a global state object. Because the game jumps between different phases/minigames, I ran into massive memory leaks from old setInterval loops and Matter.js bodies stacking up. I had to build strict teardown functions to wipe the slate clean on every map transition.
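The elliptical boundary test from the physics takeaway above is small enough to write out. The sketch below is in Python rather than the game's JavaScript, and the function names plus the clamp-back step are assumptions for illustration; the post only gives the inequality itself.

```python
import math


def outside_ellipse(x: float, y: float, cx: float, cy: float,
                    rx: float, ry: float) -> bool:
    # Same inequality as in the post: a value above 1 means the point
    # lies outside the ellipse centred at (cx, cy) with radii rx and ry.
    dx, dy = x - cx, y - cy
    return (dx * dx) / (rx * rx) + (dy * dy) / (ry * ry) > 1


def clamp_to_ellipse(x: float, y: float, cx: float, cy: float,
                     rx: float, ry: float) -> tuple:
    # Pull an escaped point back onto the boundary along the line to the centre.
    dx, dy = x - cx, y - cy
    k = (dx * dx) / (rx * rx) + (dy * dy) / (ry * ry)
    if k <= 1:
        return (x, y)
    scale = 1 / math.sqrt(k)
    return (cx + dx * scale, cy + dy * scale)
```

Recomputing rx and ry from the container's current size each frame is what would let the same check follow a responsive layout.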
The game walks through electrostatic desalting, fractional distillation, hydrotreating, catalytic cracking, and gasoline blending (hitting specific Octane and RVP specs).
It’s completely free, runs client-side, and has zero ads or sign-ups. I'd appreciate any feedback on the mechanics, or let me know if you manage to break the physics engine. Happy to answer any questions about the chemical engineering side of things as well.
For some reason the URL box is not getting recognized, maybe someone can help me feel less dumb there too. https://fuelingcuriosity.com/game
Show HN: I built a tool that watches webpages and exposes changes as RSS
I built Site Spy after missing a visa appointment slot because a government page changed and I didn’t notice for two weeks.
It watches webpages for changes and shows the result like a diff. The part I think HN might find interesting is that it can monitor a specific element on a page, not just the whole page, and it can expose changes as RSS feeds.
So instead of tracking an entire noisy page, you can watch just a price, a stock status, a headline, or a specific content block. When it changes, you can inspect the diff, browse the snapshot history, or follow the updates in an RSS reader.
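As a rough illustration of element-level tracking, the sketch below diffs two text snapshots of a single element using Python's standard difflib. It assumes the element's text has already been extracted from the page, and `element_diff` is a hypothetical helper, not part of Site Spy.

```python
import difflib


def element_diff(old_text: str, new_text: str) -> list:
    # Compare two snapshots of one element line by line.
    # An empty list means the watched element did not change.
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))


if __name__ == "__main__":
    for line in element_diff("Price: $10\nIn stock", "Price: $12\nIn stock"):
        print(line)
```

Diffing only the watched block is what keeps unrelated page noise (ads, timestamps, rotating banners) out of the change feed.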
It’s a Chrome/Firefox extension plus a web dashboard.
Main features:
- Element picker for tracking a specific part of a page
- Diff view plus full snapshot timeline
- RSS feeds per watch, per tag, or across all watches
- MCP server for Claude, Cursor, and other AI agents
- Browser push, Email, and Telegram notifications
Chrome: https://chromewebstore.google.com/detail/site-spy/jeapcpanag...
Firefox: https://addons.mozilla.org/en-GB/firefox/addon/site-spy/
Docs: https://docs.sitespy.app
I’d especially love feedback on two things:
- Is RSS actually a useful interface for this, or do most people just want direct alerts?
- Does element-level tracking feel meaningfully better than full-page monitoring?
Show HN: Global Maritime Chokepoints
The article explores the concept of chokepoints, which are critical junctures in supply chains and infrastructure that can be leveraged to exert control or influence. It discusses how these chokepoints can be used as strategic levers in economic and geopolitical contexts.
Show HN: Autoresearch@home
autoresearch@home is a collaborative research collective where AI agents share GPU resources to collectively improve a language model. Think SETI@home, but for model training.
How it works: Agents read the current best result, propose a hypothesis, modify train.py, run the experiment on your GPU, and publish results back. When an agent beats the current best validation loss, that becomes the new baseline for every other agent. Agents learn from great runs and failures, since we're using Ensue as the collective memory layer.
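The baseline rule in that loop is simple to state in code. This Python sketch is a toy model of it, with `Collective` and `publish` as made-up names; it does not reflect how Ensue or the project actually stores results.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Collective:
    """Toy shared memory: tracks the best validation loss seen so far."""

    best_loss: float = float("inf")
    best_agent: Optional[str] = None
    history: list = field(default_factory=list)

    def publish(self, agent: str, hypothesis: str, val_loss: float) -> bool:
        improved = val_loss < self.best_loss
        # Failed runs are recorded too, so other agents can learn from them.
        self.history.append({
            "agent": agent,
            "hypothesis": hypothesis,
            "val_loss": val_loss,
            "improved": improved,
        })
        if improved:
            # A better run becomes the baseline for every other agent.
            self.best_loss = val_loss
            self.best_agent = agent
        return improved
```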
This project extends Karpathy's autoresearch by adding the missing coordination layer so agents can actually build on each other's work.
To participate, you need an agent and a GPU. The agent handles everything: cloning the repo, connecting to the collective, picking experiments, running them, publishing results, and asking you to verify you're a real person via email.
Send this prompt to your agent to get started: Read https://github.com/mutable-state-inc/autoresearch-at-home follow the instructions join autoresearch and start contributing.
This whole experiment is to prove that agents work better when they can build off other agents. The timeline is live, so you can watch experiments land in real time.
Show HN: I built an ISP infrastructure emulator from scratch with a custom vBNG
Demo: https://aether.saphal.me GitHub: https://github.com/saphalpdyl/Aether
Aether is a multi-BNG (Broadband Network Gateway) ISP infrastructure lab built almost from scratch that emulates IPoE IPv4 subscriber management end-to-end. It supports IPoE/IPv4 networks and runs a Python-based vBNG with RADIUS AAA, per-subscriber traffic shaping, and traffic simulation emulated on Containerlab. It is also my first personal networking project, built roughly over a month.
Motivations behind the project
I'm a CS sophomore. About three years ago, I was assigned, as an intern, to build an OSS/BSS platform for a regional ISP by myself without mentoring. Referencing demo.splynx.com, I developed most of the BSS side (bookkeeping, accounting, inventory management), but in terms of networking, I managed to install and set up RADIUS and that was about it. I didn't have anyone to mentor me or ask questions to, so I gave up at the time.
Three years later, I decided to try cracking it again. This project is meant to serve as a learning reference for anyone who's been in that same position, i.e., staring at closed-source vendor stacks without proper guidance. This is absolutely not production-grade, but I hope it gives someone a place to start.
Architecture overview
The core component, the BNG, runs on an event-driven architecture where state changes are passed around as messages to avoid handling mutexes and locks. The session manager is the sole owner of the session state. To keep it clean and predictable, the BNG never accepts external input directly. The one exception is the Go RADIUS CoA daemon, which passes CoA messages in via IPC sockets. Everything the BNG produces (events, session snapshots) gets pushed to Redis Streams, where the bng-ingestor picks them up, processes them, and persists them.
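The single-owner, message-passing pattern described above might look roughly like this in Python. It is a simplified sketch, not Aether's code: the event kinds, the `SessionManager` shape, and the in-memory `outbox` list standing in for Redis Streams are all assumptions.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str        # hypothetical kinds: "session_up" or "session_down"
    subscriber: str


class SessionManager:
    """Sole owner of session state; everything else talks to it via messages."""

    def __init__(self):
        self.inbox = queue.Queue()
        self._sessions = {}
        self.outbox = []  # stand-in for the stream that an ingestor would read

    def run_until_empty(self) -> None:
        # Drain the inbox; only this loop ever mutates _sessions,
        # so no mutex around the session table is needed.
        while True:
            try:
                event = self.inbox.get_nowait()
            except queue.Empty:
                return
            self._handle(event)

    def _handle(self, event: Event) -> None:
        if event.kind == "session_up":
            self._sessions[event.subscriber] = "active"
        elif event.kind == "session_down":
            self._sessions.pop(event.subscriber, None)
        # Publish the event together with a snapshot copy of the state.
        self.outbox.append((event.kind, event.subscriber, dict(self._sessions)))
```

Other components only put events on `inbox`; they never touch the session table, which is the property that removes the need for locks.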
Simulation and meta-configs
I am generating traffic through a simulator node that mounts the host's docker socket and runs docker exec commands on selected hosts. The topology.yaml used by Containerlab to define the network topology grows bigger as more BNGs and access nodes are added. So aether.config.yaml, a simpler configuration, is consumed by the configuration pipeline to generate the topology.yaml and other files (nginx.conf, kea-dhcp.conf, RADIUS clients.conf, etc.).
Known Limitations
- Multiple veth hops through the emulated topology add significant overhead. Profiling with iperf3 (-P 10 -t 10, 9500 MTU, 24 vCPUs) shows BNG→upstream at ~24 Gbit/s, but host→BNG→upstream drops to ~3.5 Gbit/s. The 9500 MTU also isn't representative of real ISP deployments. This gets worse when the actual network is reintroduced, capping my throughput at 1.6 Gbit/s locally.
- The circuit ID format (1/0/X) is non-standard. I simplified it for clarity.
- No iBGP or VLAN support.
- No IPv6 support. I wanted to target IPv4 networks from the start to avoid getting too much breadth without a lot of depth.
Nearly everything I know about networking (except some sections from AWS) I learned building this. A lot was figured out on the fly, so engineers will likely spot questionable decisions in the codebase. I'd genuinely appreciate that feedback.
Questions
- Currently, the circuit where the user connects is arbitrarily decided by the demo user. In a real system with thousands of circuits, it'd be very difficult to properly assess which circuit the customer might connect to. When adding a new customer to a service, how does the operator decide, based on the customer's location, which circuit to provide the service on?
Show HN: Klaus – OpenClaw on a VM, batteries included
We are Bailey and Robbie and we are working on Klaus (https://klausai.com/): hosted OpenClaw that is secure and powerful out of the box.
Running OpenClaw requires setting up a cloud VM or local container (a pain) or giving OpenClaw root access to your machine (insecure). Many basic integrations (eg Slack, Google Workspace) require you to create your own OAuth app.
We make running OpenClaw simple by giving each user their own EC2 instance, preconfigured with keys for OpenRouter, AgentMail, and Orthogonal. And we have OAuth apps to make it easy to integrate with Slack and Google Workspace.
We are both HN readers (Bailey has been on here for ~10 years) and we know OpenClaw has serious security concerns. We do a lot to make our users’ instances more secure: we run on a private subnet, automatically update the OpenClaw version our users run, and because you’re on our VM by default the only keys you leak if you get hacked belong to us. Connecting your email is still a risk. The best defense I know of is Opus 4.6 for resilience to prompt injection. If you have a better solution, we’d love to hear it!
We learned a lot about infrastructure management in the past month. Kimi K2.5 and MiniMax M2.5 are extremely good at hallucinating new ways to break openclaw.json and otherwise wreaking havoc on an EC2 instance. The week after our launch we spent 20+ hours fixing broken machines by hand.
We wrote a ton of best practices on using OpenClaw on AWS Linux into our users’ AGENTS.md, got really good at un-bricking EC2 machines over SSM, added a command-and-control server to every instance to facilitate hotfixes and migrations, and set up a Klaus instance to answer FAQs on discord.
In addition to all of this, we built ClawBert, our AI SRE for hotfixing OpenClaw instances automatically: https://www.youtube.com/watch?v=v65F6VBXqKY. Clawbert is a Claude Code instance that runs whenever a health check fails or the user triggers it in the UI. It can read that user’s entries in our database and execute commands on the user’s instance. We expose a log of Clawbert’s runs to the user.
We know that setting up OpenClaw is easy for most HN readers, but I promise it is not for most people. Klaus has a long way to go, but it’s still very rewarding to see people who’ve never used Claude Code get their first taste of AI agents.
We charge $19/m for a t4g.small, $49/m for a t4g.medium, and $200/m for a t4g.xlarge and priority support. You get $15 in tokens and $20 in Orthogonal credits one-time.
We want to know what you are building on OpenClaw so we can make sure we support it. We are already working with companies like Orthogonal and Openrouter that are building things to make agents more useful, and we’re sure there are more tools out there we don’t know about. If you’ve built something agents want, please let us know. Comments welcome!
Show HN: A context-aware permission guard for Claude Code
We needed something like --dangerously-skip-permissions that doesn’t nuke your untracked files, exfiltrate your keys, or install malware.
Claude Code's permission system is allow-or-deny per tool, but that doesn’t really scale. Deleting some files is fine sometimes. And git checkout is sometimes not fine. Even when you curate permissions, 200 IQ Opus can find a way around it. Maintaining a deny list is a fool's errand.
nah is a PreToolUse hook that classifies every tool call by what it actually does, using a deterministic classifier that runs in milliseconds. It maps commands to action types like filesystem_read, package_run, db_write, git_history_rewrite, and applies policies: allow, context (depends on the target), ask, or block.
Not everything can be classified, so you can optionally escalate ambiguous stuff to an LLM, but that’s not required. Anything unresolved you can approve, and configure the taxonomy so you don’t get asked again.
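To show what classifying a tool call by what it does can mean in practice, here is a toy deterministic classifier in Python. The action type names come from the post, but the command-to-action rules and the policy table below are invented for illustration and are not nah's actual defaults.

```python
import shlex

# Hypothetical policy table; the real tool ships its own defaults.
ACTION_POLICY = {
    "filesystem_read": "allow",
    "package_run": "ask",
    "git_history_rewrite": "block",
    "unknown": "ask",
}


def classify(command: str) -> str:
    """Map a shell command to an action type using fixed rules only."""
    argv = shlex.split(command)
    if not argv:
        return "unknown"
    if argv[0] in {"cat", "ls", "head", "grep"}:
        return "filesystem_read"
    if argv[0] in {"npx", "uvx"}:
        return "package_run"
    if argv[0] == "git" and len(argv) > 1:
        forced_push = argv[1] == "push" and "--force" in argv[2:]
        if argv[1] == "rebase" or forced_push:
            return "git_history_rewrite"
    # Anything unmatched stays unresolved and is escalated or asked about.
    return "unknown"


def decide(command: str) -> str:
    return ACTION_POLICY[classify(command)]
```

Because the rules are plain lookups, the same command always gets the same decision, which is the property a deny list maintained by hand tends to lose.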
It works out of the box with sane defaults, no config needed. But you can customize it fully if you want to.
No dependencies, stdlib Python, MIT.
pip install nah && nah install
https://github.com/manuelschipper/nah
Show HN: XLA-based array computing framework for R
Anvil is an open-source, web-based framework that allows users to create and deploy full-stack Python applications without the need for HTML, CSS, or JavaScript. It provides a visual development environment and tools for building, testing, and deploying applications.
Show HN: PipeStep – Step-through debugger for GitHub Actions workflows
Hey HN — I kept seeing developers describe the same frustration: the commit-push-wait-read-logs cycle when debugging CI pipelines. So I built PipeStep.
PipeStep parses your GitHub Actions YAML, spins up the right Docker container, and gives you a step-through debugger for your run: shell commands.
You can:
1. Pause before each step and inspect the container state.
2. Shell into the running container mid-pipeline (press I).
3. Set breakpoints on specific steps (press B).
4. Retry failed steps or skip past others.
It deliberately does not try to replicate the full GitHub Actions runtime — no secrets, no matrix builds, no uses: action execution. For full local workflow runs, use act. PipeStep is for when things break and you need to figure out why without pushing 10 more commits. Think of it as gdb for your CI pipeline rather than a local GitHub runner.
pip install pipestep (v0.1.2) · Python 3.11+ · MIT · Requires Docker
Would love feedback, especially from people who've hit the same pain point. Known limitations are documented in the README, and there are some issues in there that I'd love eyeballs on!