Security breaks during partial failures – design notes from distributed systems
TL;DR: Many security mechanisms fail not during attacks, but during partial outages. This post documents early design notes for a failure-aware security framework for distributed systems.
The problem
In production distributed systems, security often breaks when things are half working:
auth services degrade → retries explode
fallback paths widen access
recovery logic becomes the attack surface
Nothing is “exploited”, yet the system becomes unsafe.
Most security models assume stable components and clean failures. Real systems don’t behave that way.
Design assumptions
We assume:
correlated failures
retries are adversarial
timeouts are unsafe defaults
recovery paths matter as much as steady-state logic
We don’t assume:
global consistency
perfect identity
reliable clocks
centralized enforcement
Framework ideas (high level)
This work explores four ideas:
1. Failure-aware trust
Trust degrades under failure, not just compromise
Access narrows automatically during partial outages
2. Security invariants at runtime
Invariants are continuously enforced
Violations trigger containment, not alerts
3. Retry-safe security primitives
Idempotent, monotonic, side-effect bounded
Retries can’t escalate privilege
4. Security as observable state
Trust level, degradation, and containment are visible
If you can’t observe it, you can’t secure it
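A minimal sketch of how ideas 1, 3, and 4 might fit together, assuming three hypothetical trust levels and a made-up action set (none of these names come from an existing framework). Trust only moves downward when failures are observed, so a retried request is checked against the current, possibly narrower, level.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical trust levels; a higher value means broader access."""
    CONTAINED = 0  # invariant violated: deny everything
    DEGRADED = 1   # partial outage: read-only
    FULL = 2       # all dependencies healthy


# Hypothetical mapping from trust level to allowed actions.
ALLOWED = {
    Trust.CONTAINED: frozenset(),
    Trust.DEGRADED: frozenset({"read"}),
    Trust.FULL: frozenset({"read", "write", "admin"}),
}


@dataclass
class Session:
    trust: Trust = Trust.FULL

    def observe_failure(self, invariant_violated: bool = False) -> None:
        """Narrow trust. Monotonic: repeated calls can only lower the level."""
        target = Trust.CONTAINED if invariant_violated else Trust.DEGRADED
        self.trust = min(self.trust, target)

    def authorize(self, action: str) -> bool:
        """Side-effect free, so retrying the same check cannot widen access."""
        return action in ALLOWED[self.trust]

    def state(self) -> dict:
        """Expose trust and allowed actions as observable state."""
        return {"trust": self.trust.name, "allowed": sorted(ALLOWED[self.trust])}
```

Restoring trust is deliberately left out of the sketch: recovery would need its own explicit, audited path rather than happening as a side effect of a retry.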
What this is not
Not zero trust marketing
Not compliance
Not a finished system
It’s an attempt to treat failure as the normal case, not an exception.
Why publish this early?
Because many real failures:
don’t fit clean research papers
happen during incidents, not attacks
are invisible outside production systems
We’re sharing design notes to get feedback before formalizing or evaluating further.
Feedback welcome
If you’ve seen security regressions during outages or retries causing unsafe behavior, I’d like to hear about it.
This is ongoing work. No claims of novelty or completeness.
Ask HN: When do we expose "Humans as Tools" so LLM agents can call us on demand?
Serious question.
We're building agentic LLM systems that can plan, reason, and call tools via MCP. Today those tools are APIs. But many real-world tasks still require humans.
So… why not expose humans as tools?
Imagine TaskRabbit or Fiverr running MCP servers where an LLM agent can:
- Call a human for judgment, creativity, or physical actions
- Pass structured inputs
- Receive structured outputs back into its loop
At that point, humans become just another dependency in an agent's toolchain: slower and more expensive, but occasionally necessary.
Yes, this sounds dystopian. Yes, it treats humans as "servants for AI." That's kind of the point. It already happens manually... this just formalizes the interface.
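To make the "structured inputs, structured outputs" part concrete, here is a rough sketch of what such a tool contract could look like. It deliberately avoids any real MCP SDK; the type names, fields, and the `call_human` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HumanTaskRequest:
    """Structured input an agent would pass (all field names are hypothetical)."""
    instruction: str
    max_cost_usd: float
    deadline_minutes: int


@dataclass(frozen=True)
class HumanTaskResult:
    """Structured output returned into the agent's loop."""
    status: str  # "done" or "declined"
    answer: str


def call_human(request: HumanTaskRequest,
               worker: Callable[[str], str],
               quoted_cost_usd: float) -> HumanTaskResult:
    """Dispatch the task to a human worker only if the quote fits the budget."""
    if quoted_cost_usd > request.max_cost_usd:
        return HumanTaskResult(status="declined", answer="")
    return HumanTaskResult(status="done", answer=worker(request.instruction))
```

In this sketch `worker` stands in for whatever the marketplace does to route the task to a person. The agent only ever sees the typed request and result.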
Questions I'm genuinely curious about:
- Is this inevitable once agents become default software actors? (As of basically now?)
- What breaks first: economics, safety, human dignity or regulation?
- Would marketplaces ever embrace being "human execution layers" for AI?
Not sure if this is the future or a cursed idea we should actively prevent... but it feels uncomfortably plausible.
Tell HN: Happy New Year
Ask HN: Why is Apple's voice transcription hilariously bad?
Why is Apple’s voice transcription so hilariously bad?
Even 2–3 years ago, OpenAI’s Whisper models delivered better, near-instant voice transcription offline, and the model was only about 500 MB. With that context, it’s hard to understand how Apple’s transcription, which runs online on powerful servers, performs so poorly today.
Here are real examples from using the iOS native app just now:
- “BigQuery update” → “bakery update”
- “GitHub” → “get her”
- “CI build” → “CI bill”
- “GitHub support” → “get her support”
These aren’t obscure terms — they’re extremely common words in software, spoken clearly in casual contexts. The accuracy gap feels especially stark compared to what was already possible years ago, even fully offline.
Is this primarily a model-quality issue, a streaming/segmentation problem, aggressive post-processing, or something architectural in Apple’s speech stack? What are the real technical limitations, and why hasn’t it improved despite modern hardware and cloud processing?
Ask HN: How did you learn to code?
Ask HN: How Are You Handling Auth in 2026?
Supabase used to be my go-to, but I'm wondering if there are any easier out-of-the-box solutions I haven't looked into. I'm investigating Clerk and have asked LLMs, but I'm curious to get the real take on what's working and what's easy from devs who actually have skin in the game.
I built a public skill registry and MCP server so Codex can install new skills
Hi HN,
I’ve been working on a simple idea: instead of hard-coding capabilities into Codex-like agents, let them install skills on demand, from a public registry.
The setup is intentionally minimal:
A public skill registry (JSON index + signed artifacts)
A CLI (npx codex-skill install <skill>) for humans
A Model Context Protocol (MCP) server so agents can:
search skills
fetch manifests
verify artifacts
install workflows programmatically
As a concrete example, the first skill is a theming skill for frontend projects (static HTML or Next.js + shadcn/ui). An agent can install the skill, apply a theme, and produce a clean diff in under a minute.
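For the "verify artifacts" step, a minimal sketch of an integrity check, assuming the manifest carries a hex-encoded SHA-256 digest under a `sha256` key (that field name is an assumption, not the registry's actual format). It only checks the digest; verifying the signature on the manifest itself would be a separate step.

```python
import hashlib
import hmac


def verify_artifact(artifact: bytes, manifest: dict) -> bool:
    """Return True if the artifact's SHA-256 matches the digest in the manifest."""
    expected = manifest.get("sha256", "")  # hypothetical manifest field
    actual = hashlib.sha256(artifact).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected)
```

An agent or the CLI could run the same check before installing, which keeps the human and agent install paths identical.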
What this enables:
Agents that evolve without redeploying the core model
A neutral, inspectable “app store” for agent skills
Deterministic workflows (install → apply → diff → verify)
Humans and agents using the same install path
This is early and intentionally boring technically. The goal is to see if a shared skill ecosystem for agents actually makes sense in practice.
Happy to hear thoughts, criticism, or similar experiments people have tried.
Repo / demo links in comments if relevant.
Ask HN: What did you read in 2025?
I mostly read newspapers and technical journals, but two books I read made an impression: "The Changing World Order" and "The Gulag Archipelago".
Ask HN: Loneliness at 19, how to cope?
I am a college student and for my entire life I have been lonely. This has probably taken a very heavy toll on my mental health, but that’s another story. I’ve never been able to make friends and keep meaningful connections that last a long time. In fact, I’d go as far as saying I have never had a friend, and I currently don’t have any. My phone is empty, when I go to school nobody talks to me, and when I do find people who seem to have some kind of interest in me, it usually doesn’t last very long since they don’t prioritize whatever we have. As far as I’m aware, I am tolerable to be around. People find me funny, and when I do talk to people we have decent conversations (though small talk tends to bore me). However, that doesn’t lead anywhere and doesn’t bring me any kind of comfort or fulfillment. I’ve attributed my lack of friends to something that places all the blame on me. Maybe I’m ugly, maybe I’m not funny enough, maybe I’m dumb. I don’t know if that’s the right approach. But I’ve tried so many different things, I’ve read so many different books, and yet I still can’t get anyone to even bother to ask me how my day was, or to actually hang out with me when I ask if they’d like to.
What am I supposed to do? Be lonely and without any kind of company and human connection my entire life?
Ask HN: What are the best microVMs for AI agents?
Three weeks ago, we launched an open-source computer-use agent: https://github.com/zfoong/WhiteCollarAgent
We are now looking for self-hosted, easy-to-set-up microVM solutions for the agent's GUI mode. The idea is to let agents run their GUI operations (web browsing, launching an app, using the app, etc.) in an isolated environment.
If you have any experience with microVMs, feel free to let me know in the comments. Many thanks!
Semantica – Open-source semantic layer and GraphRAG framework
Hi HN,
I’m sharing Semantica, an MIT-licensed open-source framework for building semantic layers and knowledge engineering systems for AI.
Many RAG and agent systems fail not due to model quality, but due to the semantic gap — unstructured, inconsistent data without explicit entities, rules, or relationships. Vector-only approaches often hallucinate or fail silently under real-world data.
Semantica focuses on transforming messy data into reasoning-ready semantic knowledge.
Core capabilities:
- Universal ingestion (PDF, DOCX, HTML, JSON, CSV, databases, APIs)
- Automated entity and relationship extraction
- Knowledge graph construction with entity resolution
- Automated ontology generation and validation
- GraphRAG (hybrid vector + graph retrieval, multi-hop reasoning)
- Persistent semantic memory for AI agents
- Conflict detection, deduplication, and provenance tracking
Project links:
Docs: https://hawksight-ai.github.io/semantica/
GitHub: https://github.com/Hawksight-AI/semantica
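As a rough illustration of what "hybrid vector + graph retrieval" means in general, here is a toy, in-memory sketch. It is not Semantica's API: the function names, the dict-based embeddings, and the adjacency-list graph are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, embeddings, graph, top_k=1, hops=1):
    """Pick the top_k entities by vector similarity, then expand `hops` steps along graph edges."""
    ranked = sorted(embeddings, key=lambda e: cosine(query_vec, embeddings[e]), reverse=True)
    frontier = set(ranked[:top_k])
    found = set(frontier)
    for _ in range(hops):
        # Follow outgoing edges from the current frontier, skipping known entities.
        frontier = {n for e in frontier for n in graph.get(e, ())} - found
        found |= frontier
    return found
```

The vector step finds entry points that look like the query. The graph step then pulls in related entities that a similarity search alone would miss, which is the multi-hop part.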
I’d appreciate feedback from people working on knowledge graphs, GraphRAG, agent memory, or production RAG reliability.
Happy to discuss design trade-offs or answer technical questions.
Ask HN: Any example of successful vibe-coded product?
Many people talk about vibe-coding and about the different ways to use this development "methodology" successfully. I wonder, though, if anyone has really managed to push to production anything that was fully or almost fully created through LLM-assisted coding. Do you have anything to share, whether you or someone else created it? Ideally something more complex than a static webpage.
Ask HN: Does reading HN make you happy?
Times change, and as they change so do the communities you interact with. I used to like coming to HN because the discussions were often far away from the stresses of the world (politics, local news tragedies, etc.).
Lately, though, it's article after article on LLMs. Pro LLM or Anti LLM. These discussions come closer to the stresses of the world than the typical HN post did historically. Well, at least to me.
They often quickly become “AI is bad” or “AI is amazing” in the discussions. I want to mention, I’m not pro or against either way. There doesn’t need to be sides to pick.
Do these posts that dominate the top make you happy? For me it’s turned HN into a place that stresses me out.
PS. I’m not asking for coping mechanisms, I’ve already cut my time here reading down a bunch :)
Tell HN: Stripe Dashboard Is Slow
Has anyone else noticed how SLOW the Stripe dashboard has become in recent years? It used to be a simple, clean interface that was fast.
Tell HN: Happy New Year!
In a world increasingly filled with noise, I’m grateful for this community’s relentless pursuit of the "interesting." Here’s to a 2026 filled with deep dives, side projects that actually launch, and high-signal threads.
Stay curious and keep building.
Ask HN: How to do a Personal Cybersecurity audit
I am acutely aware that if I were targeted by a non-sophisticated actor (like a very motivated hacker, or a phone/laptop thief with programming knowledge), I would be toast if they figured out, e.g., my Windows password, since that is the key to my Chrome keychain, which opens a Pandora's box of accounts.
Even more likely, if my laptop were stolen while unlocked, the thief could get into my primary email(s) and from there into other accounts via password reset. There are many similar failure points that I used to keep enumerated mentally, but now there are too many to count. Email access is the biggest one, however.
Is there a process or method I can use to enumerate, track, and fix those kinds of failure points in my personal cybersecurity?
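One way to make the chains above trackable is to write them down as a small "what unlocks what" graph and compute what each single compromise reaches. A minimal sketch, where the node names are made up from the examples in this post:

```python
from collections import deque

# Hypothetical dependency map: gaining the key gives access to each listed item.
GRANTS = {
    "windows_password": ["chrome_keychain"],
    "chrome_keychain": ["saved_accounts"],
    "unlocked_laptop": ["primary_email"],
    "primary_email": ["password_resets"],
    "password_resets": ["saved_accounts"],
}


def blast_radius(start, grants):
    """Breadth-first search: everything reachable once `start` is compromised."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in grants.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Running `blast_radius` for each starting point gives a list to prioritize: the entries with the largest reach are the ones worth hardening first.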
Ask HN: How long before the first civilian cargo flights are AI piloted?
Is it 2026? Within 2 years? 5 years? 10 years?
I can understand how passenger flights will take a while longer, but would cargo flights, which don't have nearly the same safety concerns, be AI-piloted much sooner? If so, how much sooner?
Happy New Year HN!
Wishing everyone here a wonderful 2026.
But more than that, I would like to wish this forum, this space, this virtual reality on the internet, another wonderful year. For broadening horizons, for piquing curiosity, for reminding all of us to celebrate and rejoice at the wonder and awe in the world around us.
Happy New Year!
Ask HN: How did you make yourself more marketable?
I'm a full stack engineer. A pretty smart one, but I don't think that's enough to distinguish myself in this market. I'm wondering if anyone here has done anything in particular to make themselves more marketable.
A curated directory of open-source AI projects
Hi HN,
I’m building OSSAIX (https://ossaix.com), a curated directory for open-source AI projects.
The goal is to make it easier for developers to discover and evaluate OSS AI tools (LLMs, RAG/agents, local AI, image/audio/video, etc.) without digging through endless GitHub repos.
Each project is reviewed and includes basic signals like categories and GitHub activity to help assess usefulness quickly.
This is still early, and I’d love feedback from the HN community:
Is a curated OSS AI directory useful?
What signals/features would help you decide faster?
What’s missing or unnecessary?
Thanks in advance for any thoughts.
Ask HN: How to go back to listening to MP3s?
I have been a paying Spotify customer for many years now. Thanks to the yearly Wrapped event, I am reminded that my usage pattern is listening to a limited number of tracks on repeat.
I'm curious if any of you have made the switch back to listening to MP3s. If you did, which apps are you using?
TP-Link only works with a permanent internet connection
The TP-Link Tapo C100 only works with a permanent connection to TP-Link's servers. The second you cut the camera's internet access, it powers off. Why would anyone put a "security" camera in their house with a forced connection to TP-Link servers? And what does it transmit? So my question is: does anyone have a TP-Link camera that works without internet? TP-Link has a ton of cameras, but I don't want any that connect to the internet. Thanks.
Ask HN: How are you sandboxing coding agents?
I've seen people rely on built-in sandboxes, use git worktrees (sometimes inside devcontainers), or run the whole agent inside a Linux VM with minimal host mounts. On Linux, I’ve also seen firejail/bubblewrap mentioned.
For folks actually using these tools day-to-day:
What’s your default setup?
Have you had any "learned the hard way" moments?
What tradeoff (safety vs convenience vs parallelism) has mattered most in practice?
I'm less interested in theoretical best practices than what's actually holding up under real use.
Tell HN: I am afraid AI will take my job at some point
I have been doing software for a living for the past 10 years or so.
I can call myself an average senior engineer. Cannot really pass the DSA rounds at Tier 1/Tier 2.
Somehow was able to keep the jobs I had so far via pure bruteforce and hard work.
These days I am pair programming with AI to write a lot of code, probably checking in about 10 to 15k lines of code per month on average. I know it may not be a good metric, but an earlier version of me would have been checking in 2 or 3k lines of code per month at best.
I can get the work done, and I can probably exercise some good judgement when AI writes sloppy code.
But I am not sure how long these skills will stay relevant.
Like what if that judgement is not needed anymore, like 2-3 years down the line?
Is anyone else in the same boat? How are you dealing with this?
Ask HN: How do you manage kids' accounts?
My kids are just getting to the point where I need to manage several internet accounts (iCloud, Google, Amazon Kids) and parental control settings across several devices (iPad, Alexa, Apple Watch).
It’s getting a bit confusing between passwords, content settings, notifications, payments, PINs etc.
What system do you use to keep this manageable in your household?
Ask HN: How do you get visibility if you're suuuuper bad at marketing?
Hi, I built a small tool that I have used daily for a long time. A few friends and classmates also use it, and they keep telling me it is genuinely useful. But I am stuck on distribution. I am a student, I have no budget for ads, and I am not good at marketing (I try, but I'm super bad). When I mention it in other communities, it often gets treated as self-promotion and I get blocked.
If you were starting from zero today, how would you get the first 100 real users in a clean way? I would love specific ideas like where to share, what kind of write up works, how to approach niche communities, or what you would build into the product to make sharing natural.
Thanks.
Ask HN: What do you use to manage your coding projects?
I feel like I change what "tool" I use to manage/juggle my projects on a monthly basis these days.
That's likely a me problem; getting bored with the tool itself, but I often find myself reverting back to a pen and notepad, paper, notecard, etc.
This usually happens after using an app/software that is needlessly complex and ends up requiring me to manage it rather than it providing any organizational or "productivity" value. (A lot easier to write a task at the top of a notecard rather than assigning 27 "priority" tags, deadlines, location, categories, etc. to the thing)
I know everyone is different in this realm, but very interested in what's been working for you.
How do users decide which online platforms to trust in 2025?
I’ve been thinking about how trust in online platforms is formed today. Beyond marketing claims, what signals really matter to users in 2025?
Curious to hear perspectives from others here.
Ask HN: What was the hardest bug you tracked down in 2025?
We talk a lot about shipping features, but I want to hear the war stories.
I spent almost a month chasing a silent data corruption issue that turned out to be floating-point non-determinism between x86 and ARM chips. It completely changed how I look at "reliable" memory.
What was your "white whale" bug of the year?
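The post doesn't say what the root cause was. One common source of this kind of mismatch is that floating-point addition is not associative, so anything that changes evaluation order or rounding (compiler flags, vectorization, fused multiply-add) can change the result. A small illustration on a single machine:

```python
a, b, c = 0.1, 0.2, 0.3

# Same three numbers, summed in two different orders.
left_to_right = (a + b) + c
right_to_left = a + (b + c)

print(left_to_right == right_to_left)       # False with IEEE 754 doubles
print(abs(left_to_right - right_to_left))   # tiny, but not zero
```

A difference this small is harmless in most code, but it can matter when results are hashed, compared bit-for-bit, or used as keys across machines.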
Tell HN: No Scrollbar on Google Gemini UI
There is no visible scrollbar for https://gemini.google.com on Chrome and Edge -- though I see it on Firefox. (desktop)
I also checked the smartphone apps -- Gemini, Claude, ChatGPT, Perplexity, NotebookLM -- and none of them show a scrollbar (even as an obligatory visual indicator of your position in the scroll).
Are we regressing in accessible design and standards?