Ask HN: How are people safely reusing LLM answers in production RAG systems?

acfscience Friday, January 30, 2026

I’m curious how teams are handling answer reuse in production RAG systems.

We’ve looked at:

• naive semantic caching
• query similarity thresholds
• embedding-based reuse

…but correctness risk seems to make most of these approaches scary in practice, especially when:

• source docs change
• retrieval context shifts
• similar queries require different answers

Are teams:

• avoiding answer reuse entirely?
• limiting reuse to very narrow FAQ-style flows?
• using some form of conservative gating or shadow evaluation?

Not looking for vendor recommendations — just trying to understand what’s actually working (or failing) in real systems.

Thanks!
