Ask HN: How are people safely reusing LLM answers in production RAG systems?
acfscience · Friday, January 30, 2026

I’m curious how teams are handling answer reuse in production RAG systems.
We’ve looked at:

• naive semantic caching
• query similarity thresholds
• embedding-based reuse
…but correctness risk seems to make most of these approaches scary in practice, especially when:

• source docs change
• retrieval context shifts
• similar queries require different answers
Are teams:

• avoiding answer reuse entirely?
• limiting reuse to very narrow FAQ-style flows?
• using some form of conservative gating or shadow evaluation?
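For concreteness, here's a minimal sketch of the "conservative gating" idea: a semantic cache that only serves a stored answer when query similarity clears a high threshold AND the source corpus hasn't changed since the answer was generated. Everything here is hypothetical (the `embed` function is a toy bag-of-words stand-in for a real embedding model, and `corpus_version` is an assumed invalidation signal), just to show the shape of the gating.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" -- a real system would use a
    # sentence-embedding model; this is only for illustration.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer, corpus_version)

    def put(self, query, answer, corpus_version):
        self.entries.append((embed(query), answer, corpus_version))

    def get(self, query, corpus_version):
        q = embed(query)
        best, best_sim = None, 0.0
        for emb, answer, version in self.entries:
            # Gate 1: never reuse an answer generated against a
            # different (possibly stale) version of the source docs.
            if version != corpus_version:
                continue
            sim = cosine(q, emb)
            if sim > best_sim:
                best, best_sim = answer, sim
        # Gate 2: reuse only above a conservative similarity threshold;
        # otherwise fall through to full retrieval + generation.
        return best if best_sim >= self.threshold else None
```

Usage-wise, a miss (`None`) means the caller runs the normal RAG pipeline and then `put`s the fresh answer; bumping `corpus_version` on any doc change effectively flushes the cache. This obviously doesn't address the hardest failure mode (similar queries that need different answers), which is where a high threshold or shadow evaluation would have to carry the load.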
Not looking for vendor recommendations — just trying to understand what’s actually working (or failing) in real systems.
Thanks!