Show HN: Pyversity – Fast Result Diversification for Retrieval and RAG
Tananon, Sunday, October 19, 2025
Hey HN! I’ve recently open-sourced Pyversity, a lightweight library for diversifying retrieval results. Most retrieval systems optimize only for relevance, which can lead to top-k results that look almost identical. Pyversity efficiently re-ranks results to balance relevance and diversity, surfacing items that remain relevant but are less redundant. This helps improve retrieval, recommendation, and RAG pipelines without adding meaningful latency or complexity.
Main features:
- Unified API: one function (diversify) supporting several well-known strategies: MMR, MSD, DPP, and COVER (with more to come)
- Lightweight: the only dependency is NumPy, keeping the package small and easy to install
- Fast: efficient implementations for all supported strategies; diversify results in milliseconds
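To make the relevance/diversity trade-off concrete, here is a minimal NumPy sketch of greedy MMR (Maximal Marginal Relevance), one of the strategies listed above. It is an illustration only, not Pyversity's implementation or API: the function name `mmr_select`, its parameters, and the toy data are all assumptions made for this example.

```python
import numpy as np


def mmr_select(embeddings, scores, k, diversity=0.5):
    """Greedy MMR sketch (hypothetical helper, not part of Pyversity).

    Returns the indices of up to k items, chosen one at a time by
    (1 - diversity) * relevance - diversity * max cosine similarity
    to the items already selected.
    """
    emb = np.asarray(embeddings, dtype=float)
    rel = np.asarray(scores, dtype=float)

    # Normalize rows so that dot products are cosine similarities.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.clip(norms, 1e-12, None)

    n = emb.shape[0]
    k = min(k, n)
    selected = []
    available = np.ones(n, dtype=bool)
    # Highest similarity of each candidate to anything selected so far.
    max_sim = np.full(n, -np.inf)

    for _ in range(k):
        # Nothing is selected on the first pass, so there is no penalty yet.
        penalty = max_sim if selected else np.zeros(n)
        mmr = (1.0 - diversity) * rel - diversity * penalty
        mmr[~available] = -np.inf  # never pick the same item twice
        best = int(np.argmax(mmr))
        selected.append(best)
        available[best] = False
        max_sim = np.maximum(max_sim, emb @ emb[best])

    return selected


# Toy data: items 0 and 1 are near-duplicates, item 2 is different.
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
scores = [0.9, 0.85, 0.5]

print(mmr_select(embeddings, scores, k=2, diversity=0.0))  # [0, 1]
print(mmr_select(embeddings, scores, k=2, diversity=0.5))  # [0, 2]
```

With `diversity=0.0` the selection is plain relevance ranking and returns both near-duplicates. With `diversity=0.5` the second pick switches to the less relevant but non-redundant item.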
Re-ranking with cross-encoders is very popular right now, but also very expensive. In my experience, you can usually improve retrieval results with simpler and faster methods, such as the ones implemented in this package. These methods help retrieval, recommendation, and RAG systems present richer, more informative results by ensuring each new item adds new information.
Code and docs: github.com/pringled/pyversity
Let me know if you have any feedback, or suggestions for other diversification strategies to support!