
Show HN: Unified multimodal memory framework, without embeddings

k_kiki Wednesday, January 07, 2026

Hi HN,

We’ve been building memU (https://github.com/NevaMind-AI/memU), an open-source, general-purpose memory framework for AI agents. It supports dual-mode retrieval: classic RAG and LLM-based direct file reading.

Most multimodal memory systems either embed everything into vectors or treat non-text data as attachments. Both approaches work, but at scale it becomes hard to explain why a given piece of context was retrieved and what evidence it rests on.

memU takes a different approach: since models reason in language, multimodal memory should converge into structured, queryable text, while remaining fully traceable to original data.

---

## Three-Layer Architecture

- **Resource Layer** — stores raw multimodal data as ground truth. All higher-level memory remains traceable to this layer.

- **Memory Item Layer** — extracts atomic facts from raw data and stores them as natural-language statements. Embeddings are optional and used only for acceleration.

- **Memory Category Layer** — aggregates items into readable, theme-based memory files (e.g. user preferences, work logs). Frequently accessed topics stay active; low-usage content is demoted to balance speed and coverage.

---

## Memorization

Bottom-up and asynchronous. Data flows from resources → items → category files without manual schemas. When capacity is reached, recently relevant memories replace the least used ones.
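The capacity rule ("recently relevant memories replace the least used ones") behaves like least-recently-used eviction. A minimal sketch, assuming an LRU policy; this is a simplified stand-in, not memU's implementation:

```python
# Toy bounded memory store with LRU-style eviction.
from collections import OrderedDict

class BoundedMemory:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # item_id -> statement, oldest first

    def add(self, item_id: str, statement: str) -> None:
        if item_id in self.items:
            self.items.move_to_end(item_id)
        self.items[item_id] = statement
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

    def touch(self, item_id: str) -> None:
        # Mark an item as recently relevant (e.g. it matched a query).
        self.items.move_to_end(item_id)

mem = BoundedMemory(capacity=2)
mem.add("a", "fact A")
mem.add("b", "fact B")
mem.touch("a")            # "a" becomes the most recently used
mem.add("c", "fact C")    # over capacity: least-used "b" is evicted
```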

## Retrieval

Top-down. memU searches category files first, then items, and only falls back to raw data if needed. At the item layer, it combines BM25 + embeddings to balance exact matching and semantic recall, avoiding embedding-only imprecision.
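The top-down cascade can be sketched as follows. This is a toy illustration: `bm25_score` and `semantic_score` are simplified stand-ins for real BM25 and embedding similarity, and the threshold and weighting are made-up parameters, not memU's.

```python
# Toy top-down retrieval: categories first, then items, then raw data.
def bm25_score(query: str, doc: str) -> float:
    # Stand-in lexical score: fraction of query terms present in the doc.
    terms = query.lower().split()
    words = set(doc.lower().split())
    return sum(t in words for t in terms) / max(len(terms), 1)

def semantic_score(query: str, doc: str) -> float:
    # Stand-in "embedding" similarity: character-bigram Jaccard overlap.
    grams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    a, b = grams(query.lower()), grams(doc.lower())
    return len(a & b) / max(len(a | b), 1)

def retrieve(query, categories, items, raw_resources, alpha=0.5, threshold=0.5):
    # 1. Search readable category files first.
    cat_hits = [c for c in categories if bm25_score(query, c) >= threshold]
    if cat_hits:
        return "category", cat_hits
    # 2. Fall back to the item layer with a combined lexical+semantic score.
    combined = lambda d: alpha * bm25_score(query, d) + (1 - alpha) * semantic_score(query, d)
    item_hits = sorted((i for i in items if combined(i) >= threshold),
                       key=combined, reverse=True)
    if item_hits:
        return "item", item_hits
    # 3. Last resort: go back to the raw data.
    return "raw", raw_resources

categories = ["work log entries for shipping the retrieval module"]
items = ["user prefers dark roast coffee"]
```

Combining BM25 with embeddings at step 2 is what keeps exact tokens (names, IDs) retrievable even when the embedding neighborhood is fuzzy.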

Dual-mode retrieval lets applications choose between:

- low-latency embedding search, or
- LLM-based direct reading of memory files.

## Evolution

Memory structure adapts automatically based on real usage:

- Frequently accessed memories remain at the Category layer
- Memories retrieved from raw data are promoted upward and linked
- Organization evolves from usage patterns, not predefined rules

Goal: keep relevant memories retrievable at the Category layer and minimize latency over time.
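The promotion rule can be sketched like this. A minimal illustration, assuming a simple hit-count trigger; the class, category names, and threshold are hypothetical, not memU's actual mechanism:

```python
# Toy usage-driven promotion: a fact that repeatedly had to be answered
# from raw data gets promoted into a category file.
from collections import defaultdict

class EvolvingMemory:
    def __init__(self, promote_after: int = 2):
        self.categories = defaultdict(list)   # category name -> statements
        self.raw_hits = defaultdict(int)      # statement -> raw-data retrievals
        self.promote_after = promote_after

    def record_raw_retrieval(self, statement: str, category: str) -> None:
        # A fact answered a query, but only after falling back to raw data.
        self.raw_hits[statement] += 1
        if self.raw_hits[statement] >= self.promote_after:
            if statement not in self.categories[category]:
                # Promote upward so future queries stop at the Category layer.
                self.categories[category].append(statement)

mem = EvolvingMemory(promote_after=2)
mem.record_raw_retrieval("User works from Berlin.", "user_profile")
mem.record_raw_retrieval("User works from Berlin.", "user_profile")
```

After the second raw-data fallback, the fact lives at the Category layer, so subsequent retrievals for it are served at the cheapest tier.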

---

## A Unified Multimodal Memory Pipeline

memU is a text-centered multimodal memory system. Multimodal inputs are progressively converted into interpretable text memory, while staying traceable to original data. This provides stable, high-level context for reasoning, with detailed evidence available when needed—inside a memory structure that evolves through real-world use.
