Show HN: Unified multimodal memory framework, without embeddings
k_kiki · Wednesday, January 07, 2026

Hi HN,
We’ve been building memU (https://github.com/NevaMind-AI/memU), an open-source, general-purpose memory framework for AI agents. It supports dual-mode retrieval: classic RAG and LLM-based direct file reading.
Most multimodal memory systems either embed everything into vectors or treat non-text data as attachments. Both approaches work, but at scale it becomes hard to explain why certain context was retrieved and what evidence it relies on.
memU takes a different approach: since models reason in language, multimodal memory should converge into structured, queryable text, while remaining fully traceable to original data.
---
## Three-Layer Architecture
- **Resource Layer**: stores raw multimodal data as ground truth. All higher-level memory remains traceable to this layer.
- **Memory Item Layer**: extracts atomic facts from raw data and stores them as natural-language statements. Embeddings are optional and used only for acceleration.
- **Memory Category Layer**: aggregates items into readable, theme-based memory files (e.g. user preferences, work logs). Frequently accessed topics stay active; low-usage content is demoted to balance speed and coverage.
---
## Memorization

Bottom-up and asynchronous. Data flows from resources → items → category files without manual schemas. When capacity is reached, recently relevant memories replace the least-used ones.
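A rough sketch of bottom-up memorization with least-used eviction. `extract_facts` and `MemoryStore` are illustrative stand-ins, not memU's API, and the real extraction step is LLM-based rather than sentence splitting:

```python
def extract_facts(raw_text):
    # Stand-in for the LLM extraction step: one "fact" per sentence.
    return [s.strip() for s in raw_text.split(".") if s.strip()]

class MemoryStore:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}  # statement -> usage count

    def memorize(self, raw_text):
        for fact in extract_facts(raw_text):
            if len(self.items) >= self.capacity and fact not in self.items:
                # Capacity reached: evict the least-used memory.
                least_used = min(self.items, key=self.items.get)
                del self.items[least_used]
            self.items.setdefault(fact, 0)

    def retrieve(self, query):
        hits = [s for s in self.items if query.lower() in s.lower()]
        for s in hits:
            self.items[s] += 1  # usage drives future eviction decisions
        return hits
```

Because retrieval increments usage counts, memories that keep getting used survive eviction while stale ones are replaced by newly relevant facts.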
## Retrieval

Top-down. memU searches category files first, then items, and only falls back to raw data if needed. At the item layer, it combines BM25 + embeddings to balance exact matching and semantic recall, avoiding embedding-only imprecision.
Dual-mode retrieval lets applications choose between:

- low-latency embedding search, or
- LLM-based direct reading of memory files.
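A toy illustration of the item-layer hybrid scoring idea. The lexical score below is a crude stand-in for BM25, and the embeddings are hand-made vectors; memU's actual scoring differs:

```python
import math

def lexical_score(query, doc):
    # Crude stand-in for BM25: fraction of query tokens present in the doc.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(query, query_vec, items, alpha=0.5):
    # items: list of (statement, embedding) pairs.
    # alpha blends exact matching (lexical) with semantic recall (cosine).
    scored = [
        (alpha * lexical_score(query, text)
         + (1 - alpha) * cosine(query_vec, vec), text)
        for text, vec in items
    ]
    return [text for score, text in sorted(scored, reverse=True)]
```

The blend means an item can rank highly either by matching the query's exact terms or by being semantically close, which is the point of pairing BM25 with embeddings.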
## Evolution

Memory structure adapts automatically based on real usage:

- Frequently accessed memories remain at the Category layer
- Memories retrieved from raw data are promoted upward and linked
- Organization evolves from usage patterns, not predefined rules
Goal: keep relevant memories retrievable at the Category layer and minimize latency over time.
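The promotion rule can be sketched as a simple usage counter. The threshold value and function names here are assumptions for illustration, not memU internals:

```python
PROMOTION_THRESHOLD = 3  # assumed fixed here; a real system could tune this

def record_retrieval(usage, category_files, item, layer):
    """Count retrievals; promote frequently used items to the Category layer."""
    usage[item] = usage.get(item, 0) + 1
    if layer != "category" and usage[item] >= PROMOTION_THRESHOLD:
        bucket = category_files.setdefault("active", [])
        if item not in bucket:
            bucket.append(item)  # promoted: now retrievable at Category layer
        return "promoted"
    return "unchanged"
```

Each promotion moves a memory closer to the fast path, so over time the Category layer accumulates exactly the memories that real usage shows to be relevant.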
---
## A Unified Multimodal Memory Pipeline

memU is a text-centered multimodal memory system. Multimodal inputs are progressively converted into interpretable text memory, while staying traceable to original data. This provides stable, high-level context for reasoning, with detailed evidence available when needed, inside a memory structure that evolves through real-world use.