Show HN: Unified multimodal memory framework, without embeddings
k_kiki · Wednesday, January 07, 2026

Hi HN,
We’ve been building memU (https://github.com/NevaMind-AI/memU), an open-source, general-purpose memory framework for AI agents. It supports dual-mode retrieval: classic RAG and LLM-based direct file reading.
Most multimodal memory systems either embed everything into vectors or treat non-text data as attachments. Both approaches work, but at scale it becomes hard to explain why certain context was retrieved and what evidence it relies on.
memU takes a different approach: since models reason in language, multimodal memory should converge into structured, queryable text, while remaining fully traceable to original data.
---
## Three-Layer Architecture
- **Resource Layer**: stores raw multimodal data as ground truth. All higher-level memory remains traceable to this layer.
- **Memory Item Layer**: extracts atomic facts from raw data and stores them as natural-language statements. Embeddings are optional and used only for acceleration.
- **Memory Category Layer**: aggregates items into readable, theme-based memory files (e.g. user preferences, work logs). Frequently accessed topics stay active; low-usage content is demoted to balance speed and coverage.
---
## Memorization

Bottom-up and asynchronous. Data flows from resources → items → category files without manual schemas. When capacity is reached, recently relevant memories replace the least-used ones.
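A rough sketch of bottom-up memorization with least-used eviction. `extract_facts` and `MemoryStore` are illustrative stand-ins, not memU's API, and the real extraction step is LLM-based rather than sentence splitting:

```python
def extract_facts(raw_text):
    # Stand-in for the LLM extraction step: one "fact" per sentence.
    return [s.strip() for s in raw_text.split(".") if s.strip()]

class MemoryStore:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}  # statement -> usage count

    def memorize(self, raw_text):
        for fact in extract_facts(raw_text):
            if len(self.items) >= self.capacity and fact not in self.items:
                # Capacity reached: evict the least-used memory.
                least_used = min(self.items, key=self.items.get)
                del self.items[least_used]
            self.items.setdefault(fact, 0)

    def retrieve(self, query):
        hits = [s for s in self.items if query.lower() in s.lower()]
        for s in hits:
            self.items[s] += 1  # usage drives future eviction decisions
        return hits
```

Because retrieval increments usage counts, memories that keep getting used survive eviction while stale ones are replaced by newly relevant facts.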
## Retrieval

Top-down. memU searches category files first, then items, and only falls back to raw data if needed. At the item layer, it combines BM25 + embeddings to balance exact matching and semantic recall, avoiding embedding-only imprecision.
Dual-mode retrieval lets applications choose between:

- low-latency embedding search, or
- LLM-based direct reading of memory files.
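A toy illustration of the item-layer hybrid scoring idea. The lexical score below is a crude stand-in for BM25, and the embeddings are hand-made vectors; memU's actual scoring differs:

```python
import math

def lexical_score(query, doc):
    # Crude stand-in for BM25: fraction of query tokens present in the doc.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(query, query_vec, items, alpha=0.5):
    # items: list of (statement, embedding) pairs.
    # alpha blends exact matching (lexical) with semantic recall (cosine).
    scored = [
        (alpha * lexical_score(query, text)
         + (1 - alpha) * cosine(query_vec, vec), text)
        for text, vec in items
    ]
    return [text for score, text in sorted(scored, reverse=True)]
```

The blend means an item can rank highly either by matching the query's exact terms or by being semantically close, which is the point of pairing BM25 with embeddings.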
## Evolution

Memory structure adapts automatically based on real usage:

- Frequently accessed memories remain at the Category layer
- Memories retrieved from raw data are promoted upward and linked
- Organization evolves from usage patterns, not predefined rules
Goal: keep relevant memories retrievable at the Category layer and minimize latency over time.
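The promotion rule can be sketched as a simple usage counter. The threshold value and function names here are assumptions for illustration, not memU internals:

```python
PROMOTION_THRESHOLD = 3  # assumed fixed here; a real system could tune this

def record_retrieval(usage, category_files, item, layer):
    """Count retrievals; promote frequently used items to the Category layer."""
    usage[item] = usage.get(item, 0) + 1
    if layer != "category" and usage[item] >= PROMOTION_THRESHOLD:
        bucket = category_files.setdefault("active", [])
        if item not in bucket:
            bucket.append(item)  # promoted: now retrievable at Category layer
        return "promoted"
    return "unchanged"
```

Each promotion moves a memory closer to the fast path, so over time the Category layer accumulates exactly the memories that real usage shows to be relevant.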
---
## A Unified Multimodal Memory Pipeline

memU is a text-centered multimodal memory system. Multimodal inputs are progressively converted into interpretable text memory, while staying traceable to original data. This provides stable, high-level context for reasoning, with detailed evidence available when needed, inside a memory structure that evolves through real-world use.