MemoryOS gives your agents queryable memory across sessions, with temporal awareness, a knowledge graph, and sub-100ms retrieval. Self-hostable. Production ready.
100 questions across 6 categories from the ICLR 2025 benchmark, each with ~50 conversation sessions in the haystack.
A temporal knowledge graph layered on hybrid vector retrieval. Designed for the problems standard RAG pipelines cannot solve.
Facts stored with timestamps, never deleted, only superseded. Ask "where did Alice live in 2022?" and get the historically correct answer.
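A minimal sketch of what an as-of query over append-only, timestamped facts can look like (the Fact shape and as_of helper are illustrative, not MemoryOS's actual API):

```python
# Append-only facts with timestamps, queried "as of" a date.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: datetime
    superseded_at: Optional[datetime] = None  # set when a newer fact replaces this one

def as_of(facts: list[Fact], subject: str, predicate: str, when: datetime) -> Optional[Fact]:
    """Return the fact that was true at `when`: started before it, not yet superseded."""
    candidates = [
        f for f in facts
        if f.subject == subject and f.predicate == predicate
        and f.valid_from <= when
        and (f.superseded_at is None or f.superseded_at > when)
    ]
    return max(candidates, key=lambda f: f.valid_from, default=None)

facts = [
    Fact("Alice", "lives_in", "Portland", datetime(2020, 3, 1), superseded_at=datetime(2023, 6, 1)),
    Fact("Alice", "lives_in", "Seattle", datetime(2023, 6, 1)),
]
print(as_of(facts, "Alice", "lives_in", datetime(2022, 7, 1)).value)  # -> "Portland"
```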
Four signals combined: raw similarity, enriched similarity, BM25 keywords, and graph proximity. Each tunable per tenant.
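For illustration, combining four normalized signals under per-tenant weights might look like this (the weight names and default values are assumptions, not MemoryOS's configuration keys):

```python
# Per-tenant weights over the four retrieval signals; each signal is pre-normalized to [0, 1].
DEFAULT_WEIGHTS = {"raw": 0.35, "enriched": 0.35, "bm25": 0.15, "graph": 0.15}

def hybrid_score(signals: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of raw similarity, enriched similarity, BM25, and graph proximity."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)

print(hybrid_score({"raw": 0.62, "enriched": 0.81, "bm25": 0.40, "graph": 1.0}))
```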
Pronouns resolved before embedding. "I moved" becomes "Alice Chen moved to Seattle." Each chunk gets its own enriched vector.
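A sketch of that enrichment step, assuming generic call_llm and embed clients; the prompt wording and function names are placeholders, not the shipped pipeline:

```python
# Contextual enrichment before embedding: resolve pronouns against known entities,
# then embed the rewritten text so each chunk gets its own enriched vector.
from typing import Callable

ENRICH_PROMPT = (
    "Rewrite the message so every pronoun is replaced with the entity it refers to.\n"
    "Speaker: {speaker}\nKnown entities: {entities}\nMessage: {text}\nRewritten:"
)

def enrich_chunk(text: str, speaker: str, entities: list[str],
                 call_llm: Callable[[str], str],
                 embed: Callable[[str], list[float]]) -> tuple[str, list[float]]:
    """Return the enriched text ("I moved" -> "Alice Chen moved to Seattle") and its vector."""
    enriched = call_llm(ENRICH_PROMPT.format(
        speaker=speaker, entities=", ".join(entities), text=text))
    return enriched, embed(enriched)
```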
Stale memories fade naturally. Retrieval reinforces stability. Superseded facts decay immediately. Nothing is ever deleted.
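One standard way to model this is exponential decay with retrieval reinforcement; the exact curve MemoryOS uses may differ, so treat this as a sketch:

```python
# Stability grows with each retrieval, relevance decays with time since last access,
# and superseded facts drop to zero immediately (but are never deleted).
import math

def decay_score(days_since_access: float, stability: float, superseded: bool) -> float:
    if superseded:
        return 0.0  # superseded facts decay immediately
    return math.exp(-days_since_access / max(stability, 1e-6))

def reinforce(stability: float, boost: float = 1.5) -> float:
    """Each retrieval makes the memory more stable, i.e. slower to fade."""
    return stability * boost

print(decay_score(days_since_access=30, stability=10, superseded=False))
```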
Every retrieval returns a score breakdown. See which signal fired, what the decay score was, why each memory was retrieved.
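As an illustration, a per-memory breakdown could look roughly like this (field names are assumptions, not the actual response schema):

```python
# Illustrative shape of a score breakdown returned alongside each retrieved memory.
breakdown = {
    "memory_id": "mem_123",
    "final_score": 0.78,
    "signals": {"raw": 0.62, "enriched": 0.81, "bm25": 0.40, "graph": 1.0},
    "weights": {"raw": 0.35, "enriched": 0.35, "bm25": 0.15, "graph": 0.15},
    "decay": 0.91,
    "matched_entities": ["Alice Chen", "Seattle"],
}
```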
PostgreSQL handles relational metadata, ANN search via pgvector HNSW indexes, and graph edges, all written in one transaction. One system to operate.
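A sketch of that single-transaction write using psycopg 3; the table and column names are illustrative, not MemoryOS's actual schema:

```python
# Metadata row, pgvector embedding, and graph edges committed atomically.
# ANN search assumes an index like: CREATE INDEX ON memories USING hnsw (embedding vector_cosine_ops)
import psycopg

def store_memory(dsn: str, text: str, embedding: list[float],
                 edges: list[tuple[str, str, str]]) -> None:
    vec = "[" + ",".join(str(x) for x in embedding) + "]"  # pgvector literal
    with psycopg.connect(dsn) as conn:          # one transaction for all three writes
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO memories (content, embedding) VALUES (%s, %s::vector) RETURNING id",
                (text, vec),
            )
            memory_id = cur.fetchone()[0]
            for subject, predicate, obj in edges:
                cur.execute(
                    "INSERT INTO graph_edges (memory_id, subject, predicate, object) "
                    "VALUES (%s, %s, %s, %s)",
                    (memory_id, subject, predicate, obj),
                )
        # psycopg's connection context manager commits on success, rolls back on error
```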
MemoryOS sits between your agent and the LLM. It retrieves the right context. The LLM does the reasoning.
Raw text goes in. spaCy extracts entities. LLM enriches each chunk. Triples become graph edges.
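A rough ingestion sketch, assuming spaCy for entity extraction and placeholder call_llm / extract_triples helpers standing in for the LLM steps:

```python
# Raw text in; entities, an enriched restatement, and graph-ready triples out.
import spacy

nlp = spacy.load("en_core_web_sm")

def ingest(text: str, call_llm, extract_triples) -> dict:
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]     # e.g. ("Seattle", "GPE")
    enriched = call_llm(f"Resolve pronouns and restate with full names: {text}")
    triples = extract_triples(enriched)                          # [(subject, predicate, object), ...]
    return {"entities": entities, "enriched": enriched, "triples": triples}
```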
Query is embedded and scored. Graph traversal finds entity-linked memories. Top candidates reranked.
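A simplified view of that retrieval flow, with every search and scoring function passed in as a placeholder rather than a real MemoryOS API:

```python
# Vector and BM25 candidates plus 1-hop graph neighbors of the query's entities,
# merged and reranked by the combined score.
def retrieve(query: str, embed, vector_search, keyword_search, graph_neighbors,
             score, k: int = 8) -> list[dict]:
    qvec = embed(query)
    candidates = {m["id"]: m for m in vector_search(qvec) + keyword_search(query)}
    for memory in graph_neighbors(query):        # memories linked to entities in the query
        candidates.setdefault(memory["id"], memory)
    ranked = sorted(candidates.values(), key=score, reverse=True)
    return ranked[:k]
```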
Top-k memories injected into your LLM prompt as context. Agent now knows the right facts.
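One possible prompt assembly; the template is an assumption, not the shipped format:

```python
# Inject the top-k memories as timestamped context ahead of the user's question.
def build_prompt(question: str, memories: list[dict]) -> str:
    context = "\n".join(f"- [{m['timestamp']}] {m['content']}" for m in memories)
    return f"Relevant memories:\n{context}\n\nUser question: {question}"
```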
Each retrieval reinforces memory stability. New facts supersede old ones via append-only graph.
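An append-only supersession sketch: a new node plus a SUPERSEDES edge, with nothing mutated or deleted (the node and edge shapes are illustrative):

```python
# Historical queries still see the old node; only the edge records which fact won.
from typing import Optional

graph = {"nodes": [], "edges": []}

def add_fact(graph: dict, fact_id: str, content: str, supersedes: Optional[str] = None) -> None:
    graph["nodes"].append({"id": fact_id, "content": content})
    if supersedes:
        graph["edges"].append({"from": fact_id, "type": "SUPERSEDES", "to": supersedes})

add_fact(graph, "f1", "Alice lives in Portland")
add_fact(graph, "f2", "Alice lives in Seattle", supersedes="f1")
```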
A technical deep dive into the architecture decisions and performance journey.
Why vector databases alone fail for conversational memory, how the temporal knowledge graph works, and the optimization journey from 28-second queries to 79ms warm-path retrieval.
Read the article
One docker compose up and you're running.