# Mnemosyne

Native, zero-cloud memory for AI agents. SQLite-backed. Sub-millisecond. Fully private.
Mnemosyne is a local-first memory system for the Hermes Agent framework. It stores conversations, preferences, and knowledge in SQLite with native vector search (sqlite-vec) and full-text search (FTS5) — no external databases, no API keys, no network calls.
## Quick Start
### Option A: Install from PyPI (recommended)
```bash
pip install mnemosyne-memory
```
> **Note:** The package name on PyPI is `mnemosyne-memory`.
With all optional features (dense retrieval + local LLM consolidation):
```bash
pip install mnemosyne-memory[all]
```
> ⚠️ **Ubuntu 24.04 / Debian 12 users:** If you get `error: externally-managed-environment`, your system Python is PEP 668-protected. Use a virtual environment:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install mnemosyne-memory[all]
```

Make sure to activate the venv every time you run Hermes, or install Hermes itself inside the same venv.
### Option B: Install from source (for development)
```bash
git clone https://github.com/AxDSan/mnemosyne.git
cd mnemosyne
pip install -e ".[all,dev]"
```
### Option C: Hermes MemoryProvider only (no pip needed)
If you only need Mnemosyne as a Hermes memory backend and want to skip pip entirely:
```bash
curl -sSL https://raw.githubusercontent.com/AxDSan/mnemosyne/main/deploy_hermes_provider.sh | bash
```
This symlinks the provider into ~/.hermes/plugins/mnemosyne and adds the repo to sys.path at runtime. No virtual environment required — works out of the box on Ubuntu 24.04.
### Register with Hermes
```bash
# 1. Install the plugin
python -m mnemosyne.install

# 2. Activate as your memory provider
hermes memory setup
# → Select "mnemosyne" and press Enter
```
Verify:
```bash
hermes memory status     # Should show "Provider: mnemosyne"
hermes mnemosyne stats   # Shows working + episodic memory counts
```
> **Note:** The `hermes memory setup` picker defaults to "Built-in only" every time it opens. This is normal Hermes UI behavior — your previous selection is still saved. Just select Mnemosyne and press Enter.
## What Makes It Different
| | Mnemosyne | Cloud alternatives |
|---|---|---|
| Latency | < 1 ms | 10-100 ms |
| Dependencies | Python stdlib + optional ONNX | External APIs, auth, rate limits |
| Privacy | 100% local | Data leaves your machine |
| Cost | Free | Freemium / per-call |
| Setup | `pip install -e .` | API keys, accounts, config |
Key capabilities:
- **BEAM architecture** — Three tiers: hot working memory, long-term episodic memory, temporary scratchpad
- **Hybrid search** — 50% vector similarity + 30% FTS5 rank + 20% importance, all inside SQLite
- **Automatic consolidation** — Old working memories are summarized and moved to episodic memory via `mnemosyne_sleep()`
- **Temporal triples** — Time-aware knowledge graph with automatic invalidation
- **Export / import** — Move your entire memory database to a new machine with one JSON file
- **Cross-session scope** — `remember(..., scope="global")` makes facts visible everywhere
- **Configurable compression** — `float32` (default), `int8` (4x smaller), or `bit` (32x smaller) vectors
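The hybrid-search weighting above can be written out in plain Python. This is an illustrative sketch of the 50/30/20 blend only; Mnemosyne computes the real thing inside SQLite, and the assumption here that each signal is normalized to [0, 1] is mine, not the project's.

```python
# Illustrative sketch of the 50/30/20 hybrid blend. Mnemosyne runs this
# inside SQLite; the [0, 1] normalization of each signal is an assumption.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(vec_sim, fts_rank, importance):
    """50% vector similarity + 30% FTS5 rank + 20% importance."""
    return 0.5 * vec_sim + 0.3 * fts_rank + 0.2 * importance

# A memory with strong keyword overlap but a weak semantic match...
weak = hybrid_score(vec_sim=0.2, fts_rank=0.9, importance=0.5)    # ≈ 0.47
# ...loses to one that scores reasonably on all three signals.
strong = hybrid_score(vec_sim=0.8, fts_rank=0.7, importance=0.9)  # ≈ 0.79
```

Because importance contributes only 20%, a high-importance memory cannot outrank a genuinely better match, but it does break ties between similar candidates.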
## Benchmarks
All numbers measured on CPU with sqlite-vec + FTS5 enabled.
### LongMemEval (ICLR 2025)
| System | Score | Notes |
|---|---|---|
| Mnemosyne (dense) | 98.9% Recall@All@5 | Oracle subset, 100 instances, bge-small-en-v1.5 |
| Mempalace | 96.6% Recall@5 | AAAK + Palace architecture |
| Mastra Observational Memory | 84.23% (gpt-4o) | Three-date model |
| Full-context GPT-4o baseline | ~60.2% | No memory system |
### Latency vs. Cloud Alternatives
| Operation | Honcho | Zep | MemGPT | Mnemosyne | Speedup |
|---|---|---|---|---|---|
| Write | 45ms | 85ms | 120ms | 0.81ms | 56x |
| Read | 38ms | 62ms | 95ms | 0.076ms | 500x |
| Search | 52ms | 78ms | 140ms | 1.2ms | 43x |
| Cold Start | 500ms | 800ms | 1200ms | 0ms | Instant |
### BEAM Architecture Scaling
Write throughput:
| Operation | Count | Total | Avg |
|---|---|---|---|
| Working memory writes | 500 | 8.7s | 17.4 ms |
| Episodic inserts (with embedding) | 500 | 10.7s | 21.3 ms |
| Sleep consolidation | 300 old items | 33 ms | — |
Hybrid recall scaling (query latency stays flat as corpus grows):
| Corpus Size | Query | Avg Latency | p95 |
|---|---|---|---|
| 100 | "concept 42" | 5.1 ms | 6.9 ms |
| 500 | "concept 42" | 5.0 ms | 5.7 ms |
| 1,000 | "concept 42" | 5.3 ms | 6.5 ms |
| 2,000 | "concept 42" | 7.0 ms | 8.6 ms |
Working memory recall scaling (FTS5 fast path):
| WM Size | Query | Avg Latency | p95 |
|---|---|---|---|
| 1,000 | "concept 42" | 2.4 ms | 3.1 ms |
| 5,000 | "domain 7" | 3.2 ms | 3.8 ms |
| 10,000 | "concept 42" | 6.4 ms | 7.2 ms |
## Installation

### Prerequisites
- Python 3.9+
- Hermes Agent (for plugin integration)
### From PyPI (recommended for users)
```bash
pip install mnemosyne-memory

# With all extras (dense retrieval + local LLM consolidation)
pip install mnemosyne-memory[all]
```
### From source (recommended for contributors)
```bash
git clone https://github.com/AxDSan/mnemosyne.git
cd mnemosyne
pip install -e ".[all,dev]"
python -m mnemosyne.install
```
> ⚠️ **Ubuntu 24.04 / Debian 12 users:** If `pip install` fails with `externally-managed-environment`, see the Quick Start → Option A note about using a virtual environment.
### Optional dependencies
```bash
# Dense retrieval (required for semantic search and the 98.9% LongMemEval score)
pip install "fastembed>=0.3.0"

# Local LLM consolidation (sleep cycle summarization)
pip install "ctransformers>=0.2.27" "huggingface-hub>=0.20"
```
> **Note:** Without `fastembed`, Mnemosyne falls back to keyword-only retrieval. It still works, but you won't get competitive semantic search or the benchmark scores above.
### Uninstall

```bash
python -m mnemosyne.install --uninstall
```
### Updating
If you installed from PyPI:
```bash
pip install --upgrade mnemosyne-memory
```
If you installed from source:
```bash
cd mnemosyne
git pull
pip install -e ".[all,dev]"
```
Always restart Hermes after updating so plugin changes take effect:
```bash
hermes gateway restart
```
If the update includes database schema changes, run the migration helper:
```bash
python scripts/migrate_from_legacy.py
```
See UPDATING.md for detailed troubleshooting and rollback instructions.
## Usage

### CLI
```bash
# Show memory statistics (current session only)
hermes mnemosyne stats

# Show memory statistics across ALL sessions
hermes mnemosyne stats --global

# Search memories
hermes mnemosyne inspect "dark mode preferences"

# Run consolidation (compress old working memory into episodic summaries)
hermes mnemosyne sleep

# Export all memories to a JSON file
hermes mnemosyne export --output mnemosyne_backup.json

# Import memories from a JSON file
hermes mnemosyne import --input mnemosyne_backup.json

# Clear scratchpad
hermes mnemosyne clear
```
### Python API
```python
from mnemosyne import remember, recall

# Store a fact
remember(
    content="User prefers dark mode interfaces",
    importance=0.9,
    source="preference"
)

# Store a global preference (visible in every session)
remember(
    content="User email is abdi.moya@gmail.com",
    importance=0.95,
    source="preference",
    scope="global"
)

# Store a temporary credential with expiry
remember(
    content="API key: sk-abc123",
    importance=0.8,
    source="credential",
    valid_until="2026-12-31T00:00:00"
)

# Search memories
results = recall("interface preferences", top_k=3)

# Temporal knowledge graph
from mnemosyne.core.triples import TripleStore

kg = TripleStore()
kg.add("Maya", "assigned_to", "auth-migration", valid_from="2026-01-15")
kg.query("Maya", as_of="2026-02-01")
```
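To illustrate what "automatic invalidation" means for the temporal knowledge graph, here is a hypothetical toy re-implementation. Only `TripleStore` and the `add`/`query` calls shown above are the real API; the `ToyTripleStore` class, its field names, and its invalidation rule are invented for this sketch.

```python
# Hypothetical sketch of a time-aware triple store. Not Mnemosyne's
# implementation; it only illustrates validity windows and invalidation.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str
    valid_from: str                     # ISO-8601 dates compare lexicographically
    valid_until: Optional[str] = None   # None = still valid

class ToyTripleStore:
    def __init__(self):
        self.triples = []

    def add(self, subject, predicate, obj, valid_from, valid_until=None):
        # "Automatic invalidation": a newer fact for the same (subject,
        # predicate) closes the validity window of the old one.
        for t in self.triples:
            if (t.subject, t.predicate) == (subject, predicate) and t.valid_until is None:
                t.valid_until = valid_from
        self.triples.append(Triple(subject, predicate, obj, valid_from, valid_until))

    def query(self, subject, as_of):
        # Return the triples that were valid at the given point in time.
        return [t for t in self.triples
                if t.subject == subject
                and t.valid_from <= as_of
                and (t.valid_until is None or as_of < t.valid_until)]

kg = ToyTripleStore()
kg.add("Maya", "assigned_to", "auth-migration", valid_from="2026-01-15")
kg.add("Maya", "assigned_to", "billing-rewrite", valid_from="2026-03-01")
past = kg.query("Maya", as_of="2026-02-01")  # auth-migration was current then
now = kg.query("Maya", as_of="2026-03-15")   # superseded by billing-rewrite
```

The point of the `as_of` parameter is that superseded facts are never deleted, only closed, so the store can answer "what was true then?" as well as "what is true now?".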
### Advanced: BEAM direct access
```python
from mnemosyne.core.beam import BeamMemory

beam = BeamMemory(session_id="my_session")

# Working memory (auto-injected into prompts)
beam.remember("Important context", importance=0.9)

# Episodic memory (long-term, searchable)
beam.consolidate_to_episodic(
    summary="User likes Neovim",
    source_wm_ids=["wm1"],
    importance=0.8
)

# Scratchpad (temporary reasoning)
beam.scratchpad_write("todo: fix auth bug")

# Search both tiers
results = beam.recall("editor preferences", top_k=5)
```
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ HERMES AGENT │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────┐ │
│ │ pre_llm │────▶│ Mnemosyne │────▶│ SQLite │ │
│ │ hook │ │ BEAM │ │ │ │
│ └─────────────┘ └──────────────┘ │ working_mem │ │
│ ▲ │ episodic_mem│ │
│ │ │ vec_episodes│ │
│ └──────── Auto-injected context ───│ fts_episodes│ │
│ │ scratchpad │ │
│ │ triples │ │
│ └─────────────┘ │
│ │
│ No HTTP. No cloud. 100% local. │
└─────────────────────────────────────────────────────────────┘
```
**BEAM (Bilevel Episodic-Associative Memory):**

- `working_memory` — Hot context, auto-injected before LLM calls, TTL-based eviction
- `episodic_memory` — Long-term storage with sqlite-vec + FTS5 hybrid search
- `scratchpad` — Temporary agent reasoning workspace
## Why SQLite for Hermes?
SQLite is already in your stack. Hermes uses it for session persistence. Mnemosyne extends that same file — no new dependencies, no Docker containers, no connection pooling.
| Feature | Honcho | Zep | Mnemosyne |
|---|---|---|---|
| Deployment | Docker + PostgreSQL | Docker + PostgreSQL | `pip install` |
| Query Language | REST API | REST API | `SELECT ... WHERE MATCH` |
| Vector Store | pgvector | pgvector | sqlite-vec |
| Text Search | Separate API | Separate API | Built-in FTS5 |
| Auth Required | Yes (Supabase) | Yes | No |
| Offline Mode | No | No | Yes |
| Cold Start Latency | 500-800 ms | 800 ms+ | 0 ms |
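The `SELECT ... WHERE MATCH` row refers to SQLite's built-in FTS5 module, which ships with most Python builds. A self-contained demo of that query style, using an illustrative table name rather than Mnemosyne's actual schema:

```python
# Self-contained FTS5 demo with Python's bundled sqlite3. The table and
# column names are illustrative, not Mnemosyne's actual schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(content)")
conn.executemany(
    "INSERT INTO memories(content) VALUES (?)",
    [("User prefers dark mode interfaces",),
     ("User likes Neovim",),
     ("Fix the auth bug before Friday",)],
)

# Full-text search with ranking, entirely in-process: no server, no network.
rows = conn.execute(
    "SELECT content FROM memories WHERE memories MATCH ? ORDER BY rank",
    ("dark mode",),
).fetchall()
# rows → [('User prefers dark mode interfaces',)]
```

This is the same engine behind the `fts_episodes` table in the architecture diagram; Mnemosyne's hybrid recall layers vector similarity (via sqlite-vec) on top of this ranking.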
## Backup, Export & Migration
Mnemosyne stores everything in a single SQLite file at `~/.hermes/mnemosyne/data/mnemosyne.db`.
```bash
# Simple backup
cp ~/.hermes/mnemosyne/data/mnemosyne.db ~/backups/mnemosyne_$(date +%Y%m%d).db

# Export to JSON (portable across machines)
hermes mnemosyne export --output mnemosyne_backup.json

# Import on a new machine
hermes mnemosyne import --input mnemosyne_backup.json
```
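One caveat with the plain `cp` backup: if the agent happens to be writing at that moment, the copy can capture a mid-transaction state. SQLite's online backup API, exposed through Python's `sqlite3` module, takes a consistent snapshot instead. A minimal sketch; the `snapshot` helper is invented for illustration, not part of Mnemosyne:

```python
# Transactionally consistent copy of a (possibly live) SQLite database
# via the online backup API. `snapshot` is a hypothetical helper, not
# part of Mnemosyne.
import sqlite3

def snapshot(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path page-by-page using SQLite's backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        with dst:
            src.backup(dst)  # yields a consistent snapshot even with writers active
    finally:
        src.close()
        dst.close()
```

For Mnemosyne that would be, e.g., `snapshot(os.path.expanduser("~/.hermes/mnemosyne/data/mnemosyne.db"), "mnemosyne_snapshot.db")`.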
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `MNEMOSYNE_DATA_DIR` | `~/.hermes/mnemosyne/data` | Database directory |
| `MNEMOSYNE_VEC_TYPE` | `float32` | Vector compression: `float32`, `int8`, or `bit` |
| `MNEMOSYNE_WM_MAX_ITEMS` | `10000` | Working memory item limit |
| `MNEMOSYNE_WM_TTL_HOURS` | `24` | Working memory TTL (hours) |
| `MNEMOSYNE_RECENCY_HALFLIFE` | `168` | Recency decay half-life in hours (168 = 1 week) |
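`MNEMOSYNE_RECENCY_HALFLIFE` implies an exponential decay of recency weight. A generic sketch of how a half-life parameter is typically applied; this is the standard formula, not necessarily Mnemosyne's exact implementation:

```python
# Generic exponential half-life decay: what a 168-hour
# MNEMOSYNE_RECENCY_HALFLIFE implies, not necessarily the exact formula
# Mnemosyne uses internally.
def recency_weight(age_hours: float, halflife_hours: float = 168.0) -> float:
    """Weight is 1.0 for a brand-new memory and halves every half-life."""
    return 0.5 ** (age_hours / halflife_hours)

fresh = recency_weight(0)         # 1.0
one_week = recency_weight(168)    # 0.5  (one half-life at the default)
four_weeks = recency_weight(672)  # 0.0625 (four half-lives)
```

Raising the half-life makes old memories fade more slowly; lowering it biases recall sharply toward recent context.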
## Testing
```bash
# Run tests locally
python -m pytest tests/test_beam.py -v

# Run benchmarks
python tests/benchmark_beam_working_memory.py
```
All changes are validated through GitHub Actions CI on Python 3.9–3.12 before merging.
## Releases
Mnemosyne publishes GitHub Releases and PyPI packages automatically on every `v*` tag. See CONTRIBUTING.md for the release process.
## Contributing
Contributions are welcome. Areas of active interest:
- Encrypted cloud sync (optional, user-controlled)
- Browser extension for web context capture
- Additional embedding models
- Multi-language support
See CONTRIBUTING.md for guidelines.
## License
MIT License — See LICENSE
Copyright (c) 2026 Abdias J
## Acknowledgments
- Hermes Agent Framework — The ecosystem Mnemosyne was built for
- Honcho — For defining the stateful memory space
- Mempalace — For proving local-first memory can compete on benchmarks
- SQLite — The world's most deployed database
> *"The faintest ink is more powerful than the strongest memory."* — Hermes Trismegistus