Memory & cross-session recall
Most chatbots are goldfish: open a new chat and they've forgotten everything about you. Hermes is built the other way — to remember. This module unpacks the three layers of memory that carry you from one session to the next, and how each one is wired so you can build on it.
In this module — the goldfish problem, durable notes (MEMORY.md&USER.md), how facts actually get saved, the frozen-snapshot subtlety, full-text search over every past conversation, pluggable memory backends, and why all of this lets you build stateful products.
The goldfish problem
Here's the thing nobody tells you when you first build on top of a language model: the model itself remembers nothing. Every API call is a blank slate. The model only "knows" what's in the text you hand it this turn. Close the window, open a new one, and you're a stranger again.
That's fine for a one-off question. It's miserable for an assistant — something you want to work with for weeks. So a serious agent has to fake continuity by saving things on its own and feeding them back in later. Hermes does this with three distinct layers, each good at a different job:
- Durable notes — a couple of small Markdown files the agent edits and re-reads every session.
- A model of you — who you are and how you like to work.
- Cross-session search — full-text search over the entire archive of past conversations.
Let's take them one at a time.
A way to picture it — Think of a new colleague with no long-term memory but a desk notebook. Layer 1 is the page where they jot down "how this team does things." Layer 2 is the page about you specifically — your name, your style. Layer 3 is the filing cabinet behind the desk holding transcripts of every meeting, searchable. Each session, they glance at the notebook and, when something rings a bell, dig through the cabinet.
Layer 1 — durable notes (MEMORY.md & USER.md)
The simplest layer is two plain Markdown files that live under ~/.hermes/memories/. They are exactly what they look like — text files — which is wonderful, because you can open them in any editor and read or edit them by hand.
| File | Whose notebook | What goes in it |
|---|---|---|
MEMORY.md | The agent's own working notebook | Facts about your environment, your projects, naming conventions, quirks of the tools it uses. ("This repo deploys via Vercel." "The user prefers ripgrep over grep.") |
USER.md | A model of you | Your name, your role, your preferences, your communication style. ("Kevin — product builder. Likes plain English, hates jargon, wants the answer first.") |
The agent doesn't hand-edit these with a text editor the way you would — it uses a dedicated tool. Look at tools/memory_tool.py: it exposes a memory tool with a small set of actions (add, replace, remove) aimed at a target of either memory or user. So "remember that I prefer X" becomes an add against the user target; "actually, scratch that earlier note" becomes a remove or replace against memory.
One design choice matters a lot here: these files are capped. They're deliberately kept small. That's not a limitation that snuck in — it's the whole point. Memory here is meant to be a tight, high-signal summary the agent re-reads every single session, not a junk drawer it dumps everything into. A short, sharp notebook is more useful than a 400-page one nobody can skim.
Why Markdown, of all things — Because it's the lowest-friction format that's readable by both you and the model. You can audit exactly what your agent "believes" about you by opening one file. You can correct it by deleting a line. No database browser, no special tooling — just text. This is the narrow-waist philosophy from Module 1 showing up again: keep it simple at the core.
The "nudge" — how facts actually get saved
A fair question: if the model forgets everything between turns, who decides to write a note down? The README's phrasing is that the agent "nudges itself to persist knowledge." Two things make that real.
First, there's a background memory manager — see agent/memory_manager.py — that syncs memory around turns, keeping the on-disk notes and the running session in step. Second, the agent is periodically reminded to stop and write down anything durable it just learned. So when you say "by the way, always use British spelling," the agent gets nudged to capture that as a fact rather than letting it evaporate when the session ends.
This connects straight back to Module 4. Remember the split:
- Memory holds facts — "the user prefers X," "this project lives at Y."
- Skills hold procedures — "here's how to do the deploy dance step by step."
Facts and procedures together are the two halves of the self-improvement loop. An agent that remembers what's true about your world and learns better ways of doing things is one that genuinely gets more useful the longer you work with it.
The frozen-snapshot subtlety
Here's a detail that trips people up, and it ties directly to the prompt-caching rule we met in Module 2.
Memory in the prompt is a snapshot, not a live feed — The copy of your memory that gets baked into the system prompt is snapshotted at the start of a session and does not change mid-session. If the agent writes a new fact during a conversation, that write lands on disk and shows up in the tool's output immediately — but the version embedded in the system prompt stays frozen until the next session. This is deliberate: rewriting the system prompt mid-session would shatter the prompt cache and make every turn more expensive. It's a cost/consistency tradeoff, not a bug.
In plain terms: a fact the agent just learned is saved and the agent can see it (it's in the tool result it just got back). It simply won't appear in the always-on header of the prompt until you start fresh. If you ever think "I told it that ten minutes ago, why isn't it acting like a permanent fact yet?" — this is why. Start a new session and it'll be right there in the header.
Layer 3 — search over every past conversation
This is the powerful one. Every conversation Hermes has ever had is stored in a SQLite database at ~/.hermes/state.db, indexed for full-text search (SQLite's FTS5). That means the agent isn't limited to the tiny summary in its notebook — it can go dig through the actual transcripts.
The tool that does this is tools/session_search_tool.py. It lets the agent search its own past conversations in three modes:
| Mode | What it's for |
|---|---|
| Keyword query | Discovery — "have we ever talked about the staging deploy?" Finds matching messages anywhere in history. |
| Scroll around a hit | Context — once it finds a relevant message, read the messages just before and after it to understand what was actually being discussed. |
| Browse recent sessions | Recency — just list the latest conversations, no search term needed. |
A nice touch: search results don't come back as a bare matching line. Each result includes the snippet that matched plus "bookends" — the start and the end of that conversation — so the model gets the gist of the whole exchange, not just one decontextualized sentence.
The one-liner for this layer — It's Ctrl-F over your entire chat history — except the agent runs it on itself. When something feels familiar, it can search its own past and pull the relevant thread back into the present conversation.
Pluggable memory backends
Everything above describes the built-in memory — the Markdown files plus session search — which is the default and works out of the box. But memory is also a proper extension point. If you want a smarter brain, you can swap one in.
The seam is an abstract MemoryProvider class in agent/memory_provider.py. It defines the interface; concrete providers implement it. You can run at most one external provider at a time. The notable options:
- Honcho — does dialectic user modeling: instead of static notes, it builds an evolving "theory of mind" about the user that sharpens over time. (Honcho docs.)
- Mem0, Hindsight, and others — alternative external memory services you can plug in through the same interface.
This is squarely a developer hook. We'll come back to writing your own memory provider plugin in Module 7 — for now, just know the door exists and it's a clean, single-interface door.
Putting the three layers together
Three layers, three jobs. The skill is knowing which one answers which kind of question:
+------------------------------------------------------------+
| LAYER 1 · MEMORY.md the agent's working notebook |
| good at: durable facts about your world & tools |
| re-read: every session, baked into the prompt |
+------------------------------------------------------------+
| LAYER 2 · USER.md a model of you |
| good at: name, role, preferences, style |
| re-read: every session, baked into the prompt |
+------------------------------------------------------------+
| LAYER 3 · state.db full-text search of all chats |
| good at: "did we ever discuss X?" — deep recall |
| read: on demand, when the agent goes looking |
+------------------------------------------------------------+
|
v
an assistant that knows you across weeks, not one window
The first two are always present — small, summarized, in the prompt header every session. The third is pulled in on demand — unlimited depth, but only when the agent decides to search. Summaries for the everyday; the full archive when it needs to dig.
Builder's angle: stateful products
Here's why this matters when you build something. Most AI products are stateless: a request goes out, an answer comes back, and the next request starts cold. That's why so many "AI assistants" feel shallow — they can't accumulate a relationship.
Because memory and sessions are first-class in Hermes — a model of the user, durable notes, and a searchable history of every interaction — you can build an assistant that genuinely knows a person over weeks: their projects, their preferences, the decisions you made together three Tuesdays ago. That's the difference between a stateless API call and a product someone gets attached to. The plumbing is already here; you mostly just have to let it work and, when you need more, swap in a richer provider.
Key takeaways
- Hermes has three memory layers: durable notes (
MEMORY.md), a model of you (USER.md), and full-text search over every past conversation (state.db). - The notes are plain Markdown under
~/.hermes/memories/, edited via thememorytool (add/replace/remove) and kept deliberately small. - The agent nudges itself to persist durable facts; memory = facts, skills = procedures (the Module 4 loop).
- Memory in the system prompt is a frozen snapshot per session — to protect the prompt cache. New writes land on disk now, appear in the header next session.
- Session search is Ctrl-F over your whole chat history, with snippet + bookends for context.
- Memory is pluggable: swap in one external provider (Honcho, Mem0, Hindsight…) via
MemoryProvider.
Quick check — You tell Hermes "always call me by my first name," and a moment later it still uses your full name. Is memory broken?
Answer: No — this is the frozen-snapshot behavior. The preference was saved to disk (and is visible in the memory tool's output), but the copy baked into the system prompt was snapshotted at the start of this session and won't update mid-session — that's deliberate, to protect the prompt cache. Start a new session and the first-name preference will be in the prompt header from turn one.
Pair with your AI — Trace the memory machinery in the real repo: "In github.com/NousResearch/hermes-agent, walk me through how a fact gets remembered. Start at tools/memory_tool.py (the add/replace/remove actions and the memory vs user targets), then agent/memory_manager.py (how it syncs around turns), and finally show me where MEMORY.md/USER.md get read back into the system prompt. Explain the 'frozen snapshot' behavior in terms of prompt caching. I'm a product builder, keep it conceptual."