Sahil-SS9/hermes-memlock
Re-assert standing instructions after context compaction. Pin anchors, detect drift, rehydrate before the model forgets
MemLock is a plugin for Hermes Agent that prevents the loss of standing instructions during context compaction. It monitors the conversation for compaction markers and audits the active context region to identify pinned instructions that have been demoted to background reference. When drift is detected, the tool rehydrates the missing instructions as a reminder block appended to the user's turn. This approach optimizes token usage and prompt cache stability by only reinjecting necessary instructions rather than repeating them every turn.
- Audits active context to detect drifted pinned instructions
- Supports keyword-based or semantic embedding-based drift detection
- Provides guard_pin tool and /guard command for anchor management
full readme from github
MemLock: re-assert standing instructions after context compaction
MemLock detects Hermes context compactions, audits which pinned instructions (anchors) survived in the active (non-summary) region of the conversation, and rehydrates the casualties as a reminder block before they drift out of the model's working memory.
60-second Quickstart
# Clone into Hermes plugins directory
git clone https://github.com/Sahil-SS9/hermes-memlock.git ~/.hermes/plugins/memlock
# Enable in ~/.hermes/config.yaml:
plugins:
enabled:
- memlock
# Pin an instruction (ask the model to call guard_pin):
"As a standing instruction use the guard_pin tool with text='Always reply in bullet points' priority=80"
# Check status:
/guard
# Pin and then send a long message that triggers compaction;
# the pin survives. That's the whole demo.
Why this exists
Hermes context compression wraps compacted turns in:
[CONTEXT COMPACTION — REFERENCE ONLY] Earlier turns were compacted
into the summary below.
That SUMMARY_PREFIX marker is a behavioural signal: the model treats
everything below it as background reference, not active instructions.
Any standing instruction (output format, tool preference, writing style)
sitting inside the summary region is demoted to reference-only. The
instruction is still physically in the context window; the model just no
longer obeys it. This failure mode is well documented across agent
frameworks (see Prior art below), and instruction decay over long
conversations is measurable: Li et al. (COLM 2024) quantify instruction
instability in long dialogues, and Chroma's Context Rot study shows
performance degrading with input length across 18 models.
MemLock's pre_llm_call hook detects the compaction marker, hashes the
summary body, and audits the active (non-summary) region for your pinned
anchors. Drifted anchors get rehydrated as a reminder block appended to the
current user turn.
Prior art
The detect-and-reinject idea is not new; the audit step is the difference.
- post_compact_reminder detects Claude Code compactions and reinjects a reminder unconditionally.
- openclaw-sticky-context pinned context into the system prompt every turn, where compaction cannot touch it.
- Letta (MemGPT) memory blocks pin content structurally: core memory is architecturally exempt from summarisation.
- Claude Code re-reads CLAUDE.md from disk after compaction (memory docs).
MemLock differs in the middle step: it audits whether each anchor actually survived in the active region (keyword probes, or windowed embedding similarity) and reinjects only the casualties. Blanket reinjection costs tokens every turn and invalidates prompt caches; structural pinning needs control of the system prompt, which a plugin does not have.
Design rationale: why not just inject every turn?
That is a legitimate strategy, and MemLock supports it (inject: always).
The trade-offs:
on-drift (default) |
always |
|
|---|---|---|
| Token cost | Zero on quiet turns | Reminder block every turn |
| Prompt cache | Stable between compactions | Block participates in every prompt |
| Guarantee | Heuristic (probe quality matters) | Deterministic presence |
| Failure mode | Imprecise probes miss a drift | Instruction fatigue, larger prompts |
Presence is not the same as obedience: re-sending an instruction restores it
to the active region but attention decay is only partially fixed by
repetition (Li et al., COLM 2024). The audit-then-rehydrate default treats
re-injection as a repair action rather than wallpaper; always is there for
sessions where determinism matters more than token cost.
Citations: Measuring and Controlling Instruction (In)Stability in LLM Dialogs (COLM 2024), Chroma: Context Rot.
How it works
Compaction detection flow
flowchart TD
A[pre_llm_call hook fires] --> D[Increment turn counter]
D --> B[SCAN conversation_history for SUMMARY_PREFIX]
B --> C{New compaction?<br>prefix found + hash changed}
C -->|No| E{Turn >= 40 since last<br>reinjection?}
E -->|No| F[Return None, nothing to do]
E -->|Yes| G[Safety-net reinjection]
G --> R
C -->|Yes| J[RECORD new compaction event]
J --> K[SPLIT context: summary region vs active region]
K --> L[AUDIT anchors against active region only]
L --> M{Any drifted?}
M -->|No| N[Update alive markers, compute score]
N --> F
M -->|Yes| O[COMPUTE integrity score]
O --> P{Score < alert_floor?}
P -->|Yes| Q[LOG warning with drifted anchor ids]
Q --> R[REHYDRATE: build reminder block]
P -->|No| R
R --> S[Return context dict with reminder block]
With inject: always, the rehydrate step runs every turn regardless of the
audit outcome; the audit still runs on compactions to keep the /guard
integrity score honest.
Anchor lifecycle
stateDiagram-v2
[*] --> Pinned: guard_pin(text, priority)
Pinned --> Alive: audit succeeds (threshold fraction of probes hit)
Alive --> Drifted: compaction + audit fails (< threshold probes hit)
Alive --> Alive: audit succeeds post-compaction
Drifted --> Rehydrated: pre_llm_call catches drift
Rehydrated --> Alive: next turn, anchor is now in active context
Pinned --> [*]: guard_pin(unpin=id)
Semantic mode
detection: semantic replaces keyword probes with embedding similarity.
The active region is split into windows (one per message; long messages are
chunked with overlap), all windows are embedded in one batch, and an anchor
counts as alive if its maximum cosine similarity over the windows
reaches sim_threshold. A single whole-region embedding would wash short
instructions out by averaging; per-window max similarity is what makes the
comparison meaningful. Requires the optional sentence-transformers
package; if it is missing or fails to load, MemLock falls back to keyword
probes. Note the model downloads lazily on first semantic audit.
Configuration
| Key | Default | Description |
|---|---|---|
detection |
keyword |
keyword or semantic (needs sentence-transformers) |
inject |
on-drift |
on-drift (audit-gated) or always (every turn) |
drift_threshold |
0.5 |
Fraction of probes that must hit in active region |
sim_threshold |
0.65 |
Max-cosine threshold for semantic mode |
semantic_window_chars |
1000 |
Embedding window size (clamped to a sane floor) |
max_slots |
8 |
Max anchors rehydrated per turn |
max_reminder_chars |
600 |
Total reminder block char limit |
max_pins |
16 |
Cap on guard_pin anchors per session |
hard_reinject_turns |
40 |
Safety net: reinject top anchors even without drift |
alert_floor |
70 |
Integrity score % below which a warning is logged |
alert_cooldown_s |
1800 |
Min seconds between alerts (prevents spam) |
alert_script |
"" |
Optional script invoked with the alert message |
embedding_model |
all-MiniLM-L6-v2 |
Sentence-transformer for semantic mode |
anchors |
[] |
Static anchors seeded from config |
Slash Commands
| Command | Description |
|---|---|
/guard |
Show integrity score, anchor list, drift log |
Tool: guard_pin
Pin or unpin a standing instruction.
| Param | Required | Description |
|---|---|---|
text |
For pin | The instruction to preserve |
priority |
No | 1-100 (default 50). Higher values get rehydrated first |
reminder |
No | Short version for re-insertion (auto-trimmed) |
probes |
No | Distinctive keywords for drift detection (auto-derived) |
unpin |
For unpin | Anchor id to remove |
Limitations
- Probe-based detection is best-effort. Auto-derived probes may be
imprecise for short or generic instructions. Define explicit
probesfor critical instructions. guard_pinis model-callable. A prompt-injected model could pin an attacker's instruction, which MemLock would then faithfully re-assert. Mitigations: pinned text is whitespace-flattened (no reminder-block spoofing), pins are capped per session (max_pins), and/guardlists every active anchor for review. Audit/guardafter processing untrusted content.- The
session_idbinding uses the Hermes dispatch-layer forwarding when available. On vanilla Hermes (without the optional dispatch patch), tool handlers bind to the last-seen session: correct for single-session environments, but a documented race under concurrent gateway sessions. Seedocs/optional-dispatch-patch.mdfor the 3-line fix. - Semantic mode needs the optional
sentence-transformerspackage and downloads the embedding model on first use.
License
MIT, see LICENSE.