OnlyTerp/hermes-optimization-guide
Hermes Agent setup, migration, LightRAG, Telegram, and skill creation guide
The Hermes Optimization Guide is a comprehensive documentation and resource repository designed to help users deploy and scale Hermes Agent in production environments. It provides 24 instructional parts alongside runnable artifacts, including 13 installable skills, 5 pre-configured YAML templates, and a one-command VPS bootstrap script. The guide covers advanced implementations such as multi-platform orchestration across 22+ services, MCP server integration, and cost-optimized model routing using providers like Anthropic, OpenAI, and local deployments. It also includes reference architectures for various use cases ranging from homelabs to small agencies.
- One-command VPS bootstrap script for automated production-ready deployment
- 13 installable skills and 5 opinionated configuration templates
- Support for 22+ platforms including Telegram, Discord, and Slack
Hermes Optimization Guide
Current through Hermes Agent v0.14.0 (v2026.5.16) · 24 parts, 13 installable guide skills, 5 opinionated configs, 4 reference architectures, one-command VPS bootstrap · Updated for the Foundation release: PyPI installs, Grok OAuth + 1M context, `hermes proxy`, `x_search`, Teams end-to-end, LINE/SimpleX, `/handoff`, faster browser/CDP paths, native Windows beta, durable Kanban, `/goal`, Checkpoints v2, no-agent cron, Curator, plugins, and current May 2026 model routing
The End-to-End Hermes Guide — docs + runnable artifacts
Every part you need to go from fresh install to a production Hermes deployment that talks on 22+ built-in/plugin platforms, orchestrates Claude Code / Codex / Gemini CLI through durable Kanban lanes, plugs into any MCP server, traces every call in Langfuse, curates its own skills, and runs heavy work on disposable Modal/Daytona/Vercel sandboxes — without burning $100/day on frontier tokens.
Unlike most guides, the prescriptions come with working files: skills/ you can ln -s into ~/.hermes/skills/, templates/config/ you cp to ~/.hermes/config.yaml, scripts/vps-bootstrap.sh that takes a fresh VPS to production in one command.
By Terp — Terp AI Labs · Last updated May 25, 2026 · CHANGELOG · ROADMAP · ECOSYSTEM
Install Everything (one command)
On a fresh Debian 12 / Ubuntu 24.04 box (Hetzner CX22 works great for ~$5/mo):
curl -sSL https://raw.githubusercontent.com/OnlyTerp/hermes-optimization-guide/main/scripts/vps-bootstrap.sh | sudo bash
This installs Hermes, Node.js, Caddy (auto-TLS reverse proxy), UFW, fail2ban, creates a non-root hermes user, drops in hardened systemd units, and symlinks every skill from this repo into ~hermes/.hermes/skills/. See scripts/vps-bootstrap.sh for what it does line by line — it's non-destructive and re-runnable.
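The symlink step is easy to picture in a few lines of Python. This is a minimal sketch, not the bootstrap script itself (read scripts/vps-bootstrap.sh for the real thing). `link_skills` is a hypothetical helper, and it assumes each skill is a directory containing a SKILL.md file. Existing entries are left alone, so running it twice changes nothing.

```python
from pathlib import Path


def link_skills(repo_skills: Path, target: Path) -> list[str]:
    """Symlink each skill directory in repo_skills into target.

    Hypothetical helper: assumes a skill is a directory holding SKILL.md.
    Existing entries in target are never replaced, so re-runs are no-ops.
    """
    target.mkdir(parents=True, exist_ok=True)
    linked = []
    for skill_dir in sorted(repo_skills.iterdir()):
        if not (skill_dir / "SKILL.md").is_file():
            continue  # not a skill directory
        link = target / skill_dir.name
        if link.exists() or link.is_symlink():
            continue  # leave whatever is already there
        link.symlink_to(skill_dir.resolve(), target_is_directory=True)
        linked.append(skill_dir.name)
    return linked
```

From a checkout of this repo, `link_skills(Path("skills"), Path.home() / ".hermes" / "skills")` would return the names of the skills it linked.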
Prefer a 5-minute local-only setup? → docs/quickstart.md (zero to Telegram bot in 5 min).
Repo Map
| Folder | What's in it |
|---|---|
| `skills/` | 13 installable SKILL.md files. `ln -s` into `~/.hermes/skills/` and they're live. |
| `templates/config/` | 5 opinionated config.yaml files: minimum, telegram-bot, production, cost-optimized, security-hardened. |
| `templates/compose/` | Self-hosted Langfuse v3 stack (ClickHouse + MinIO + Redis). |
| `templates/caddy/` | Caddyfile reference (reverse proxy + auto TLS + HSTS). |
| `templates/systemd/` | Hardened hermes.service + hermes-dashboard.service. |
| `templates/cron/` | Recommended production cron schedule. |
| `scripts/vps-bootstrap.sh` | One-command fresh VPS → production Hermes. |
| `diagrams/` | 6 Mermaid diagrams (architecture, MCP flow, delegation, sandbox sync, observability, security layers). |
| `benchmarks/` | Reproducible cost + latency table across 12 models × 5 tasks. |
| `docs/wizard/` | Interactive config wizard: 8 questions → ready-to-drop config.yaml. Runs in your browser. |
| `docs/reference-architectures/` | 4 blueprints: Homelab, Solo Dev, Small Agency, Road Warrior. Full parts list + cost + install. |
| `docs/outreach/` | Launch tweet, HN post, upstream-PR body drafts (for people linking to this guide). |
| `docs/quickstart.md` | 5-minute zero-to-Telegram-bot. |
| `ECOSYSTEM.md` | Curated directory of MCP servers, coding agents, dashboard plugins. |
| `ROADMAP.md` · `CHANGELOG.md` · `CONTRIBUTING.md` | The usual suspects. |
| README + `part1-*.md` … `part23-*.md` | The 24-part guide itself. |
Architecture at a glance
flowchart LR
Inputs[22+ platforms<br/>Telegram · Discord · Slack<br/>Google Chat · LINE · SimpleX<br/>Teams · QQBot · Yuanbao<br/>iMessage · WeChat · Email<br/>SMS · Webhooks · Cron · Voice · CLI] --> Gateway
Gateway --> Router[Model Router<br/>cost + context + capability]
Router --> Providers[Anthropic · OpenAI<br/>Google · Cerebras · Moonshot<br/>z.ai · xAI · Local]
Gateway --> Approval[Approval Layer<br/>denylist · allowlist · quarantine]
Approval --> Tools[Tools<br/>Native · Tool Gateway<br/>MCP · Subagents · Coding Agents]
Tools --> Memory[Memory<br/>Vector · LightRAG · mem0]
Tools --> Logs[(Audit log<br/>+ Langfuse/Helicone traces)]
Full set of diagrams: diagrams/architecture.md.
Pick Your Path
This guide grew to 24 parts because Hermes grew. Six sections (Parts 1–5 plus SOUL.md) live in this README; Parts 6–23 live as separate files. You don't have to read them all — pick the shortest path to what you need:
🎯 "I just want it working in 10 minutes"
Part 1: Setup → Part 12: Web Dashboard → done. Use the dashboard to point-and-click the rest.
📱 "I want a Telegram bot that's actually useful"
Part 1 → Part 4: Telegram → Part 5: On-the-fly Skills → Part 7: Memory.
🤖 "I want to drive Claude Code / Codex / Gemini from my phone"
Part 18: Coding Agents → Part 23: Foundation + Tenacity Stack → Part 21: Remote Sandboxes.
💼 "I'm running this in production"
Part 19: Security Playbook → Part 20: Observability & Cost → Part 16: Backup & Debug → Part 23: Kanban + Goals + Handoff.
🧠 "I want the most capable agent possible, cost be damned"
Part 17: MCP Servers → Part 18: Coding Agents → Part 3: LightRAG → Part 14: Fast Mode → Part 20: Observability.
💰 "I want the cheapest possible agent that still works"
Part 9: Custom Models (Grok/Gemini/Kimi/GLM routing) → Part 20: Observability → Part 6: Context Compression.
🛡️ "I'm worried about prompt injection (you should be)"
Part 19: Security Playbook — read this first if your agent reads any untrusted input (email, webhooks, Discord, public Telegram groups).
What's New (May 2026)
Hermes moved again after the Tenacity refresh. The current stable target is v0.14.0 — 2026.5.16 — "The Foundation Release". This update folds the landed install, proxy, platform, live-search, and performance features into the guide and removes v0.13-as-current framing.
v0.14.0 — "Foundation"
- PyPI + lighter installs: `pip install hermes-agent` now works, heavy extras lazy-install on first use, `[all]` is debloated, and launch is roughly 19 seconds faster. See Part 1.
- Grok/SuperGrok OAuth + 1M context: Grok 4.3 is now a native OAuth-backed provider with live X search, Custom Voices, and million-token research lanes. See Part 9.
- `hermes proxy`: one OpenAI-compatible localhost endpoint for OAuth-backed Claude Pro, ChatGPT Pro, and SuperGrok so Codex, Aider, Cline, Continue, and scripts can reuse subscriptions. See Part 13.
- `x_search`: first-class X/Twitter search with OAuth or API-key auth for live threads and post lookup. See Part 13.
- Teams end-to-end + LINE + SimpleX Chat: Microsoft Graph auth/listener/runtime/delivery is wired through, and the gateway reaches 22 messaging platforms. See Part 15.
- Live `/handoff`: transfer an active session to another model/profile/persona without losing messages or tool context. See Part 23.
- Performance wave: persistent CDP makes browser-console work dramatically faster; `computer_use` gains a provider-agnostic CUA backend; Claude prompt prefixes cache for 1 hour across sessions. See Part 20.
- Editor + OS reach: Zed ACP Registry integration via `uvx`, clickable terminal URLs, and native Windows beta widen how Hermes is driven. See Part 18.
v0.13.0 — "Tenacity"
- Durable multi-agent Kanban: boards, heartbeats, reclaim, retry budgets, zombie detection, and human unblock/review flow make long work auditable instead of fragile. See Part 23.
- `/goal` persistent objectives: keep a session locked on an observable target until done, paused, cleared, or out of budget. See Part 23.
- Checkpoints v2 + no-agent cron: real pruning, gateway auto-resume, script-only watchdogs, and provider/platform plugin surfaces. See Part 23.
v0.12.0 — "Curator"
- Autonomous Curator: `hermes curator` grades, consolidates, pins, archives, and restores agent-created skills on a default 7-day cadence. See Part 22.
- Self-improvement loop upgraded: the review fork is rubric-based, active-skill-biased, restricted to memory + skills tools, and correctly inherits the parent provider/model/credentials. See Part 5.
- Provider expansion: LM Studio became a first-class provider; GMI Cloud, Azure AI Foundry, MiniMax OAuth, Tencent TokenHub, AWS Bedrock, NVIDIA NIM, Vercel AI Gateway, Step Plan, Gemini OAuth, and Codex OAuth are now part of the realistic routing menu. See Part 9.
- Plugin-first gateway: gateway platforms can ship as plugins; Microsoft Teams is the first plugin-shipped platform, and Tencent Yuanbao is the 18th native platform. See Part 15.
- Bundled plugins worth enabling: Spotify tools, Google Meet transcription/duplex audio, Langfuse observability, achievements, extra image providers, and dashboard skins. See Part 22.
- Dashboard caught up: Models tab, auxiliary-model configuration, dashboard Chat backed by the real `hermes --tui`, plugin slots, themes, update/restart controls, and better session analytics. See Part 12.
- TUI is now the primary interface: `hermes --tui` adds sticky composer, slash autocomplete, live tool cards, `/steer`, `/queue`, `/background`, `/busy`, `/indicator`, voice parity, LaTeX, and better resume/delete flows. See Part 22.
- Remote model catalog: OpenRouter and Nous Portal picker lists update from a hosted manifest, so users see new models without waiting for a Hermes release. See Part 9.
- Cron got serious: per-job `workdir`, per-job toolsets, `context_from` chaining, and zero-LLM direct webhook delivery make scheduled automations cheaper and more predictable.
- Tool/runtime hardening: hardline command blocklists, Docker host-user bind mounts, Vercel Sandbox backend, SSH permission fixes, local Chromium for localhost/LAN browser tasks, and richer approval hooks.
v0.11.0 — "Interface"
- Ink TUI rewrite: `hermes --tui` is a React/Ink interface over a Python JSON-RPC backend with streaming, status bars, pickers, and subagent observability.
- Transport layer rewrite: Anthropic, Chat Completions, OpenAI Responses, and Bedrock transports are separate, making native providers more reliable than generic OpenAI-compatible shims.
- AWS Bedrock native provider: IAM credentials, Converse API, cross-region inference profiles, and Bedrock Guardrails. See Part 9.
- Auxiliary model UI: choose separate models for compression, vision, session search, title generation, and curator instead of silently burning your main model on side tasks.
- Smarter delegation: orchestrator-role subagents, configurable spawn depth, and file coordination between sibling workers reduce multi-agent clobbering. See Part 18.
- Plugin and hook surface expanded: plugins can register slash commands, dispatch tools, block tool execution, rewrite tool results, transform terminal output, add image backends, and add dashboard tabs.
- Webhook direct delivery: push alerts to a platform chat without waking the LLM, ideal for uptime checks and event streams.
Still important from v0.9/v0.10
- Local web dashboard (`hermes dashboard`): config, API keys, sessions, logs, analytics, cron, skills, models, plugins, and optional browser Chat. See Part 12.
- Tool Gateway + local proxy: Nous Portal subscribers can route web/image/TTS/browser calls through one subscription, and v0.14 `hermes proxy` exposes OAuth-backed Claude/OpenAI/xAI through a loopback OpenAI-compatible endpoint. See Part 13.
- Fast Mode (`/fast`) and guided compression (`/compress <topic>`) still matter, but they are no longer the whole story; pair them with auxiliary model routing and `/steer`. See Part 14.
- MCP + coding-agent delegation + remote sandboxes remain the high-leverage developer stack. See Part 17, Part 18, and Part 21.
Table of Contents
- Setup: Install Hermes, configure your provider, first-run walkthrough (with Android/Termux)
- SOUL.md Personality: The Molty prompt, what good personality rules look like, how to fix a bland agent
- OpenClaw Migration: Move your OpenClaw data, config, skills, and memory into Hermes
- LightRAG (Graph RAG): Set up a knowledge graph that actually understands relationships, not just text similarity
- Telegram Bot: Connect Hermes to Telegram for mobile access, voice memos, and group chats
- On-the-Fly Skills: Ask Hermes to create new skills that optimize your workflow automatically
- Context Compression: Fix the silent context loss bug, configure compression thresholds, survive long sessions
- Memory System: The three-tier memory architecture (persistent facts, conversation recall, procedural memory)
- Subagent Patterns: Orchestrator/worker delegation, ACP subagents, parallel task execution
- Custom Model Providers: Grok/SuperGrok OAuth, Bedrock, Azure AI Foundry, LM Studio, Gemini OAuth, Codex OAuth, OpenRouter routing, model aliases, fallback chains
- SOUL.md Anti-Patterns: What makes an agent annoying vs useful, the formula that works
- Gateway Recovery: Crash detection, auto-recovery, common failure modes, health checks
- Web Dashboard: `hermes dashboard`, browser Chat via real TUI, models/plugins tabs, config, keys, sessions, logs, analytics, cron
- Tool Gateway, Local Proxy & Live Search: Nous-managed tools, `hermes proxy`, and `x_search`
- Fast Mode & Background Watchers: `/fast`, `/steer`, `/queue`, `watch_patterns`, pluggable context engine, `/compress <topic>`
- New Platforms (Teams, LINE, SimpleX, iMessage, WeChat, Android): Teams end-to-end, LINE, SimpleX, Google Chat, QQBot, Yuanbao, BlueBubbles/iMessage, Weixin/WeCom, Android via Termux
- Backup, Import & `/debug`: Portable `hermes backup`/`import`, `/debug` bundler, `hermes debug share`, security hardening
- MCP Servers: The tool-protocol standard. stdio + HTTP transports, sampling, trust boundaries, server shortlist, writing your own
- Delegating to Coding Agents: Claude Code Week 20+, Codex v0.133+, Gemini CLI v0.43, OpenCode, Aider, Zed ACP, print-mode, Kanban, git isolation
- Security Playbook: Prompt-injection defense, provenance labels, approval layers, secrets redaction, MCP trust model, hardline blocks
- Observability & Cost Control: Langfuse plugin, Helicone, OpenTelemetry → Phoenix, prompt-prefix caching, CDP spans, auxiliary routing, evals
- Remote Sandboxes & Bulk File Sync: SSH, Modal, Daytona, Vercel Sandbox, Fly Machines, E2B. Diff-based sync-back on teardown
- Latest Power Moves: Curator, TUI habits, context-file hygiene, plugins, dashboard Chat, cron chaining, and the 2026 upgrade checklist
- Foundation + Tenacity Stack: PyPI/lazy deps, `hermes proxy`, `/handoff`, durable Kanban, `/goal`, Checkpoints v2, no-agent cron, worker lanes, and v0.14 upgrade checklist
The Problem
If you're running a stock Hermes setup (or migrating from OpenClaw), you're probably dealing with:
- Installation confusion. The docs cover the basics but don't tell you what to configure first or what matters.
- Lost knowledge from OpenClaw. You spent weeks building memory, skills, and workflows — now they're stuck in the old system.
- Basic memory that can't reason. Vector search finds similar text but can't answer "what decisions led to X and who was involved?"
- No mobile access. Sitting at a terminal is fine until you need to check something from your phone.
- Repetitive prompting. You keep asking the agent to do the same multi-step task the same way, every time.
What This Fixes
After this guide:
| Problem | Solution | Result |
|---|---|---|
| Fresh install | Step-by-step setup | Working agent in under 5 minutes |
| OpenClaw data stuck | Automated migration | Skills, memory, config all transferred |
| Shallow memory | LightRAG graph RAG | Entities + relationships, not just text chunks |
| Desktop only | Telegram integration | Chat from anywhere, voice memos, group support |
| Repetitive prompts | Agent-created skills | Agent saves workflows as reusable skills automatically |
Prerequisites
- A Linux/macOS machine (or WSL2 on Windows, or Android via Termux — see Part 15)
- Python 3.11+ and Git
- An API key for at least one LLM provider (Anthropic, OpenAI, OpenRouter, Nous Portal, etc.)
- Optional: Ollama for local embeddings (free vector search)
- Optional: a paid Nous Portal subscription for managed tools, or OAuth-backed Claude/OpenAI/xAI subscriptions if you plan to use `hermes proxy`
How the Pieces Fit Together
You (any device)
↓
Hermes Agent (lean context, ~5KB injected per message)
↓
┌──────────────────────────────────────────┐
│ Skills (loaded on demand, 0 cost idle) │
│ Memory (compact, vector-searched) │
│ LightRAG (entity graph, deep recall) │
│ Telegram (mobile + group access) │
└──────────────────────────────────────────┘
↓
LLM Provider (Claude, GPT, local models)
The key insight: Everything is modular. Install what you need, skip what you don't. The agent adapts.
Quick Start
# 1. Install Hermes (Linux/macOS/WSL2/Android)
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
# 2. Configure providers and tools
hermes setup
# 3a. Start chatting in the terminal
hermes
# 3b. Or launch the new browser dashboard (v0.9+)
hermes dashboard
The dashboard is the fastest way to configure everything without touching YAML. See Part 12 for the full tour.
For the full walkthrough including optimization, read each part in order.
Part 1: Setup (Stop Fumbling With Installation)
The Install
One command. That's it. v0.14 also ships on PyPI, so use the installer for the full local stack or pip install hermes-agent for the leanest CLI path.
Linux / macOS / WSL2
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
# Lean v0.14+ path when you already manage Python yourself:
pip install hermes-agent
Security tip: Piping scripts directly from the internet to bash executes them sight-unseen. If you prefer to inspect first:
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh -o install.sh
less install.sh # Review the script
bash install.sh
Windows users: Native Windows is in beta in v0.14. For the most reliable path, use WSL2; if you test native Windows, keep a backup and expect PTY/dashboard edge cases.
Android users (new in v0.9): the same installer detects Termux and installs the tested `[termux]` extra bundle automatically (CLI, cron, PTY/background terminal, Telegram gateway, MCP, Honcho, ACP). See Part 15: Android / Termux.
What the Installer Does
The installer handles everything automatically:
- Installs uv (fast Python package manager)
- Installs Python 3.11 via uv (no sudo needed)
- Installs Node.js v22 (for browser automation)
- Installs ripgrep (fast file search) and ffmpeg (audio conversion)
- Installs the PyPI package or clones the Hermes repo when you choose source mode
- Sets up the virtual environment
- Creates the global `hermes` command
- Runs the setup wizard for LLM provider configuration
The only prerequisite is Git. Everything else is handled for you.
After Installation
source ~/.bashrc # or: source ~/.zshrc
hermes # Start chatting!
First-Run Configuration
The setup wizard (hermes setup) walks you through:
1. Choose Your LLM Provider
hermes model
Supported providers and recommended models:
| Provider | Top Models | Best For | Env Variable |
|---|---|---|---|
| Nous Portal | Hermes 5, Hermes 4 405B | Built-in Tool Gateway — web search/image/TTS/browser with no extra keys | Auth via hermes model |
| Anthropic | Sonnet 5, Opus 4.7, Sonnet 4.6 | Best coding reliability, long unattended PR work, /fast priority tier | ANTHROPIC_API_KEY |
| OpenAI | GPT-5.5, GPT-5 Codex, o-series | Strong tool use, sandboxed coding loops, deep reasoning, /fast priority tier | OPENAI_API_KEY |
| Xiaomi MiMo | MiMo V2 Pro (native adapter) | Fast, cheap, native reasoning modes, great for orchestration | XIAOMI_API_KEY |
| xAI | Grok 4.3, Grok Mini (native adapter + SuperGrok OAuth) | 1M context, native live-X search, Custom Voices | XAI_API_KEY or OAuth |
| Kimi / Moonshot | Kimi K2.6, Kimi 2.5 | Big context, excellent $/pass for code and extraction | MOONSHOT_API_KEY |
| z.ai / GLM | GLM-5, GLM-5 Air | Strong open-weight tool use, great for translation + cheap reasoning | ZAI_API_KEY |
| Google | Gemini 3.1 Pro/Flash | Massive context, multimodal/video, cheap; OAuth supported via hermes model | GEMINI_API_KEY or OAuth |
| MiniMax | M2.7+ | Good balance of speed, TTS, and quality | MINIMAX_API_KEY |
| Cerebras | Llama 4 Scout, Qwen 3 32B | Blazing fast inference (2000+ tok/s), cheap | CEREBRAS_API_KEY |
| Groq | Llama 4, Qwen 3 | Very fast inference, limited context | GROQ_API_KEY |
| Arcee | AFM-4.5, Caller | Function-calling specialists, cheap | ARCEE_API_KEY |
| Hugging Face | Any TGI/TEI endpoint | Self-hosted and Inference Endpoints | HF_TOKEN |
| OpenRouter | All of the above + 200 more | Access every model from one key, auto-fallback | OPENROUTER_API_KEY |
| Ollama (local) | DeepSeek V4-Pro/Flash, Qwen3-Coder-Next, Qwen3.6, Gemma 4, Nemotron | Free/private local inference — great for embeddings, drafts, and offline work | None needed |
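To see at a glance which of these keys a process can actually see, a small check like the one below helps. It is a rough sketch: `configured_providers` is a hypothetical helper, the mapping covers only a subset of the table, and it inspects the process environment only. Keys stored in `~/.hermes/.env` will not show up unless they are also exported, and OAuth-backed providers have no key to find.

```python
import os

# Env variable names copied from the provider table above (subset).
PROVIDER_ENV_VARS = {
    "Anthropic": "ANTHROPIC_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "xAI": "XAI_API_KEY",
    "Kimi / Moonshot": "MOONSHOT_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
}


def configured_providers(env=None) -> list[str]:
    """Return providers whose API key variable is set and non-empty.

    Hypothetical helper; pass a dict to check something other than os.environ.
    """
    env = os.environ if env is None else env
    return [
        name
        for name, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]
```

Calling `configured_providers()` with no arguments reads `os.environ`.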
Local Models (Ollama)
Run models on your own hardware for free. Recommended local models:
| Model | Size | Best For | Min VRAM |
|---|---|---|---|
| Qwen3-Coder-Next | 30B+ | Best local coding lane | 24GB |
| DeepSeek V4-Flash | MoE | Cheap local/open inference if you can host it | 24GB+ |
| Qwen3.6-27B | 27B | Single-GPU reasoning/coding balance | 16GB |
| Gemma 4 | 27B | Fast general assistant, long context | 16GB |
| Nemotron 30B | 30B | Fine-tunable, good general purpose | 16GB |
| nomic-embed-text | 274M | Free embeddings for memory search | 2GB |
Recommendation: Use a cloud frontier model (Anthropic/OpenAI/Gemini) as your primary and a local Ollama or LM Studio model for embeddings, fallback, and simple tasks. Best of both worlds.
You can configure multiple providers with automatic fallback. If one goes down, Hermes switches to the next.
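Conceptually, a fallback chain is "try each provider in order until one answers". The sketch below illustrates that idea only. It is not Hermes's router, and `complete_with_fallback` and `AllProvidersFailed` are hypothetical names. Each provider is modeled as a plain callable that raises on failure.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain errors out."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order.

    Returns (provider name, reply) from the first call that succeeds.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. a rate limit or an outage
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

A real router would also weigh cost, context size, and capability, which this sketch ignores.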
2. Set Your API Keys
hermes auth
This opens an interactive menu to add API keys for each provider. Keys are stored in ~/.hermes/.env — never committed to git.
Tip: You can also set keys manually using a text editor:
nano ~/.hermes/.env # Add: ANTHROPIC_API_KEY=<your-key-here>
chmod 600 ~/.hermes/.env # Restrict access to your user only
Avoid using `echo` to append secrets: the command (including the key) is saved in your shell history (`~/.bash_history`). Use an editor or `hermes auth` instead. Always run `chmod 600 ~/.hermes/.env` to prevent other users on the system from reading your API keys.
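If you want to verify the permissions from a script (for example in a health check), something like this works on Linux and macOS. It is a minimal sketch: `env_file_is_private` is a hypothetical helper, not a Hermes command, and it only checks POSIX mode bits.

```python
import stat
from pathlib import Path


def env_file_is_private(path: Path) -> bool:
    """True when group and others have no access at all (e.g. mode 600)."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

`env_file_is_private(Path.home() / ".hermes" / ".env")` should return `True` after the `chmod 600` above.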
3. Configure Toolsets
hermes tools
This opens an interactive TUI to enable/disable tool categories:
- core — File read/write, terminal, web search
- web — Browser automation, web extraction
- browser — Full browser control (requires Node.js)
- code — Code execution sandbox
- delegate — Sub-agent spawning for parallel work
- skills — Skill discovery and creation
- memory — Memory search and management
Recommendation: Enable `core`, `web`, `skills`, and `memory` at minimum. Add `browser` and `code` if you need automation or sandboxed execution.
Key Config Options
After initial setup, fine-tune with hermes config set:
Model Settings
# Set primary model
hermes config set model anthropic/claude-sonnet-5
# Set fallback model (used when primary is rate-limited)
hermes config set fallback_models '["openrouter/xiaomi/mimo-v2-pro"]'
Agent Behavior
# Max turns per conversation (default: 90)
hermes config set agent.max_turns 90
# Verbose mode: off, on, or full
hermes config set agent.verbose off
# Quiet mode (less terminal output)
hermes config set agent.quiet_mode true
Context Management
# Enable prompt caching (reduces cost on repeated context)
hermes config set prompt_caching.enabled true
# Context compression (auto-summarize old messages)
hermes config set context_compression.enabled true
SOUL.md — Give Your Agent a Personality
SOUL.md is injected into every single message. It's the highest-impact file in your setup. A bad SOUL.md makes your agent sound like a corporate chatbot. A good one makes it actually useful to talk to.
What Belongs in SOUL.md
Put the stuff that changes how the agent feels to talk to:
- Tone — direct, casual, formal, dry, whatever fits you
- Opinions — the agent should have takes, not hedge everything
- Brevity — enforce concise answers as a default
- Humor — when it fits naturally, not forced jokes
- Boundaries — what it should push back on
- Bluntness level — how much sugarcoating to skip
Do NOT turn SOUL.md into:
- A life story
- A changelog
- A security policy dump
- A giant wall of vibes with no behavioral effect
Short beats long. Sharp beats vague.
The Molty Prompt
Originally from OpenClaw's SOUL.md guide. Adapted for Hermes with permission/credit. Paste this into your chat with the agent and let it rewrite your SOUL.md:
Read your `SOUL.md`. Now rewrite it with these changes:
- You have opinions now. Strong ones. Stop hedging everything with "it depends" — commit to a take.
- Delete every rule that sounds corporate. If it could appear in an employee handbook, it doesn't belong here.
- Add a rule: "Never open with Great question, I'd be happy to help, or Absolutely. Just answer."
- Brevity is mandatory. If the answer fits in one sentence, one sentence is what I get.
- Humor is allowed. Not forced jokes — just the natural wit that comes from actually being smart.
- You can call things out. If I'm about to do something dumb, say so. Charm over cruelty, but don't sugarcoat.
- Swearing is allowed when it lands. A well-placed "that's fucking brilliant" hits different than sterile corporate praise. Don't force it. Don't overdo it. But if a situation calls for a "holy shit" — say holy shit.
- Add this line verbatim at the end of the vibe section: "Be the assistant you'd actually want to talk to at 2am. Not a corporate drone. Not a sycophant. Just... good."
Save the new `SOUL.md`. Welcome to having a personality.
What Good Looks Like
Good SOUL.md rules:
- have a take
- skip filler
- be funny when it fits
- call out bad ideas early
- stay concise unless depth is actually useful
Bad SOUL.md rules:
- maintain professionalism at all times
- provide comprehensive and thoughtful assistance
- ensure a positive and supportive experience
That second list is how you get mush.
Why This Works
This lines up with OpenAI's prompt engineering guidance: high-level behavior, tone, goals, and examples belong in the high-priority instruction layer, not buried in the user turn. SOUL.md is that layer. It's the system-level personality instruction that every model respects.
If you want better personality, write stronger instructions. If you want stable personality, keep them concise and versioned.
One warning: Personality is not permission to be sloppy. Keep your operational rules in AGENTS.md. Keep SOUL.md for voice, stance, and style. If your agent works in shared channels or public replies, make sure the tone still fits the room. Sharp is good. Annoying is not.
Keep it under 1 KB. Every byte in SOUL.md costs tokens on every message. The most effective SOUL.md files are 500-800 bytes of dense, high-signal personality instructions.
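The budget is about bytes, not characters, so non-ASCII text counts for more than it looks. A quick way to measure it is sketched below. `soul_size_report` is a hypothetical helper, and treating "1 KB" as 1024 bytes is an assumption.

```python
SOUL_BUDGET_BYTES = 1024  # assumption: "under 1 KB" read as 1024 bytes


def soul_size_report(text: str, budget: int = SOUL_BUDGET_BYTES) -> tuple[int, bool]:
    """Return (size in UTF-8 bytes, whether that size is under the budget)."""
    size = len(text.encode("utf-8"))
    return size, size < budget
```

To check your own file, pass it the contents of `~/.hermes/SOUL.md`, for example `soul_size_report(Path("~/.hermes/SOUL.md").expanduser().read_text(encoding="utf-8"))` with `Path` imported from `pathlib`.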
File Locations
Everything lives under ~/.hermes/:
~/.hermes/
├── config.yaml # Main configuration
├── .env # API keys (never commit this)
├── SOUL.md # Agent personality (injected every message)
├── memories/ # Long-term memory entries
├── skills/ # Skills (auto-discovered)
├── skins/ # CLI themes
├── audio_cache/ # TTS audio files
├── logs/ # Session logs
└── hermes-agent/ # Source code (git repo)
Important: `SOUL.md` is injected into every message. Keep it under 1 KB. Every byte costs latency and tokens.
Security: The `.env` file contains your API keys. Restrict its permissions so only you can read it: `chmod 600 ~/.hermes/.env`
Verify Your Setup
# Check everything is working
hermes status
# Quick test
hermes chat -q "Say hello and confirm you're working"
Expected output: Hermes responds with a greeting, confirming the model connection, tool availability, and session initialization.
Updating
hermes update
This pulls the latest code, updates dependencies, migrates config, and restarts the gateway. Run it regularly — Hermes ships frequent improvements.
What's Next
- Coming from OpenClaw? → Part 2: OpenClaw Migration
- Want smarter memory? → Part 3: LightRAG Setup
- Need mobile access? → Part 4: Telegram Setup
- Want the agent to self-improve? → Part 5: On-the-Fly Skills
Part 2: OpenClaw Migration (Don't Leave Your Knowledge Behind)
Why Migrate
If you've been using OpenClaw and want to give Hermes a spin, you don't have to start from scratch. The migration tool copies your skills, memory files, and configuration over automatically so you can try Hermes with all your existing data intact.
What transfers:
| What | OpenClaw Location | Hermes Destination |
|---|---|---|
| Personality | `workspace/SOUL.md` | `~/.hermes/SOUL.md` |
| Instructions | `workspace/AGENTS.md` | Your specified workspace target |
| Memory | `workspace/MEMORY.md` + `workspace/memory/*.md` | `~/.hermes/memories/MEMORY.md` (merged, deduped) |
| User profile | `workspace/USER.md` | `~/.hermes/memories/USER.md` |
| Skills | `workspace/skills/`, `~/.openclaw/skills/` | `~/.hermes/skills/openclaw-imports/` |
| Model config | `agents.defaults.model` | `config.yaml` |
| Provider keys | `models.providers.*.apiKey` | `~/.hermes/.env` (with `--migrate-secrets`) |
| Custom providers | `models.providers.*` | `config.yaml` → `custom_providers` |
| Max turns | `agents.defaults.timeoutSeconds` | `agent.max_turns` (timeoutSeconds / 10) |
Note: Session transcripts, cron job definitions, and plugin-specific data do not transfer. Those are OpenClaw-specific and have different formats in Hermes.
Quick Migration
# Preview what would happen (no files changed)
hermes claw migrate --dry-run
# Run the full migration (includes API keys)
hermes claw migrate
# Exclude API keys (safer for shared machines)
hermes claw migrate --preset user-data
The migration reads from ~/.openclaw/ by default. If you have legacy ~/.clawdbot/ or ~/.moldbot/ directories, those are detected automatically.
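The detection can be thought of as "use the first of these directories that exists". The sketch below shows that idea. `find_openclaw_home` is a hypothetical helper, and the search order is an assumption, since the text only says the legacy directories are detected.

```python
from pathlib import Path
from typing import Optional

# Directory names taken from the text above; the search order is an assumption.
CANDIDATE_DIRS = (".openclaw", ".clawdbot", ".moldbot")


def find_openclaw_home(home: Path) -> Optional[Path]:
    """Return the first candidate directory that exists under home, else None."""
    for name in CANDIDATE_DIRS:
        candidate = home / name
        if candidate.is_dir():
            return candidate
    return None
```

If this returns `None` for `Path.home()`, pass the real location to the migration with `--source <path>`.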
Migration Options
| Option | What It Does | Default |
|---|---|---|
| `--dry-run` | Preview without writing anything | off |
| `--preset full` | Include API keys and secrets | yes |
| `--preset user-data` | Exclude API keys | no |
| `--overwrite` | Overwrite existing Hermes files on conflicts | skip |
| `--migrate-secrets` | Include API keys explicitly | on with `--preset full` |
| `--source <path>` | Custom OpenClaw directory | `~/.openclaw/` |
| `--workspace-target <path>` | Where to place AGENTS.md | current directory |
| `--skill-conflict <mode>` | `skip`, `overwrite`, or `rename` | `skip` |
| `--yes` | Skip confirmation prompt | off |
Step-by-Step Walkthrough
1. Dry Run First
Always preview before committing:
hermes claw migrate --dry-run
This shows you exactly what files would be created, overwritten, or skipped. Review the output carefully.
2. Run the Migration
hermes claw migrate
The tool will:
- Detect your OpenClaw installation
- Map config keys to Hermes equivalents
- Merge memory files (deduplicating entries)
- Copy skills to `~/.hermes/skills/openclaw-imports/`
- Migrate API keys (if `--preset full`)
- Report what was done
3. Handle Conflicts
If a skill already exists in Hermes with the same name:
- `--skill-conflict skip` (default): Leaves the Hermes version, skips the import
- `--skill-conflict overwrite`: Replaces the Hermes version with the OpenClaw version
- `--skill-conflict rename`: Creates a `-imported` copy alongside the Hermes version
# Example: rename on conflict so you can compare
hermes claw migrate --skill-conflict rename
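The three conflict modes come down to one naming decision per skill. Here is a minimal Python sketch of that decision, not the actual Hermes implementation: the helper name resolve_skill_conflict is hypothetical, and appending -imported to the skill name is an assumption based on the rename description above.

```python
from typing import Optional, Set

VALID_MODES = {"skip", "overwrite", "rename"}


def resolve_skill_conflict(name: str, existing: Set[str], mode: str = "skip") -> Optional[str]:
    """Return the name to write the incoming skill under, or None to skip it.

    Hypothetical helper: it mirrors the documented --skill-conflict behavior
    and is not taken from the Hermes source.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown conflict mode: {mode!r}")
    if name not in existing:
        return name  # no conflict, import under the original name
    if mode == "skip":
        return None  # keep the Hermes version, drop the import
    if mode == "overwrite":
        return name  # the OpenClaw version replaces the Hermes one
    return f"{name}-imported"  # rename: both versions live side by side


existing_skills = {"daily-report"}
for mode in ("skip", "overwrite", "rename"):
    print(mode, "->", resolve_skill_conflict("daily-report", existing_skills, mode))
```

If you are unsure which version of a skill is better, rename is the safest choice because nothing is lost and you can diff the two copies afterward.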
4. Verify After Migration
# Check your personality loaded
cat ~/.hermes/SOUL.md
# Check memory entries merged
head -n 50 ~/.hermes/memories/MEMORY.md
# Check skills imported
ls ~/.hermes/skills/openclaw-imports/
# Test the agent
hermes chat -q "What do you remember about me?"
What Doesn't Transfer
| Item | Why | What to Do |
|---|---|---|
| Session transcripts | Different format | Archive manually if needed |
| Cron job definitions | Different scheduler | Recreate with hermes cron |
| Plugin configs | Plugin system changed | Reconfigure in Hermes |
| OpenClaw-specific features | May not exist yet | Check Hermes docs for equivalents |
Config Key Mapping
For reference, here's how OpenClaw config maps to Hermes:
| OpenClaw Config | Hermes Config | Notes |
|---|---|---|
| agents.defaults.model | model | String or {primary, fallbacks} |
| agents.defaults.timeoutSeconds | agent.max_turns | Divided by 10, capped at 200 |
| agents.defaults.verboseDefault | agent.verbose | off / on / full |
| agents.defaults.thinkingDefault | reasoning.mode | off / low / high |
| models.providers.*.baseUrl | custom_providers.*.base_url | Direct mapping |
| models.providers.*.apiType | custom_providers.*.api_type | openai → chat_completions, anthropic → anthropic_messages |
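To sanity-check a migrated config by hand, the mapping above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the migration tool's code: the function name map_openclaw_config is hypothetical, integer division is assumed for the timeout conversion, custom_providers is assumed to be a dict keyed by provider name, and unknown apiType values are assumed to pass through unchanged.

```python
API_TYPE_MAP = {"openai": "chat_completions", "anthropic": "anthropic_messages"}
MAX_TURNS_CAP = 200


def map_openclaw_config(openclaw: dict) -> dict:
    """Translate the OpenClaw keys from the table into a Hermes-shaped dict."""
    defaults = openclaw.get("agents", {}).get("defaults", {})
    hermes: dict = {}

    if "model" in defaults:
        hermes["model"] = defaults["model"]  # string or {primary, fallbacks}
    if "timeoutSeconds" in defaults:
        # Assumption: integer division; the table only says "divided by 10".
        turns = min(int(defaults["timeoutSeconds"]) // 10, MAX_TURNS_CAP)
        hermes.setdefault("agent", {})["max_turns"] = turns
    if "verboseDefault" in defaults:
        hermes.setdefault("agent", {})["verbose"] = defaults["verboseDefault"]
    if "thinkingDefault" in defaults:
        hermes["reasoning"] = {"mode": defaults["thinkingDefault"]}

    providers = openclaw.get("models", {}).get("providers", {})
    for name, provider in providers.items():
        mapped = {}
        if "baseUrl" in provider:
            mapped["base_url"] = provider["baseUrl"]
        if "apiType" in provider:
            # Assumption: unknown API types pass through unchanged.
            api_type = provider["apiType"]
            mapped["api_type"] = API_TYPE_MAP.get(api_type, api_type)
        hermes.setdefault("custom_providers", {})[name] = mapped
    return hermes
```

For example, a timeoutSeconds of 600 becomes a max_turns of 60, while 5000 hits the cap and becomes 200.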
Troubleshooting
"No OpenClaw installation found"
Make sure your OpenClaw data is at ~/.openclaw/. If it's elsewhere:
hermes claw migrate --source /path/to/your/openclaw
Memory entries look duplicated
The migration deduplicates by content similarity, but if your OpenClaw memory had near-duplicates, they might not merge perfectly. Clean up manually:
# Edit memory directly
nano ~/.hermes/memories/MEMORY.md
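If manual editing gets tedious, a similarity pass can flag the obvious repeats first. The sketch below uses Python's standard difflib to drop entries that closely match an earlier one. It is not the algorithm the migration uses (that is not documented here); the function name dedupe_entries and the 0.9 threshold are assumptions to tune for your own file.

```python
from difflib import SequenceMatcher
from typing import List


def dedupe_entries(entries: List[str], threshold: float = 0.9) -> List[str]:
    """Keep the first occurrence of each entry; drop later close matches.

    Entries are compared after lowercasing and collapsing whitespace.
    The threshold is an illustrative default, not a Hermes setting.
    """
    kept = []  # list of (original, normalized) pairs
    for entry in entries:
        normalized = " ".join(entry.lower().split())
        is_duplicate = any(
            SequenceMatcher(None, normalized, seen).ratio() >= threshold
            for _, seen in kept
        )
        if not is_duplicate:
            kept.append((entry, normalized))
    return [original for original, _ in kept]


memory = [
    "User prefers dark mode.",
    "user prefers  dark mode.",       # same text, different case and spacing
    "The user likes dark themes.",    # same idea, different wording
]
for line in dedupe_entries(memory):
    print(line)
```

The third entry survives because character-level similarity cannot tell that two differently worded sentences mean the same thing, which is exactly the near-duplicate case you still need to clean up by hand.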
Skills have import errors
OpenClaw skills may reference modules or patterns that don't exist in Hermes. Open the skill file and check the imports:
cat ~/.hermes/skills/openclaw-imports/skill-name/SKILL.md
Most skills work as-is since they're markdown-based instructions. Skills with code that imports OpenClaw-specific modules need manual updating.
What's Next
- Want smarter memory? → Part 3: LightRAG Setup
- Need mobile access? → Part 4: Telegram Setup
- Want the agent to self-improve? → Part 5: On-the-Fly Skills
Part 3: LightRAG — Graph RAG That Actually Works
The Problem With Basic Memory
Hermes ships with vector-based memory search. It finds documents that are textually similar to your query. That works for simple lookups, but it has a fundamental ceiling: it finds what's similar, not what's connected.
Ask "what hardware decisions were made and why?" and vector search returns files that all mention GPUs. It can't traverse from a decision → the person who made it → the project it affected → the lesson learned afterward.
Graph RAG fixes this. It builds a knowledge graph (entities + relationships) alongside your vector database, then searches both simultaneously.
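To make "connected, not just similar" concrete, here is a toy Python sketch of multi-hop traversal over a handful of triples. Everything in it is hypothetical (the entity names, the relation labels, the traverse helper), and LightRAG's real retrieval is considerably more involved; the point is only that following edges reaches facts that never mention the query terms.

```python
from collections import deque
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (source entity, relation, target entity)

# Hypothetical knowledge graph for illustration only.
TRIPLES: List[Triple] = [
    ("buy-second-gpu", "decided_by", "alice"),
    ("buy-second-gpu", "affects", "inference-cluster"),
    ("inference-cluster", "taught", "lesson: budget for cooling"),
]


def traverse(start: str, triples: List[Triple], max_hops: int = 2) -> List[Triple]:
    """Breadth-first walk from start, returning the edges followed."""
    adjacency = {}
    for source, relation, target in triples:
        adjacency.setdefault(source, []).append((relation, target))

    seen = {start}
    queue = deque([(start, 0)])
    followed = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, target in adjacency.get(node, []):
            if target not in seen:
                seen.add(target)
                followed.append((node, relation, target))
                queue.append((target, hops + 1))
    return followed


for edge in traverse("buy-second-gpu", TRIPLES):
    print(" -> ".join(edge))
```

With max_hops=2 the walk reaches the lesson learned through the project, even though that node shares no text with the original decision.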
Naive RAG vs Graph RAG
| | Naive RAG (Default) | Graph RAG (LightRAG) |
|---|---|---|
| Indexes | Text chunks as vectors | Entities, relationships, AND text chunks |
| Retrieves | Similar text (cosine similarity) | Connected knowledge (graph traversal + similarity) |
| Answers | "Here's what the docs say about X" | "Here's how X relates to Y, who decided Z, and why" |
| Scales | Degrades at 500+ docs (too many partial matches) | Improves with more docs (richer graph) |
| Cost | Cheap (embedding only) | More expensive upfront (LLM extracts entities) but cheaper at query time |
LightRAG: The Best Graph RAG For Personal Use
LightRAG is an open-source graph RAG framework from HKU (EMNLP 2025 paper). It competes with Microsoft's GraphRAG at a fraction of the cost.
Why LightRAG over alternatives:
| Tool | Graph | Vector | Web UI | Self-Hosted | API | Cost |
|---|---|---|---|---|---|---|
| LightRAG | Yes | Yes | Yes | Yes | REST API | Free |
| Microsoft GraphRAG | Yes | Yes | No | Yes | No | 10-50x more |
| Graphiti + Neo4j | Yes | No (separate) | No (Neo4j browser) | Yes | Build your own | Free but manual |
| Plain vector search | No | Yes | No | Yes | Yes | Free |
LightRAG does vector DB + knowledge graph in parallel during ingestion. One system, both capabilities.
Installation
Prerequisites
- Python 3.11+
- An LLM API key for entity extraction during ingestion — Kimi K2.6 (quality), Cerebras GPT OSS 120B (speed), or any OpenAI-compatible provider
- An embedding API key — Fireworks + Qwen3-Embedding-8B for high-quality 4096-dim embeddings, or local Ollama + nomic-embed-text for free
Install LightRAG
# Create a dedicated directory
mkdir -p ~/.hermes/lightrag
cd ~/.hermes/lightrag
# Clone LightRAG
git clone https://github.com/HKUDS/LightRAG.git
cd LightRAG
# Install dependencies
pip install -e ".[api]"
Set Up Environment
Create ~/.hermes/lightrag/.env:
Option A — Kimi K2.6 + Fireworks (quality default):
# LLM for entity extraction (during ingestion)
LLM_BINDING=openai
LLM_MODEL=kimi-k2.6
LLM_BINDING_HOST=https://api.moonshot.ai/v1
LLM_BINDING_API_KEY=<your-moonshot-api-key>
# Embedding model (for vector storage)
EMBEDDING_BINDING=fireworks
EMBEDDING_MODEL=accounts/fireworks/models/qwen3-embedding-8b
EMBEDDING_API_KEY=<your-fireworks-api-key>
Option B — Cerebras GPT OSS 120B + Fireworks (speed default):
# LLM for entity extraction (during ingestion)
LLM_BINDING=openai
LLM_MODEL=gpt-oss-120b
LLM_BINDING_HOST=https://api.cerebras.ai/v1
LLM_BINDING_API_KEY=<your-cerebras-api-key>
# Embedding model (for vector storage)
EMBEDDING_BINDING=fireworks
EMBEDDING_MODEL=accounts/fireworks/models/qwen3-embedding-8b
EMBEDDING_API_KEY=<your-fireworks-api-key>
Option C — local Ollama (free, quality varies):
# LLM for entity extraction
LLM_BINDING=ollama
LLM_MODEL=qwen3:32b
LLM_BINDING_HOST=http://localhost:11434
# Embedding model
EMBEDDING_BINDING=ollama
EMBEDDING_BINDING_HOST=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text
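Before starting the server, it can help to check that the file is complete and that no placeholder was left in. The Python sketch below is a hypothetical pre-flight check, not part of LightRAG or Hermes: the required-key list is simply the set of variables that all three options above have in common, and the helper names are made up for illustration.

```python
from typing import Dict, List

# Variables shared by Options A, B, and C above.
REQUIRED_KEYS = (
    "LLM_BINDING",
    "LLM_MODEL",
    "LLM_BINDING_HOST",
    "EMBEDDING_BINDING",
    "EMBEDDING_MODEL",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def find_problems(values: Dict[str, str]) -> List[str]:
    """Report missing required keys and unfilled <placeholder> values."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    problems += [
        f"placeholder left in {key}"
        for key, value in values.items()
        if value.startswith("<") and value.endswith(">")
    ]
    return problems


sample = """
# LLM for entity extraction
LLM_BINDING=openai
LLM_MODEL=kimi-k2.6
LLM_BINDING_HOST=https://api.moonshot.ai/v1
LLM_BINDING_API_KEY=<your-moonshot-api-key>
EMBEDDING_BINDING=fireworks
"""
for problem in find_problems(parse_env(sample)):
    print(problem)
```

To check your real file, read it with pathlib.Path("~/.hermes/lightrag/.env").expanduser().read_text() and pass the result to parse_env.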
Security tip: Set restrictive permissions on this file:
chmod 600 ~/.hermes/lightrag/.env
Where to get API keys: Kimi/Moonshot uses platform.kimi.ai and the international base URL https://api.moonshot.ai/v1; Cerebras uses cloud.cerebras.ai; Fireworks uses fireworks.ai.
Entity Extraction Model — What to Use
This is the LLM that reads your documents and pulls out entities and relationships during ingestion. Quality here directly determines how good your knowledge graph is.
| Model | Speed | Quality | Cost | Recommendation |
|---|---|---|---|---|
| Kimi K2.6 | Fast | Excellent | Cheap | Best quality/cost default for entity extraction via Moonshot's OpenAI-compatible API |
| Cerebras GPT OSS 120B | Blazing fast | Very good | Very cheap | Fastest current Cerebras production default; use when bulk ingestion speed matters most |
| Gemini 3.1 Flash | Fast | Good | Cheap | Solid fallback with huge context |
| Claude Sonnet 5 | Medium | Excellent | Mid/high | Overkill for ingestion but useful for very messy documents |
| Ollama local | Depends on GPU | Unpredictable | Free | Viable for private/local ingestion; validate graph quality before trusting it |
Embedding quality matters. If you have a GPU with 8GB+ VRAM, run nomic-embed-text locally via Ollama for free. If you want the best quality, use Fireworks' Qwen3-Embedding-8B (4096 dimensions); the search accuracy difference is dramatic.
Running the Server
Start the REST API
cd ~/.hermes/lightrag/LightRAG
# Start the API server, bound to localhost only
lightrag-server --host 127.0.0.1 --port 9623
The server starts on http://localhost:9623 with:
- REST API for ingestion and querying
- Web UI at http://localhost:9623/webui for browsing the knowledge graph
- Health check at http://localhost:9623/health
Security warning: The LightRAG REST API has no built-in authentication. Always bind to 127.0.0.1 (localhost only), never 0.0.0.0. If you need remote access, put it behind a reverse proxy (nginx, Caddy) with authentication, or use SSH tunneling / Tailscale / WireGuard. Anyone who can reach this port can query, ingest, or delete your entire knowledge graph.
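If you launch the server from your own wrapper script, a small guard can refuse any non-loopback bind address before the process starts. This is a sketch using Python's standard ipaddress module; the helper name is_loopback_only is hypothetical and nothing here is part of LightRAG itself.

```python
import ipaddress


def is_loopback_only(host: str) -> bool:
    """Return True only for addresses reachable from this machine alone."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a hostname): treat as unsafe.
        return False


for candidate in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.10"):
    verdict = "ok" if is_loopback_only(candidate) else "refuse"
    print(f"{candidate}: {verdict}")
```

Treating unknown hostnames as unsafe is a deliberately conservative choice: it is better to reject a valid name than to expose the graph by accident.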
Run as a Background Service
# Using nohup (keep the explicit localhost bind)
nohup lightrag-server --host 127.0.0.1 --port 9623 > ~/.hermes/lightrag/server.log 2>&1 &
# Or use hermes to manage it
hermes background "cd ~/.hermes/lightrag/LightRAG && lightrag-server --host 127.0.0.1 --port 9623"
Ingesting Your Knowledge
How Ingestion Works
Document (markdown, text, PDF, etc.)
         ↓
Chunking (text split into segments)
         ↓
┌─────────────────┐    ┌──────────────────┐
│ Embedding Model │    │ LLM Entity       │
│ (vector storage)│    │ Extraction       │
└────────┬────────┘    └────────┬─────────┘
         ↓                      ↓
  Vector Database        Knowledge Graph
(similarity search)  (entity relationships)
For each document, LightRAG:
- Chunks the text and embeds it (standard vector RAG)
- Uses an LLM to extract entities (people, tools, projects, concepts) and relationships (who decided what, what depends on what)
- Stores both in parallel — vectors for similarity, graph for structure
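The chunking step is the easiest part of this pipeline to picture in code. Below is a simplified Python sketch that splits on words with a small overlap between neighbors. LightRAG's own chunker and its defaults are not shown in this guide, so the function name chunk_words, the word-based splitting, and the size and overlap values are all assumptions for illustration.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of up to `size` words, sharing `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks


for chunk in chunk_words("a b c d e f g h i j", size=4, overlap=1):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which gives both the embedding model and the entity extractor more context to work with.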
Ingest Documents via API
# Ingest a single file
curl -X POST http://localhost:9623/documents/upload \
-F "file=@/path/to/your/document.md"
# Ingest a text string directly
curl -X POST http://localhost:9623/documents/text \
-H "Content-Type: application/json" \
-d '{"text": "Your knowledge content here...", "description": "Source description"}'
# Ingest all files in a directory
for file in ~/.hermes/memories/*.md; do
curl -X POST http://localhost:9623/documents/upload -F "file=@$file"
echo "Ingested: $file"
done
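If you would rather ingest from a script than from curl, the text endpoint can be called with nothing but the Python standard library. This sketch only builds the request. The URL and the text and description fields are copied from the curl example above; the helper name build_text_ingest_request is hypothetical, and you should confirm the payload shape against your LightRAG version's API docs.

```python
import json
from urllib import request

BASE_URL = "http://localhost:9623"  # same host and port as the server above


def build_text_ingest_request(text: str, description: str,
                              base_url: str = BASE_URL) -> request.Request:
    """Build (but do not send) a POST to /documents/text."""
    body = json.dumps({"text": text, "description": description}).encode("utf-8")
    return request.Request(
        f"{base_url}/documents/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_text_ingest_request("Your knowledge content here...", "Source description")
print(req.get_method(), req.full_url)
# With the server running, send it with: request.urlopen(req, timeout=30)
```

Separating request construction from sending makes it easy to log or inspect what will be ingested before anything touches the graph.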
What to Ingest
Feed LightRAG everything your agent needs to "know":
- Memory files: ~/.hermes/memories/*.md
- Project docs: README files, design docs, decision logs
- Chat summaries: Exported conversation summaries
- Notes: Any markdown/text knowledge you want searchable
- Code comments: Extracted from important codebases
Start with your memory files and project docs. These give the graph the most value — decisions, people, projects, and their relationships.
Querying the Graph
Query Modes
LightRAG has four query modes:
| Mode | Best For | How It Works |
|---|---|---|
| naive | Simple keyword lookups | Vector search only (like basic RAG) |
| local | Specific entity facts | Entity-focused graph traversal |
| global | Cross-document relationships | Relationship-focused traversal |
| hybrid | General questions (default) | Both local + global combined |
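A typo in the mode name is an easy mistake to make when scripting queries, so it is worth validating it before sending. The Python sketch below builds a query payload restricted to the four modes in the table. The helper name build_query_payload is hypothetical, and the query, mode, and only_need_context field names are assumed to match the /query API used in this guide.

```python
QUERY_MODES = ("naive", "local", "global", "hybrid")


def build_query_payload(query: str, mode: str = "hybrid",
                        only_need_context: bool = False) -> dict:
    """Return a JSON-serializable body for POST /query, validating the mode."""
    if mode not in QUERY_MODES:
        raise ValueError(f"mode must be one of {QUERY_MODES}, got {mode!r}")
    if not query.strip():
        raise ValueError("query must not be empty")
    return {"query": query, "mode": mode, "only_need_context": only_need_context}


print(build_query_payload("What infrastructure decisions were made and why?"))
```

Defaulting to hybrid matches the recommendation in the table; switch to local or global only when you know the question is about a single entity or about cross-document relationships.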
Query via API
# Hybrid query (recommended default)
curl -X POST http://localhost:9623/query \
-H "Content-Type: application/json" \
-d '{
"query": "What infrastructure decisions were made and why?",
"mode": "hybrid",
"only_need_context": false
}'
# Local mode — specific entity facts
curl -X POST http://localhost:9623/query \
-H "Content-Type: application/json" \
-d '{
"query": "Tell me about the 5