# Self-Hosted Honcho for Hermes Agent
Self-host Honcho (Plastic Labs' memory layer) on your own server instead of using their cloud. Works with Hermes Agent out of the box.
No fork required — just 3 config files on top of upstream Honcho.
## Background: Hermes L4 Memory
Hermes Agent has a 4-layer memory system. The cross-session memory layer is powered by Honcho, which builds a deepening model of the user across conversations — extracting observations, recalling context, and consolidating memories over time.
By default, Hermes uses Plastic Labs' managed cloud (honcho.dev) + their Neuromancer models. This works out of the box but means your conversation data and user profile live on their servers.
### What are Neuromancer models?
Neuromancer XR is a specialized 8B model fine-tuned from Qwen3-8B specifically for extracting logical conclusions from conversations. Unlike general-purpose LLMs which are optimized for plausible text generation, Neuromancer is trained on ~10,000 curated social reasoning traces to follow formal logic — extracting both explicit facts ("user said they like Python") and deductive conclusions ("user is likely a developer").
It scores 86.9% on the LoCoMo memory benchmark vs. 69.6% for base Qwen3-8B and 80.0% for Claude 4 Sonnet.
Tradeoff of not using it: General-purpose models work well for observation extraction and memory recall — Honcho's prompts and tool-calling pipeline compensate for much of the gap. You may get slightly less precise deductive reasoning, but capable models (GLM-5, Grok 4.1) with strong function calling largely close the difference. The main advantage of self-hosting is data sovereignty, not matching Neuromancer's exact reasoning quality.
## Deployment Options
| Option | Privacy | Data location | LLM for memory | Setup | Cost |
|---|---|---|---|---|---|
| Managed cloud (default) | Low — data + inference on 3rd party | Plastic Labs servers | Neuromancer (Plastic Labs) | None — built into Hermes | Free tier / paid |
| Self-hosted + API (this repo) | Medium — data on your machine, inference via API | Your machine | Any OpenAI-compatible API | ~3 minutes | API usage only |
| Self-hosted + local model | High — nothing leaves your network | Your machine | Local LLM (Ollama, vLLM) | More setup | Hardware only |
**Managed cloud** — Zero setup. Best for getting started. Your data is on Plastic Labs' infrastructure.

**Self-hosted + API** — This repo. Your data stays on your machine. LLM calls go to a cloud API for inference only — the provider sees request content but doesn't store your memory data. Best balance of privacy and capability.

**Self-hosted + local model** — Maximum privacy. No data leaves your network. Requires a GPU or capable CPU on your LAN running an inference server (Ollama, vLLM, llama.cpp). Set `LLM_VLLM_BASE_URL` to your local server. Trade-off: smaller models may produce lower-quality observations and reasoning than cloud APIs.
## What this does
- Runs Honcho's full memory stack (API, Deriver, PostgreSQL, Redis) on your machine
- Routes LLM calls through any OpenAI-compatible provider (primary + backup)
- All your data stays on your machine — no third-party cloud storage
- Works with OpenRouter, Venice, Routstr, Together, Ollama, or any other provider
## Architecture

```text
Hermes Agent ──► localhost:8000 (self-hosted Honcho API)
                        │
                        ├── PostgreSQL + pgvector (your machine)
                        ├── Redis cache (your machine)
                        │
                        └── Deriver/Dialectic/Dream workers
                                │
                                ├── Primary LLM provider (any OpenAI-compatible API)
                                └── Backup LLM provider (optional)
```
## Prerequisites
- Ubuntu 22.04+ (VM, VPS, bare metal, or any Linux server — tested on 22.04, 6GB RAM, 80GB disk)
- Docker Engine + Compose plugin
- API key from any OpenAI-compatible provider (openrouter.ai, venice.ai, together.ai, etc.)
- Second API key for backup (optional)
## Quick Start

```bash
curl -sL https://raw.githubusercontent.com/elkimek/honcho-self-hosted/main/setup.sh -o /tmp/setup.sh
bash /tmp/setup.sh
```
This installs Docker (if needed), clones Honcho, copies configs, prompts for API keys, starts everything, configures Hermes, and optionally sets up the MCP server. ~3 minutes.
## Manual Setup

### 1. Install Docker

```bash
sudo apt-get update && sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
```
Log out and back in for the group change to take effect.
### 2. Clone repos + copy configs

```bash
# Clone this config repo
git clone https://github.com/elkimek/honcho-self-hosted.git ~/honcho-self-hosted

# Clone upstream Honcho
git clone --depth 1 https://github.com/plastic-labs/honcho.git ~/honcho

# Copy config files into the Honcho clone
cp ~/honcho-self-hosted/docker-compose.yml ~/honcho/
cp ~/honcho-self-hosted/config.toml ~/honcho/
cp ~/honcho-self-hosted/env.example ~/honcho/.env
```
### 3. Set your API keys

Edit `~/honcho/.env`:

```bash
nano ~/honcho/.env
```

Replace the placeholder values with your actual API keys:

- `LLM_VLLM_API_KEY` — primary LLM provider
- `LLM_VLLM_BASE_URL` — primary provider's API URL
- `LLM_EMBEDDING_API_KEY` — embedding provider (can be the same as the primary)
- `LLM_EMBEDDING_BASE_URL` — embedding provider's API URL
- `LLM_EMBEDDING_MODEL` — embedding model name (default: `openai/text-embedding-3-small`)
- `LLM_OPENAI_COMPATIBLE_API_KEY` — backup LLM provider (optional)
- `LLM_OPENAI_API_KEY` — same as your primary key (needed for client init)

Any OpenAI-compatible provider works (OpenRouter, Venice, Routstr, Together, etc.) — just set the key and URL. See [Using different providers](#using-different-providers) for details.

**Embedding fallback:** if `LLM_EMBEDDING_API_KEY` or `LLM_EMBEDDING_BASE_URL` is left empty, Honcho falls back to the backup provider credentials (`LLM_OPENAI_COMPATIBLE_*`). This is useful if your backup provider (e.g. Venice) supports embeddings at negligible cost.

If you don't want a backup provider: remove all `BACKUP_PROVIDER` and `BACKUP_MODEL` lines from `config.toml`, and set `LLM_OPENAI_COMPATIBLE_API_KEY` + `LLM_OPENAI_COMPATIBLE_BASE_URL` to the same values as your primary. The setup script handles this automatically.
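Putting it together, a filled-in `.env` might look like the sketch below. The key names come from `env.example`; the key values are placeholders, and the OpenRouter base URL and embedding model are assumptions for an OpenRouter-primary setup — substitute your own provider's URL and models:

```env
# Primary LLM provider (OpenRouter in this example)
LLM_VLLM_API_KEY=sk-or-v1-xxxxxxxx
LLM_VLLM_BASE_URL=https://openrouter.ai/api/v1

# Embeddings (can reuse the primary provider)
LLM_EMBEDDING_API_KEY=sk-or-v1-xxxxxxxx
LLM_EMBEDDING_BASE_URL=https://openrouter.ai/api/v1
LLM_EMBEDDING_MODEL=openai/text-embedding-3-small

# Needed for client init — same as your primary key
LLM_OPENAI_API_KEY=sk-or-v1-xxxxxxxx
```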
### 4. Start Honcho

```bash
cd ~/honcho
docker compose up -d
```

First run builds images and runs DB migrations (~2 minutes). Check status:

```bash
docker compose ps
docker compose logs -f api deriver
```

Wait ~10 seconds for the API to start, then verify:

```bash
curl -s http://localhost:8000/openapi.json | head -1
```
### 5. Configure Hermes

```bash
mkdir -p ~/.honcho
cp ~/honcho-self-hosted/honcho-config.json ~/.honcho/config.json
hermes gateway restart
```
Hermes will now use your local Honcho instead of `api.honcho.dev`.
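Conceptually, the Hermes-side config is just a pointer at your instance. The sketch below is a hypothetical illustration of that shape — the actual key names may differ, so use the shipped `honcho-config.json` as the source of truth:

```json
{
  "baseURL": "http://localhost:8000",
  "workspace": "hermes"
}
```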
## Model Configuration
Honcho has 4 background components that use LLM calls:
- **Deriver** — Reads every message and extracts observations about the user ("prefers Python", "privacy-focused"). Memory formation.
- **Dialectic** — Answers questions about the user on demand, with 5 reasoning levels (minimal → max). Memory recall.
- **Summary** — Compresses long sessions into short/long summaries to keep context manageable.
- **Dream** — Runs every ~8 hours. Merges redundant observations, deletes outdated ones, infers higher-level patterns. Memory consolidation.
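To make the Deriver's role concrete, here is a hypothetical example of an extracted observation — the shape is illustrative only, not Honcho's actual internal schema:

```json
{
  "observation": "User is likely a developer",
  "kind": "deductive",
  "source_message": "I usually just write a quick Python script for that"
}
```

Explicit observations record what the user said directly; deductive ones, like the example above, are conclusions inferred from it.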
LLM calls are tiered by task complexity. Defaults are chosen for function-calling reliability (the primary requirement for Honcho's tool-using agents):
| Component | Default model | Tier | When it runs |
|---|---|---|---|
| Deriver | `z-ai/glm-4.7-flash` | Light — fast, cheap, 79.5% tau-bench | Every message |
| Summary | `z-ai/glm-4.7-flash` | Light | Every 20/60 messages |
| Dialectic (low) | `z-ai/glm-4.7-flash` | Light | Per Hermes turn |
| Dialectic (med/high) | `x-ai/grok-4.1-fast` | Medium — built for tool use, 2M context | Complex queries |
| Dialectic (max) | `z-ai/glm-5` | Heavy — 89.7% tau2-bench | Hardest queries |
| Dream | `z-ai/glm-5` | Heavy | Every ~8 hours |
These are OpenRouter model IDs. Any model your provider supports will work — just change the name in `config.toml`. Each component also has a backup provider that kicks in automatically if the primary still fails after its retries.

To change models, edit `~/honcho/config.toml` and rebuild:

```bash
cd ~/honcho
docker compose up -d --build
```
## Using different providers
Honcho supports these provider slots natively:
| Slot | Config key | How to use |
|---|---|---|
| `custom` | `OPENAI_COMPATIBLE_BASE_URL` + `OPENAI_COMPATIBLE_API_KEY` | Any OpenAI-compatible API |
| `vllm` | `VLLM_BASE_URL` + `VLLM_API_KEY` | Any OpenAI-compatible API |
| `openai` | `OPENAI_API_KEY` | OpenAI direct |
| `anthropic` | `ANTHROPIC_API_KEY` | Anthropic direct |
| `google` | `GEMINI_API_KEY` | Google Gemini |
| `groq` | `GROQ_API_KEY` | Groq |
You can mix providers per component in `config.toml`:

```toml
[deriver]
PROVIDER = "groq"        # fast for frequent tasks
MODEL = "llama-3.3-70b"

[dream]
PROVIDER = "anthropic"   # best reasoning for rare tasks
MODEL = "claude-sonnet-4-6"
```
## Local / LAN Inference
For maximum privacy, run a local model instead of a cloud API. Any server with an OpenAI-compatible endpoint works.
Local servers have different model catalogs than cloud APIs, so model names will differ. The setup script asks you for the model name your server provides.
Recommended local models for reliable function calling:
| Model | Params | Ollama name | Notes |
|---|---|---|---|
| GLM-4.7 Flash | 30B MoE | `glm-4.7-flash` | Same family as cloud default, best tool use in 30B class |
| Llama 3.3 | 70B | `llama3.3:70b` | Battle-tested tool calling, needs ~40GB VRAM |
### Ollama (easiest)

```bash
# Install Ollama on this machine or another on your LAN
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model with strong function calling
ollama pull glm-4.7-flash
```

Then run the setup script and choose option 2 (Local / LAN):

```text
Server URL: http://localhost:11434/v1
Model name for light tasks: glm-4.7-flash
Model name for heavy tasks: glm-4.7-flash
```
For a separate LAN machine, use its IP: `http://192.168.x.x:11434/v1`
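In local mode, the same env vars simply point at your LAN server instead of a cloud API. A sketch, assuming Ollama on the same machine (the dummy key value is an assumption — Ollama's OpenAI-compatible endpoint does not validate it, but the client still requires a non-empty value):

```env
LLM_VLLM_BASE_URL=http://localhost:11434/v1
LLM_VLLM_API_KEY=ollama
```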
### vLLM

```bash
# Serve a model with tool-calling support
vllm serve THUDM/GLM-4.7-Flash --port 8001 --enable-auto-tool-choice
```

Setup: `http://localhost:8001/v1` with model name `THUDM/GLM-4.7-Flash`
### Considerations

- **Model size matters** — Honcho's agents need reliable function calling and structured JSON. Models under 14B may miss tool calls or malform output. 32B+ recommended.
- **Embeddings need a cloud API** — local servers typically can't serve embedding models. The setup script asks for a separate cloud API key, URL, and model name for embeddings (e.g. OpenRouter with `openai/text-embedding-3-small`, or Venice with `text-embedding-bge-m3`), or lets you disable embeddings entirely (Honcho works, but without vector search).
- **Same model for all tiers** — locally you'll typically run one model. The script sets it for all components. You can differentiate later in `config.toml` if you serve multiple models.
- **No backup provider** — local mode uses a single server. If it goes down, Honcho's deriver queues work until it's back.
## MCP Server (optional)
Honcho includes an MCP server that exposes memory tools (search, chat, observations, peer cards) to any MCP-compatible client like Claude Code or Claude Desktop.
The hosted version at `mcp.honcho.dev` points at Plastic Labs' cloud. For self-hosted, run the MCP server locally and point it at your Honcho instance.
The setup script offers to configure MCP automatically. To set it up manually:
### Setup

Requires Node.js 22+ and Bun on the server:

```bash
cd ~/honcho/mcp

# Patch to use local Honcho instead of honcho.dev
sed -i 's|https://api.honcho.dev|http://localhost:8000|' src/config.ts

bun install
```
### Run as a service

Create `/etc/systemd/system/honcho-mcp.service`:

```ini
[Unit]
Description=Honcho MCP Server
After=network.target docker.service

[Service]
Type=simple
User=your-username
WorkingDirectory=/home/your-username/honcho/mcp
ExecStart=/usr/bin/npx wrangler dev --port 8787 --ip 0.0.0.0
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```
Then:

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now honcho-mcp
```
### Connect Claude Code

If the MCP server is on a remote machine, tunnel the port over SSH:

```bash
ssh -f -N -L 8787:localhost:8787 user@your-server
```

Then add to Claude Code:

```bash
claude mcp add --transport http honcho http://localhost:8787 \
  --header "Authorization: Bearer local" \
  --header "X-Honcho-User-Name: your-name" \
  --header "X-Honcho-Workspace-ID: hermes"
```
### Connect Claude Desktop

Add to your Claude Desktop config (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "honcho": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://localhost:8787",
        "--header", "Authorization:${AUTH_HEADER}",
        "--header", "X-Honcho-User-Name:${USER_NAME}"
      ],
      "env": {
        "AUTH_HEADER": "Bearer local",
        "USER_NAME": "your-name"
      }
    }
  }
}
```
## Maintenance

Update Honcho:

```bash
cd ~/honcho
docker compose down
git checkout mcp/src/config.ts   # restore upstream MCP file if patched
git pull
docker compose up -d --build
```

If you use the MCP server, re-apply the patch after pulling:

```bash
sed -i 's|https://api.honcho.dev|http://localhost:8000|' mcp/src/config.ts
sudo systemctl restart honcho-mcp
```

View logs:

```bash
docker compose logs -f api deriver
```

Check queue status:

```bash
curl -s http://localhost:8000/v3/workspaces/hermes/queue/status | python3 -m json.tool
```

Backup data:

```bash
docker compose exec database pg_dump -U honcho honcho > backup.sql
```
## Known Limitations

- **Embedding fallback shares backup config** — if `LLM_EMBEDDING_API_KEY` / `LLM_EMBEDDING_BASE_URL` are empty, Honcho falls back to `LLM_OPENAI_COMPATIBLE_*` (backup provider). This is intentional and works well when your backup supports embeddings (e.g. Venice with `text-embedding-bge-m3`). Set the embedding env vars explicitly if you want embeddings routed separately.
- **One backup per component** — Honcho supports a primary plus one backup provider, not a full failover chain. Using a multi-provider router (e.g. OpenRouter) as primary mitigates this.
- **No E2EE** — Honcho's agents use function calling, which isn't compatible with end-to-end encryption. LLM request content is visible to the provider, but your stored data (sessions, observations, embeddings) stays on your machine.
## Files

| File | Purpose |
|---|---|
| `docker-compose.yml` | Docker deployment — API, Deriver, PostgreSQL, Redis |
| `config.toml` | Honcho config — providers, models, feature flags |
| `env.example` | API keys template — copy to `~/honcho/.env` and fill in |
| `honcho-config.json` | Hermes-side config — tells Hermes to use `localhost:8000` |
| `setup.sh` | One-command installer — handles everything |
## License
GPL-3.0
## Credits

- [Honcho](https://github.com/plastic-labs/honcho) by Plastic Labs
- Hermes Agent by Nous Research