honcho-self-hosted

Self-host Honcho memory layer for Hermes Agent — OpenRouter + Venice, no code changes


Self-Hosted Honcho for Hermes Agent

Self-host Honcho (Plastic Labs' memory layer) on your own server instead of using their cloud. Works with Hermes Agent out of the box.

No fork required — just 3 config files on top of upstream Honcho.

Background: Hermes L4 Memory

Hermes Agent has a 4-layer memory system. The cross-session memory layer is powered by Honcho, which builds a deepening model of the user across conversations — extracting observations, recalling context, and consolidating memories over time.

By default, Hermes uses Plastic Labs' managed cloud (honcho.dev) + their Neuromancer models. This works out of the box but means your conversation data and user profile live on their servers.

What are Neuromancer models?

Neuromancer XR is a specialized 8B model fine-tuned from Qwen3-8B specifically for extracting logical conclusions from conversations. Unlike general-purpose LLMs, which are optimized for plausible text generation, Neuromancer is trained on ~10,000 curated social reasoning traces to follow formal logic — extracting both explicit facts ("user said they like Python") and deductive conclusions ("user is likely a developer").

It scores 86.9% on the LoCoMo memory benchmark vs. 69.6% for base Qwen3-8B and 80.0% for Claude 4 Sonnet.

Tradeoff of not using it: General-purpose models work well for observation extraction and memory recall — Honcho's prompts and tool-calling pipeline compensate for much of the gap. You may get slightly less precise deductive reasoning, but capable models (GLM-5, Grok 4.1) with strong function calling largely close the difference. The main advantage of self-hosting is data sovereignty, not matching Neuromancer's exact reasoning quality.

Deployment Options

| Option | Privacy | Data location | LLM for memory | Setup | Cost |
|---|---|---|---|---|---|
| Managed cloud (default) | Low — data + inference on 3rd party | Plastic Labs servers | Neuromancer (Plastic Labs) | None — built into Hermes | Free tier / paid |
| Self-hosted + API (this repo) | Medium — data on your machine, inference via API | Your machine | Any OpenAI-compatible API | ~3 minutes | API usage only |
| Self-hosted + local model | High — nothing leaves your network | Your machine | Local LLM (Ollama, vLLM) | More setup | Hardware only |

Managed cloud — Zero setup. Best for getting started. Your data is on Plastic Labs' infrastructure.

Self-hosted + API — This repo. Your data stays on your machine. LLM calls go to a cloud API for inference only — the provider sees request content but doesn't store your memory data. Best balance of privacy and capability.

Self-hosted + local model — Maximum privacy. No data leaves your network. Requires a GPU or capable CPU on your LAN running an inference server (Ollama, vLLM, llama.cpp). Set LLM_VLLM_BASE_URL to your local server. Trade-off: smaller models may produce lower quality observations and reasoning than cloud APIs.
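
For example, pointing Honcho at an Ollama box elsewhere on your LAN is a one-line setting (the IP and port below are placeholders for a hypothetical host):

```shell
# Hypothetical LAN machine running Ollama's OpenAI-compatible endpoint
LLM_VLLM_BASE_URL=http://192.168.1.50:11434/v1
```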

What this does

Architecture

Hermes Agent ──► localhost:8000 (self-hosted Honcho API)
                      │
                      ├── PostgreSQL + pgvector (your machine)
                      ├── Redis cache (your machine)
                      │
                      └── Deriver/Dialectic/Dream workers
                              │
                              ├── Primary LLM provider (any OpenAI-compatible API)
                              └── Backup LLM provider (optional)

Prerequisites

Quick Start

curl -sL https://raw.githubusercontent.com/elkimek/honcho-self-hosted/main/setup.sh -o /tmp/setup.sh
bash /tmp/setup.sh

This installs Docker (if needed), clones Honcho, copies configs, prompts for API keys, starts everything, configures Hermes, and optionally sets up the MCP server. ~3 minutes.

Manual Setup

1. Install Docker

sudo apt-get update && sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER

Log out and back in for the group change to take effect.

2. Clone repos + copy configs

# Clone this config repo
git clone https://github.com/elkimek/honcho-self-hosted.git ~/honcho-self-hosted

# Clone upstream Honcho
git clone --depth 1 https://github.com/plastic-labs/honcho.git ~/honcho

# Copy config files into the Honcho clone
cp ~/honcho-self-hosted/docker-compose.yml ~/honcho/
cp ~/honcho-self-hosted/config.toml ~/honcho/
cp ~/honcho-self-hosted/env.example ~/honcho/.env

3. Set your API keys

Edit ~/honcho/.env:

nano ~/honcho/.env

Replace the placeholder values with your actual API keys.

Any OpenAI-compatible provider works (OpenRouter, Venice, Routstr, Together, etc.) — just set the key and URL. See Using different providers for details.

Embedding fallback: if LLM_EMBEDDING_API_KEY or LLM_EMBEDDING_BASE_URL is left empty, Honcho falls back to the backup provider credentials (LLM_OPENAI_COMPATIBLE_*). This is useful if your backup provider (e.g. Venice) supports embeddings at negligible cost.

If you don't want a backup provider: remove all BACKUP_PROVIDER and BACKUP_MODEL lines from config.toml, and set LLM_OPENAI_COMPATIBLE_API_KEY + OPENAI_COMPATIBLE_BASE_URL to the same values as your primary. The setup script handles this automatically.
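
As an illustration, a filled-in .env might look like the sketch below. The key names are the ones referenced above, the values are placeholders, and env.example remains the authoritative list:

```shell
# Backup provider credentials (also used as the primary when you run
# without a backup, per the note above)
LLM_OPENAI_COMPATIBLE_API_KEY=sk-your-key-here
OPENAI_COMPATIBLE_BASE_URL=https://openrouter.ai/api/v1

# Embeddings: leave empty to fall back to the backup credentials above
LLM_EMBEDDING_API_KEY=
LLM_EMBEDDING_BASE_URL=
```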

4. Start Honcho

cd ~/honcho
docker compose up -d

First run builds images and runs DB migrations (~2 minutes). Check status:

docker compose ps
docker compose logs -f api deriver

Wait ~10 seconds for the API to start, then verify:

curl -s http://localhost:8000/openapi.json | head -1
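
If you prefer scripting the check, here is a small Python sketch that polls the same endpoint:

```python
import json
import urllib.request


def honcho_ready(base_url="http://localhost:8000", timeout=2):
    """Return True once the Honcho API is serving its OpenAPI schema."""
    try:
        with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
            return "openapi" in json.load(resp)
    except (OSError, ValueError):
        return False
```

`honcho_ready()` returns False until the containers finish starting, so it can be called in a retry loop.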

5. Configure Hermes

mkdir -p ~/.honcho
cp ~/honcho-self-hosted/honcho-config.json ~/.honcho/config.json
hermes gateway restart

Hermes will now use your local Honcho instead of api.honcho.dev.

Model Configuration

Honcho has four background components that make LLM calls. Calls are tiered by task complexity, and the default models are chosen for function-calling reliability (the primary requirement for Honcho's tool-using agents):

| Component | Default model | Tier | When it runs |
|---|---|---|---|
| Deriver | z-ai/glm-4.7-flash | Light — fast, cheap, 79.5% tau-bench | Every message |
| Summary | z-ai/glm-4.7-flash | Light | Every 20/60 messages |
| Dialectic (low) | z-ai/glm-4.7-flash | Light | Per Hermes turn |
| Dialectic (med/high) | x-ai/grok-4.1-fast | Medium — built for tool use, 2M context | Complex queries |
| Dialectic (max) | z-ai/glm-5 | Heavy — 89.7% tau2-bench | Hardest queries |
| Dream | z-ai/glm-5 | Heavy | Every ~8 hours |

These are OpenRouter model IDs. Any model your provider supports will work — just change the name in config.toml. Each component also has a backup provider that fires automatically if the primary fails on the last retry.
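
As a sketch of what such an entry can look like (key names taken from the settings mentioned above; check the shipped config.toml for the exact layout):

```toml
[deriver]
PROVIDER = "custom"                  # primary: any OpenAI-compatible API
MODEL = "z-ai/glm-4.7-flash"
BACKUP_PROVIDER = "vllm"             # fires if the primary fails on its last retry
BACKUP_MODEL = "z-ai/glm-4.7-flash"
```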

To change models, edit ~/honcho/config.toml and rebuild:

cd ~/honcho
docker compose up -d --build

Using different providers

Honcho supports these provider slots natively:

| Slot | Config key | How to use |
|---|---|---|
| custom | OPENAI_COMPATIBLE_BASE_URL + OPENAI_COMPATIBLE_API_KEY | Any OpenAI-compatible API |
| vllm | VLLM_BASE_URL + VLLM_API_KEY | Any OpenAI-compatible API |
| openai | OPENAI_API_KEY | OpenAI direct |
| anthropic | ANTHROPIC_API_KEY | Anthropic direct |
| google | GEMINI_API_KEY | Google Gemini |
| groq | GROQ_API_KEY | Groq |

You can mix providers per component in config.toml:

[deriver]
PROVIDER = "groq"          # fast for frequent tasks
MODEL = "llama-3.3-70b"

[dream]
PROVIDER = "anthropic"     # best reasoning for rare tasks
MODEL = "claude-sonnet-4-6"

Local / LAN Inference

For maximum privacy, run a local model instead of a cloud API. Any server with an OpenAI-compatible endpoint works.

Local servers have different model catalogs than cloud APIs, so model names will differ. The setup script asks you for the model name your server provides.

Recommended local models for reliable function calling:

| Model | Params | Ollama name | Notes |
|---|---|---|---|
| GLM-4.7 Flash | 30B MoE | glm-4.7-flash | Same family as cloud default, best tool use in 30B class |
| Llama 3.3 | 70B | llama3.3:70b | Battle-tested tools, needs ~40GB VRAM |

Ollama (easiest)

# Install Ollama on this machine or another on your LAN
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model with strong function calling
ollama pull glm-4.7-flash

Then run the setup script and choose option 2 (Local / LAN):

Server URL: http://localhost:11434/v1
Model name for light tasks: glm-4.7-flash
Model name for heavy tasks: glm-4.7-flash

For a separate LAN machine, use its IP: http://192.168.x.x:11434/v1

vLLM

# Serve a model with tool calling support
vllm serve THUDM/GLM-4.7-Flash --port 8001 --enable-auto-tool-choice

Setup: http://localhost:8001/v1 with model name THUDM/GLM-4.7-Flash

Considerations

MCP Server (optional)

Honcho includes an MCP server that exposes memory tools (search, chat, observations, peer cards) to any MCP-compatible client like Claude Code or Claude Desktop.

The hosted version at mcp.honcho.dev points at Plastic Labs' cloud. For self-hosted, run the MCP server locally and point it at your Honcho instance.

The setup script offers to configure MCP automatically. To set it up manually:

Setup

Requires Node.js 22+ and Bun on the server:

cd ~/honcho/mcp

# Patch to use local Honcho instead of honcho.dev
sed -i 's|https://api.honcho.dev|http://localhost:8000|' src/config.ts

bun install

Run as a service

Create /etc/systemd/system/honcho-mcp.service:

[Unit]
Description=Honcho MCP Server
After=network.target docker.service

[Service]
Type=simple
User=your-username
WorkingDirectory=/home/your-username/honcho/mcp
ExecStart=/usr/bin/npx wrangler dev --port 8787 --ip 0.0.0.0
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Then:

sudo systemctl daemon-reload
sudo systemctl enable --now honcho-mcp

Connect Claude Code

If the MCP server is on a remote machine, tunnel the port over SSH:

ssh -f -N -L 8787:localhost:8787 user@your-server

Then add to Claude Code:

claude mcp add --transport http honcho http://localhost:8787 \
  --header "Authorization: Bearer local" \
  --header "X-Honcho-User-Name: your-name" \
  --header "X-Honcho-Workspace-ID: hermes"

Connect Claude Desktop

Add to your Claude Desktop config (claude_desktop_config.json):

{
  "mcpServers": {
    "honcho": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://localhost:8787",
        "--header", "Authorization:${AUTH_HEADER}",
        "--header", "X-Honcho-User-Name:${USER_NAME}"
      ],
      "env": {
        "AUTH_HEADER": "Bearer local",
        "USER_NAME": "your-name"
      }
    }
  }
}

Maintenance

Update Honcho:

cd ~/honcho
docker compose down
git checkout mcp/src/config.ts  # restore upstream MCP file if patched
git pull
docker compose up -d --build

If you use the MCP server, re-apply the patch after pulling:

sed -i 's|https://api.honcho.dev|http://localhost:8000|' mcp/src/config.ts
sudo systemctl restart honcho-mcp

View logs:

docker compose logs -f api deriver

Check queue status:

curl -s http://localhost:8000/v3/workspaces/hermes/queue/status | python3 -m json.tool
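
The response schema is not documented here, so purely as an illustration, with hypothetical field names ("sessions", "pending"), a small summary helper could look like:

```python
def summarize_queue(status: dict) -> str:
    """Summarize a queue-status payload.

    The "sessions"/"pending" field names are hypothetical; adapt them to
    the actual /queue/status response.
    """
    sessions = status.get("sessions", [])
    pending = sum(s.get("pending", 0) for s in sessions)
    return f"{len(sessions)} session(s), {pending} item(s) pending"


print(summarize_queue({"sessions": [{"pending": 3}, {"pending": 0}]}))
# → 2 session(s), 3 item(s) pending
```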

Backup data:

docker compose exec database pg_dump -U honcho honcho > backup.sql

Known Limitations

Files

| File | Purpose |
|---|---|
| docker-compose.yml | Docker deployment — API, Deriver, PostgreSQL, Redis |
| config.toml | Honcho config — providers, models, feature flags |
| env.example | API keys template — copy to ~/honcho/.env and fill in |
| honcho-config.json | Hermes-side config — tells Hermes to use localhost:8000 |
| setup.sh | One-command installer — handles everything |

License

GPL-3.0

Credits