hermes-alpha
Cloud-deployed version of the Nous Research Hermes agent
HERMES ALPHA
Can a stock AI agent, given nothing but a mission brief, bootstrap an autonomous bug bounty system from scratch?
This repo is the experiment.
┌──────────────────────────────────────────────────────────────────┐
│                                                                  │
│  Creator (You)                                                   │
│    └─ browser terminal / Telegram                                │
│        └─ Overseer (persistent, strategic: builds the system)    │
│            └─ Hunter (ephemeral, tactical: finds the bugs)       │
│                └─ subagents (parallel analysis workers)          │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
The Experiment
Most AI agent projects give the agent a mountain of custom tools, structured APIs, and carefully engineered infrastructure. Hermes Alpha asks: what if you gave it nothing?
We take a stock Hermes agent from Nous Research, hand it a single identity document (soul.md), and challenge it to build, deploy, and continuously improve a second AI agent that finds real software vulnerabilities for bug bounty payouts.
No custom tools. No purpose-built infrastructure. Just a Linux terminal, git, and a mission.
The Thesis
A self-improving two-agent loop, in which the Overseer evolves the Hunter's code, skills, and strategy based on real outcomes, compounds over time. The Hunter gets measurably better at finding vulnerabilities. The Overseer gets measurably better at improving the Hunter. And the whole system is validated by an objective, economic signal: bounties paid or not paid.
This is Path B of a deliberate A/B test:
| | Hermes Prime (Path A) | Hermes Alpha (Path B) |
|---|---|---|
| Approach | Purpose-built infrastructure, custom tools, structured APIs | Stock agent + identity document, zero custom code |
| Question | Does pre-built scaffolding accelerate results? | Can an agent bootstrap everything from first principles? |
| Status | Parallel development | This repo |
The winner informs the long-term architecture.
How It Works
The Overseer
The Overseer is a persistent Hermes agent that lives in a web terminal. It doesn't hunt for vulnerabilities itself. Instead, it:
- Builds the Hunter's codebase: security skills, system prompt, tools, Dockerfile
- Deploys the Hunter to its own Fly.io machine
- Monitors the Hunter's performance via logs and Elephantasm memory streams
- Intervenes when it spots problems, either soft (runtime guidance injection) or hard (code changes + redeploy)
- Learns which interventions work over time, compounding improvements
Three intervention modes, always preferring the least invasive:
SOFT  ──→ inject a runtime instruction (immediate, low risk)
HARD  ──→ modify Hunter source, commit, push, redeploy (systemic, medium risk)
MODEL ──→ switch the Hunter's LLM tier (cost/quality optimisation)
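The "least invasive first" rule can be sketched as an ordered enum. This is a minimal illustration, not the Overseer's actual logic: the names Intervention and least_invasive are hypothetical, and ranking MODEL after HARD simply follows the order of the list above, which is an assumption.

```python
from enum import IntEnum


class Intervention(IntEnum):
    """Intervention modes, ordered from least to most invasive (assumed order)."""

    SOFT = 1   # inject a runtime instruction
    HARD = 2   # modify Hunter source, commit, push, redeploy
    MODEL = 3  # switch the Hunter's LLM tier


def least_invasive(applicable: set[Intervention]) -> Intervention:
    """Return the least invasive mode among those judged able to fix the problem."""
    if not applicable:
        raise ValueError("no applicable intervention")
    return min(applicable)
```

For example, if both a runtime hint and a code change could address a problem, least_invasive({Intervention.HARD, Intervention.SOFT}) picks Intervention.SOFT.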
The Hunter
A Hermes agent armed with security analysis skills that follows a four-phase workflow per target:
RECON ─────→ ANALYSIS ─────→ VERIFICATION ─────→ REPORTING
clone repo   static +        build PoC,          structured report
map surface  dynamic test    confirm exploit     with CVSS, CWE,
check deps   code review     rule out FPs        repro steps
Target market: mid-tier bounties ($500 to $5,000) such as auth bypasses, IDOR, privilege escalation, and info disclosure. Systematic analysis beats genius-level creativity here, and that's what agents are good at.
The Self-Improvement Loop
This is the core insight. A static agent plateaus. A self-improving one compounds:
Hunter v1 analyses target ──→ finds 2 vulns, misses 5
        │
Overseer reviews logs ──→ identifies gaps in skills
        │
Overseer rewrites skills, redeploys ──→ Hunter v2
        │
Hunter v2 analyses next target ──→ finds 4 vulns, misses 3
        │
        └──→ repeat. compound. improve.
Four nested feedback loops operate at different timescales:
| Loop | Timescale | What Happens |
|---|---|---|
| Tactical | seconds to minutes | Overseer injects runtime guidance based on live events |
| Structural | minutes to hours | Overseer writes new skills/tools, redeploys Hunter |
| Strategic | hours to days | Elephantasm memory reveals which strategies actually work |
| Meta-strategic | days to weeks | Creator reviews outcomes, redirects the whole system |
The Web Terminal
The gateway is a FastAPI app that bridges a PTY (pseudo-terminal) to WebSocket via xterm.js. When you connect:
- Authenticate with your password
- A WebSocket spawns hermes chat inside a PTY
- Bidirectional I/O streams between your browser and the agent's terminal
- The agent has full access to its Linux environment: git, Python, Node, flyctl, curl, everything
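The PTY half of the connection steps above can be sketched with the standard library alone. This is not the code in gateway/app.py: the helper name run_in_pty is hypothetical, and it collects output into a buffer where a real gateway would forward each chunk to the WebSocket. It assumes a Unix-like host.

```python
import os
import pty
import subprocess


def run_in_pty(argv: list[str]) -> bytes:
    """Run argv attached to a fresh pseudo-terminal and return all of its output."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(
        argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd, close_fds=True
    )
    os.close(slave_fd)  # the parent only needs the master side
    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                # Linux reports EIO once the child side of the PTY is closed
                break
            if not data:
                break
            chunks.append(data)  # a gateway would send this chunk to the browser
    finally:
        os.close(master_fd)
        proc.wait()
    return b"".join(chunks)
```

Calling run_in_pty(["echo", "hello"]) returns the bytes the command wrote to its terminal. Because the process sees a real TTY, interactive programs behave as they would in a normal shell session, which is the reason for bridging a PTY rather than plain pipes.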
The terminal is styled with a cyberpunk aesthetic (dark background, red accents, scanlines, IBM Plex Mono) because if you're going to watch an AI bootstrap a security operation, it should look the part.
Session resilience ensures the terminal survives laptop sleep, network blips, and PTY crashes with auto-reconnect and heartbeat detection. A messaging gateway sidecar (Telegram/Discord/Slack/Signal) keeps the agent reachable even when the browser isn't open.
Architecture
┌─────────────────────────────────────────────────────────┐
│  Browser                                                │
│  ┌─────────────────┐    ┌────────────────────────┐      │
│  │  xterm.js       │◄──►│  WebSocket             │      │
│  │  (terminal UI)  │    │  (bidirectional I/O)   │      │
│  └─────────────────┘    └───────────┬────────────┘      │
└─────────────────────────────────────┼───────────────────┘
                                      │
┌─────────────────────────────────────┼───────────────────┐
│  Fly.io Machine                     │                   │
│  ┌──────────────────────────────────┼────────────────┐  │
│  │  FastAPI Gateway (gateway/app.py)                 │  │
│  │  ├─ Auth (session cookie)                         │  │
│  │  ├─ PTY manager (spawn, monitor, bridge)          │  │
│  │  └─ Provider switcher (OpenRouter / Nous Direct)  │  │
│  └──────────────────────────────────┬────────────────┘  │
│                                     │                   │
│  ┌──────────────────────────────────┼────────────────┐  │
│  │  hermes chat (PTY process)                        │  │
│  │  └─ Identity: soul.md (Overseer persona)          │  │
│  └───────────────────────────────────────────────────┘  │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │  Messaging Gateway Sidecar (independent process)  │  │
│  │  └─ Telegram / Discord / Slack / Signal           │  │
│  └───────────────────────────────────────────────────┘  │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │  /root/.hermes (persistent Fly.io volume)         │  │
│  │  ├─ sessions/  memories/  skills/  logs/          │  │
│  │  └─ .env  config.yaml  SOUL.md                    │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                             │
                  ┌──────────┴──────────┐
                  │  Elephantasm API    │
                  │  (long-term memory) │
                  └─────────────────────┘
Key files:
| File | Purpose |
|---|---|
| gateway/app.py | FastAPI server: auth, PTY lifecycle, WebSocket bridge |
| gateway/soul.md | Overseer identity: mission, hierarchy, guardrails |
| gateway/entrypoint.sh | Bootstrap persistent volume, start sidecar + server |
| gateway/Dockerfile | Install Hermes agent + web gateway |
| gateway/static/ | Login + terminal HTML (xterm.js, provider selector) |
| docs/vision.md | Full architectural vision and design rationale |
Quick Start
Local (Docker Compose)
cp .env.example .env
# Edit .env: at minimum set OPENROUTER_API_KEY and TTYD_PASS
make up # Runs on http://localhost:8081
make down # Stop
Production (Fly.io)
# Requires flyctl installed and authenticated
make deploy # Deploy to Fly.io
make logs # Tail live logs
make ssh # SSH into the machine
make status # Check app status
Environment Variables
# Required
OPENROUTER_API_KEY=sk-or-... # LLM provider
# Recommended
TTYD_PASS=... # Terminal password
ELEPHANTASM_API_KEY=sk_live_... # Long-term memory
ELEPHANTASM_ANIMA_ID=anima_...
# Optional
HERMES_API_KEY=sk-... # Direct Nous Research inference
FIRECRAWL_API_KEY=fc-... # Web search/scrape
FAL_KEY=... # Image generation
# Messaging (pick one or more to keep the agent reachable)
TELEGRAM_BOT_TOKEN=...
DISCORD_BOT_TOKEN=...
SLACK_BOT_TOKEN=...
See .env.example for the full list.
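A pre-flight check along these lines could catch a missing key before the container starts. It is only a sketch: check_env is a hypothetical helper, not part of the repo, and the two tuples simply mirror the "Required" and "Recommended" groups listed above.

```python
import os

# Variable names taken from the list above; the grouping mirrors its comments.
REQUIRED = ("OPENROUTER_API_KEY",)
RECOMMENDED = ("TTYD_PASS", "ELEPHANTASM_API_KEY", "ELEPHANTASM_ANIMA_ID")


def check_env(env=None):
    """Return (missing_required, missing_recommended) for the given mapping.

    Defaults to the process environment when no mapping is passed.
    Empty strings count as missing.
    """
    env = os.environ if env is None else env
    missing_required = [name for name in REQUIRED if not env.get(name)]
    missing_recommended = [name for name in RECOMMENDED if not env.get(name)]
    return missing_required, missing_recommended
```

A startup script might abort when the first list is non-empty and only print a warning for the second.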
Safety & Ethics
This project operates under strict guardrails:
- No attacking live systems. Source code analysis and sandboxed PoC only. Never probe, scan, or exploit production infrastructure.
- Scope enforcement. Every target is verified in-scope for its bounty program before analysis begins.
- Human approval required. No vulnerability report is submitted to any platform without explicit Creator approval.
- No credential harvesting. Credentials found in targets are never extracted, stored, or transmitted.
- Budget hard stops. When the budget limit is reached, the system stops. No exceptions.
- Full audit trail. Every significant action is recorded to Elephantasm.
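The budget hard stop lends itself to a small illustration. The sketch below is an assumption about how such a guard might look, not the project's implementation: BudgetGuard and BudgetExceeded are hypothetical names, and costs are tracked in plain US dollars.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the limit."""


class BudgetGuard:
    """Tracks spend and refuses any charge that would exceed the limit."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record a cost, or raise without recording it if the limit would be passed."""
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"charge of ${cost_usd:.2f} would exceed the ${self.limit_usd:.2f} limit"
            )
        self.spent_usd += cost_usd
```

Checking before recording means a rejected charge leaves the running total untouched, so the caller can stop cleanly instead of discovering an overrun afterwards.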
Economics
At ~$15/day LLM budget + Fly.io compute, the system needs roughly one $500 to $1,000 bounty per month to break even. The Overseer optimises for the only metric that matters: high-quality vulnerability reports that earn payouts. Everything else (speed, coverage, model selection) is a supporting signal in service of that goal.
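The break-even arithmetic can be checked in a few lines. The $15/day figure comes from the text; the 30-day month and the compute_usd parameter are assumptions for illustration, since the README does not state a Fly.io compute cost.

```python
import math

DAILY_LLM_BUDGET_USD = 15  # "~$15/day" from the text
DAYS_PER_MONTH = 30        # assumption: a 30-day month


def monthly_cost(compute_usd: float = 0.0) -> float:
    """LLM spend for the month plus an assumed compute bill."""
    return DAILY_LLM_BUDGET_USD * DAYS_PER_MONTH + compute_usd


def bounties_to_break_even(bounty_usd: float, compute_usd: float = 0.0) -> int:
    """Smallest whole number of bounties of this size that covers the month."""
    return math.ceil(monthly_cost(compute_usd) / bounty_usd)
```

With no compute cost, the LLM budget alone comes to $450 per month, so a single $500 bounty covers it. Any compute bill above $50 pushes the break-even point toward the upper end of the $500 to $1,000 range.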
Why This Matters
This isn't just a bug bounty bot. It's a test of a broader hypothesis about AI agents:
Can a general-purpose agent, given only a clear mission and a terminal, bootstrap complex multi-agent infrastructure that improves itself over time?
If the answer is yes, the implications extend far beyond security. If the answer is no, we learn exactly where and why agents need pre-built scaffolding, and that's equally valuable.
The experiment has an objective success criterion (bounties paid) and a built-in control group (Hermes Prime). Whatever happens, we learn something real.
Built with Hermes by Nous Research · Memory by Elephantasm · Deployed on Fly.io
A Crimson Sun experiment