NousResearch/hermes-agent-self-evolution official

Name: hermes-agent-self-evolution
Author: NousResearch

Evolutionary self-improvement using DSPy + GEPA — optimizes skills, prompts, and code

★ 4.6K · Maintainer: Nous Research

Overview

Hermes Agent Self-Evolution is a framework for automatically improving Hermes Agent's capabilities. It uses DSPy and Genetic-Pareto Prompt Evolution (GEPA) to optimize skills, prompts, and code through reflective evolutionary search. The system analyzes execution traces to propose targeted mutations and needs no GPU training. Every evolved variant must pass guardrails, including the full test suite and size limits, and is then submitted as a pull request for human review.

Optimizes skills and prompts using DSPy and GEPA
No GPU training required; operates via API calls
Strict guardrails including full test suites and size limits

Full README from GitHub

🧬 Hermes Agent Self-Evolution

Evolutionary self-improvement for Hermes Agent.

Hermes Agent Self-Evolution uses DSPy + GEPA (Genetic-Pareto Prompt Evolution) to automatically evolve and optimize Hermes Agent's skills, tool descriptions, system prompts, and code — producing measurably better versions through reflective evolutionary search.

No GPU training required. Everything operates via API calls: mutating text, evaluating results, and selecting the best variants. An optimization run costs roughly $2-10.

How It Works

Read current skill/prompt/tool ──► Generate eval dataset
                                        │
                                        ▼
                                   GEPA Optimizer ◄── Execution traces
                                        │                    ▲
                                        ▼                    │
                                   Candidate variants ──► Evaluate
                                        │
                                   Constraint gates (tests, size limits, benchmarks)
                                        │
                                        ▼
                                   Best variant ──► PR against hermes-agent

GEPA reads execution traces to understand why things fail (not just that they failed), then proposes targeted improvements. GEPA is an ICLR 2026 Oral paper and is MIT licensed.
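The loop in the diagram can be sketched in a few lines of Python. This is only an illustration of the mutate, gate, evaluate, select cycle, not the repository's implementation: `evolve`, `toy_mutate`, `toy_evaluate`, and `toy_gate` are hypothetical names, and the toy "traces" are simple strings standing in for real execution traces. In the real system the mutation step would be an LLM call that reads the traces.

```python
from typing import Callable, List, Tuple

def evolve(
    seed: str,
    mutate: Callable[[str, List[str]], str],
    evaluate: Callable[[str], Tuple[float, List[str]]],
    passes_gates: Callable[[str], bool],
    iterations: int = 10,
) -> Tuple[str, float]:
    """Return the best gated variant found and its score."""
    best = seed
    best_score, traces = evaluate(seed)
    for _ in range(iterations):
        # Propose a targeted change based on why the current best failed.
        candidate = mutate(best, traces)
        # Constraint gates run before any candidate can be selected.
        if not passes_gates(candidate):
            continue
        score, cand_traces = evaluate(candidate)
        if score > best_score:
            best, best_score, traces = candidate, score, cand_traces
    return best, best_score

# --- Toy stand-ins (all hypothetical) ---------------------------------
REQUIRED = ["tests", "security", "style"]

def toy_evaluate(text: str) -> Tuple[float, List[str]]:
    # Score = fraction of required topics mentioned; traces name what is missing.
    missing = [k for k in REQUIRED if k not in text]
    return 1 - len(missing) / len(REQUIRED), [f"missing:{k}" for k in missing]

def toy_mutate(text: str, traces: List[str]) -> str:
    # Address the first reported failure; leave the text alone if none remain.
    if not traces:
        return text
    keyword = traces[0].split(":", 1)[1]
    return text + f"\nAlso check {keyword}."

def toy_gate(text: str) -> bool:
    # Assumption: the 15KB skill limit is read as 15 * 1024 bytes of UTF-8.
    return len(text.encode("utf-8")) <= 15 * 1024

if __name__ == "__main__":
    best, score = evolve("Review the pull request.", toy_mutate, toy_evaluate, toy_gate, iterations=5)
    print(score)
    print(best)
```

The sketch keeps a single best candidate for brevity; a Pareto-based optimizer would instead keep a set of non-dominated candidates.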

Quick Start

# Install
git clone https://github.com/NousResearch/hermes-agent-self-evolution.git
cd hermes-agent-self-evolution
pip install -e ".[dev]"

# Point at your hermes-agent repo
export HERMES_AGENT_REPO=~/.hermes/hermes-agent

# Evolve a skill (synthetic eval data)
python -m evolution.skills.evolve_skill \
    --skill github-code-review \
    --iterations 10 \
    --eval-source synthetic

# Or use real session history from Claude Code, Copilot, and Hermes
python -m evolution.skills.evolve_skill \
    --skill github-code-review \
    --iterations 10 \
    --eval-source sessiondb

What It Optimizes

Phase	Target	Engine	Status
Phase 1	Skill files (SKILL.md)	DSPy + GEPA	✅ Implemented
Phase 2	Tool descriptions	DSPy + GEPA	🔲 Planned
Phase 3	System prompt sections	DSPy + GEPA	🔲 Planned
Phase 4	Tool implementation code	Darwinian Evolver	🔲 Planned
Phase 5	Continuous improvement loop	Automated pipeline	🔲 Planned

Engines

Engine	What It Does	License
DSPy + GEPA	Reflective prompt evolution — reads execution traces, proposes targeted mutations	MIT
Darwinian Evolver	Code evolution with Git-based organisms	AGPL v3 (external CLI only)
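The "Pareto" in GEPA's name suggests that candidates are compared across several evaluation examples rather than by one averaged score. The following is a generic sketch of Pareto-dominance filtering under that assumption; it is not GEPA's or DSPy's code, and `dominates`, `pareto_front`, and the variant names are hypothetical.

```python
from typing import Dict, List

def dominates(a: List[float], b: List[float]) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores: Dict[str, List[float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    ]

if __name__ == "__main__":
    # Per-example scores for four hypothetical prompt variants.
    scores = {
        "v1": [0.9, 0.2, 0.5],
        "v2": [0.4, 0.8, 0.5],
        "v3": [0.3, 0.1, 0.4],
        "v4": [0.9, 0.2, 0.6],
    }
    print(pareto_front(scores))  # ['v2', 'v4']
```

Here `v2` survives even though its average is lower than `v4`'s, because it is the only variant that does well on the second example. Keeping such specialists gives later mutations more diverse starting points.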

Guardrails

Every evolved variant must pass:

Full test suite — pytest tests/ -q must pass 100%
Size limits — Skills ≤15KB, tool descriptions ≤500 chars
Caching compatibility — No mid-conversation changes
Semantic preservation — Must not drift from original purpose
PR review — All changes go through human review, never direct commit
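The first two guardrails are mechanical and could be checked with something like the sketch below. It is an illustration rather than the project's gate code: the function names and `GateResult` are hypothetical, and reading "15KB" as 15 × 1024 UTF-8 bytes is an assumption. The test-suite gate simply shells out to the `pytest tests/ -q` command quoted above.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List

SKILL_MAX_BYTES = 15 * 1024    # assumption: 1 KB = 1024 bytes
TOOL_DESC_MAX_CHARS = 500

@dataclass
class GateResult:
    passed: bool
    failures: List[str]

def check_size_gates(skill_text: str, tool_descriptions: List[str]) -> GateResult:
    """Check the skill and tool-description size limits."""
    failures: List[str] = []
    size = len(skill_text.encode("utf-8"))
    if size > SKILL_MAX_BYTES:
        failures.append(f"skill is {size} bytes (limit {SKILL_MAX_BYTES})")
    for i, desc in enumerate(tool_descriptions):
        if len(desc) > TOOL_DESC_MAX_CHARS:
            failures.append(
                f"tool description {i} is {len(desc)} chars (limit {TOOL_DESC_MAX_CHARS})"
            )
    return GateResult(passed=not failures, failures=failures)

def check_test_suite(repo_dir: str) -> bool:
    """Run `pytest tests/ -q` in repo_dir; pass only if pytest exits with code 0."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "tests/", "-q"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    print(check_size_gates("# My skill\nDo the thing.", ["Searches the web."]))
```

Semantic preservation and caching compatibility are harder to automate, and this sketch does not attempt them.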

Full Plan

See PLAN.md for the complete architecture, evaluation data strategy, constraints, benchmarks integration, and phased timeline.