# Hermes Agent Hackathon — Skill Distillation Pipeline
Demo concept: Use Hermes agent's real-world tool usage to generate high-quality agentic training trajectories for Hermes 4 fine-tuning.
## The Idea
Hermes agent already runs real tasks for real users. Every session is a potential training example. This project turns that latent signal into a closed learning loop:
Hermes agent runs tasks → trajectories captured → judge scores them → Atropos fine-tunes Hermes 4 → better model → better agent
The key insight: real-world grounded trajectories beat synthetic benchmarks. A model that learned to use tools by actually using them in production is fundamentally different from one trained on curated academic tasks.
## What We Built
A `RealWorldTaskEnv` Hermes environment that:
- Runs a diverse task battery — 30 tasks across coding, web research, file ops, data analysis, sysadmin. Things users actually ask agents to do.
- Scores trajectories automatically — multi-dimensional reward: task completion (via ToolContext verification), efficiency, error recovery.
- Exports SFT-ready JSONL — `process` mode writes data that drops straight into Atropos.
- Connects to Atropos for live RL — `serve` mode wires directly into GRPO training.
- Before/after comparison — `demo/compare_models.py` shows Hermes 4-14B vanilla vs. fine-tuned on 500 trajectories.
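A single exported trajectory line might look like the following. This is an illustrative sketch of one SFT-ready record — the field names (`messages`, `reward`) are assumptions, not the environment's actual export schema:

```python
import json

# Hypothetical shape of one trajectory record in trajectories.jsonl
# (one JSON object per line).
record = {
    "messages": [
        {"role": "user", "content": "Find the three largest files under /var/log."},
        {"role": "assistant", "content": "<tool_call>...</tool_call>"},
        {"role": "tool", "content": "..."},
        {"role": "assistant", "content": "The three largest files are ..."},
    ],
    "reward": 0.87,
}
line = json.dumps(record)  # serialises to a single JSONL line
```

Keeping the full multi-turn tool exchange (not just the final answer) is what makes the data usable for agentic SFT rather than plain instruction tuning.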
## Quickstart
```bash
# Install
pip install git+https://github.com/NousResearch/hermes-agent.git
pip install git+https://github.com/NousResearch/atropos.git

# Generate SFT data (no training server needed)
python environments/real_world_task_env/real_world_task_env.py process \
  --config environments/real_world_task_env/default.yaml \
  --env.data_path_to_save_groups trajectories.jsonl \
  --openai.model_name NousResearch/Hermes-4-14B

# Run benchmark (evaluate before/after)
python environments/real_world_task_env/real_world_task_env.py evaluate \
  --config environments/real_world_task_env/default.yaml \
  --openai.model_name NousResearch/Hermes-4-14B

# Live RL training (connect to Atropos)
run-api &  # start Atropos API server
python environments/real_world_task_env/real_world_task_env.py serve \
  --config environments/real_world_task_env/default.yaml \
  --openai.model_name NousResearch/Hermes-4-14B
```
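After the `process` run, the exported file can be inspected with a few lines of Python. A minimal sketch, assuming one JSON object per line (the exact fields depend on the Atropos export format):

```python
import json

def load_trajectories(lines) -> list[dict]:
    """Parse JSONL content: one trajectory group per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]

# Usage:
#   with open("trajectories.jsonl") as f:
#       groups = load_trajectories(f)
#   print(len(groups), "trajectory groups")
```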
## Demo Script
```bash
# Full before/after comparison demo
bash demo/run_demo.sh
```
This runs 10 held-out tasks on both the baseline and fine-tuned model, prints a side-by-side score table, and saves trajectories for inspection.
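The side-by-side table could be produced by something as simple as the sketch below — `compare_models.py` may structure its output differently, and the task names and scores here are made up for illustration:

```python
def comparison_rows(task_ids, base, tuned):
    """Build (task, baseline_score, tuned_score) rows plus a final mean row."""
    rows = [(t, base[t], tuned[t]) for t in task_ids]
    mean = lambda d: sum(d[t] for t in task_ids) / len(task_ids)
    rows.append(("mean", mean(base), mean(tuned)))
    return rows

# Hypothetical scores for two held-out tasks:
for task, b, ft in comparison_rows(
    ["debug_script", "parse_csv"],
    {"debug_script": 0.55, "parse_csv": 0.70},
    {"debug_script": 0.80, "parse_csv": 0.90},
):
    print(f"{task:<16}{b:>10.2f}{ft:>12.2f}")
```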
## Architecture
- `real_world_task_env.py` — Environment class (`HermesAgentBaseEnv` subclass)
- `tasks.json` — 30 diverse real-world tasks with verification specs
- `judge_prompt.py` — LLM judge for open-ended task scoring
- `default.yaml` — Default config (Modal backend, tool subset)
- `demo/`
  - `run_demo.sh` — One-command demo
  - `compare_models.py` — Side-by-side model comparison
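An entry in `tasks.json` pairs a prompt with a machine-checkable verification spec. A plausible shape — every field name here is an illustrative assumption, not the file's actual schema:

```python
import json

# Hypothetical tasks.json entry: the prompt plus a verification spec
# that ToolContext can evaluate after the agent's run.
task = {
    "id": "fileops_parse_csv",
    "category": "file_ops",
    "prompt": "Parse data.csv and write the rows with value > 100 to filtered.csv.",
    "max_turns": 12,
    "verify": {"type": "file_exists", "path": "filtered.csv"},
}
print(json.dumps(task, indent=2))
```

Encoding verification declaratively keeps scoring deterministic for tasks with checkable outcomes, leaving the LLM judge only for open-ended ones.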
## Reward Function
Three-component reward (all 0.0–1.0, weighted):
| Component | Weight | How |
|---|---|---|
| Completion | 0.6 | ToolContext verification (file exists, tests pass, output correct) |
| Efficiency | 0.2 | `1.0 - (turns_used / max_turns)` — fewer steps = higher score |
| Recovery | 0.2 | Judge model assesses whether agent handled errors gracefully |
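The weighting above can be sketched as a simple function (argument names are illustrative, not the environment's actual field names):

```python
def efficiency_score(turns_used: int, max_turns: int) -> float:
    """Fewer turns relative to the budget yields a higher score."""
    return 1.0 - (turns_used / max_turns)

def combine_reward(completion: float, efficiency: float, recovery: float) -> float:
    """Weighted sum of the three reward components, each in [0.0, 1.0]."""
    assert all(0.0 <= x <= 1.0 for x in (completion, efficiency, recovery))
    return 0.6 * completion + 0.2 * efficiency + 0.2 * recovery

# A fully verified run that used half its turn budget and recovered
# cleanly from errors: 0.6 + 0.2 * 0.5 + 0.2 = 0.9
score = combine_reward(1.0, efficiency_score(5, 10), 1.0)
```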
## Task Categories
| Category | Examples | Count |
|---|---|---|
| Coding | Debug a script, add tests, refactor a function | 8 |
| Web research | Summarise a topic, extract data from a URL | 6 |
| File ops | Organise files, parse CSV, transform data | 6 |
| Sysadmin | Find large files, check processes, write a cron | 5 |
| Data analysis | Analyse a dataset, plot a chart, compute stats | 5 |
## Why This Matters
Hermes agent's loop is: real task → tool use → outcome. That's the exact data distribution Hermes 4 needs to become an agentic model. This pipeline closes the loop — the agent that serves users becomes the teacher that trains the next version.
Built for the Nous Research Hermes Agent Hackathon.