bryercowan/hermes-embodied
Self-improving robotics via Hermes Agent. One skill that provisions cloud GPUs, fine-tunes VLA models, runs sim evaluation, and autonomously improves robot policies. Built for the Nous Research Hermes Agent Hackathon.
Hermes Embodied is a robotics extension for Hermes Agent that enables autonomous fine-tuning of Vision-Language-Action (VLA) models through natural language. The system orchestrates a self-improvement loop: it collects robot trajectories in simulation or on hardware, keeps those whose reward exceeds a threshold, provisions cloud GPUs via Vast.ai, and retrains the model once enough new data accumulates. It integrates with the LeRobot framework and MuJoCo simulation to automate the path from data collection to model promotion, so users can run robotics training pipelines without machine learning expertise.
- Automates VLA fine-tuning loops using natural language commands
- Provisions and manages Vast.ai cloud GPU instances autonomously
- Supports SmolVLA and GR00T models for simulation or physical hardware
Hermes Embodied: Self-Improving Robotics via Hermes Agent
"Any robot owner can fine-tune a state-of-the-art VLA by talking to their agent. No ML expertise needed."
What Is This?
Hermes Embodied turns Hermes Agent into a self-improving robotics trainer. It adds three Hermes skills that close the loop between robot execution, training data collection, and model improvement — all orchestrated through natural language.
The same self-improvement loop that Hermes uses to get better at coding tasks (via Tinker-Atropos RL) now extends to physical robot control via Vision-Language-Action models.
Architecture
┌──────────────────────────────────────────────────────┐
│                     HERMES AGENT                     │
│   (Reasoning Layer: plans, monitors, orchestrates)   │
├──────────────────────────────────────────────────────┤
│                                                      │
│  ┌───────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │ vast-gpu  │  │ vla-trainer  │  │ robot-loop    │  │
│  │ (skill)   │  │ (skill)      │  │ (skill)       │  │
│  │           │  │              │  │               │  │
│  │ Provision │  │ SmolVLA /    │  │ Deploy model  │  │
│  │ & manage  │  │ GR00T fine-  │  │ Collect traj  │  │
│  │ cloud GPU │  │ tuning on    │  │ Auto-retrain  │  │
│  │ instances │  │ LeRobot data │  │ when improved │  │
│  └───────────┘  └──────────────┘  └───────────────┘  │
│                                                      │
├──────────────────────────────────────────────────────┤
│                SIMULATION / HARDWARE                 │
│                                                      │
│  MuJoCo + LeRobot gym_hil    OR    SO-ARM101 + USB   │
│  (Franka Panda sim tasks)          (Physical arm)    │
└──────────────────────────────────────────────────────┘
The Self-Improvement Loop
- Deploy — Hermes loads a VLA checkpoint and runs it in sim (or on hardware)
- Collect — Every rollout is recorded as a LeRobot trajectory (state, action, camera, reward)
- Curate — Hermes filters successful trajectories (reward > threshold)
- Train — Provisions a GPU on Vast.ai and fine-tunes SmolVLA on the new data
- Evaluate — Runs open-loop eval comparing new checkpoint vs. old
- Promote — If new model is better, it becomes the active policy
- Repeat — Scheduled via Hermes cron, runs autonomously
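One pass of the loop above can be sketched in plain Python. This is a minimal illustration, not the code in `scripts/improvement_loop.py`: the `Trajectory` fields, the `collect` / `train` / `evaluate` callables, and the default thresholds are all hypothetical stand-ins, and the evaluation score is assumed to be "higher is better".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """One recorded rollout. Field names are illustrative, not LeRobot's schema."""
    states: list
    actions: list
    reward: float


def curate(trajectories: List[Trajectory], reward_threshold: float) -> List[Trajectory]:
    """Keep only rollouts whose reward exceeds the threshold."""
    return [t for t in trajectories if t.reward > reward_threshold]


def improvement_step(
    active_policy: str,
    collect: Callable[[str], List[Trajectory]],
    train: Callable[[str, List[Trajectory]], str],
    evaluate: Callable[[str], float],
    reward_threshold: float = 0.5,      # assumed value
    min_new_trajectories: int = 1,      # assumed value
) -> str:
    """Run one Deploy, Collect, Curate, Train, Evaluate, Promote pass.

    Returns the checkpoint name that should be active afterwards.
    """
    rollouts = collect(active_policy)              # Deploy + Collect
    good = curate(rollouts, reward_threshold)      # Curate
    if len(good) < min_new_trajectories:
        return active_policy                       # too little data, skip training
    candidate = train(active_policy, good)         # Train
    # Evaluate both checkpoints; higher score is assumed to be better.
    if evaluate(candidate) > evaluate(active_policy):
        return candidate                           # Promote
    return active_policy
```

Passing the stages in as callables keeps the control flow testable without a simulator or GPU; a scheduler (the "Repeat" step) would simply call `improvement_step` again with the returned checkpoint.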
Skills
vast-gpu — Cloud GPU Infrastructure
Provision, monitor, and tear down GPU instances on Vast.ai through natural language.
- "Spin up an A100 for training" → finds cheapest A100, creates instance, returns SSH access
- "How's my training instance?" → checks status, GPU utilization, cost so far
- "Tear down the GPU" → destroys instance, confirms billing stopped
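The "finds cheapest A100" step amounts to filtering offers and taking the minimum price. The sketch below shows only that selection logic on plain dictionaries; the keys (`gpu_name`, `vram_gb`, `usd_per_hour`) are hypothetical and do not reflect the Vast.ai API schema, and fetching the offers is left out.

```python
from typing import List, Optional


def pick_cheapest_offer(offers: List[dict], gpu_name: str, min_vram_gb: float) -> Optional[dict]:
    """Return the lowest-priced offer matching a GPU name and a VRAM floor.

    `offers` uses hypothetical keys: "gpu_name", "vram_gb", "usd_per_hour".
    Returns None when nothing matches.
    """
    matching = [
        o for o in offers
        if gpu_name.lower() in o["gpu_name"].lower() and o["vram_gb"] >= min_vram_gb
    ]
    if not matching:
        return None
    return min(matching, key=lambda o: o["usd_per_hour"])
```

Returning `None` rather than raising lets the agent report "no matching GPU available" and ask whether to relax the constraints.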
vla-trainer — VLA Fine-Tuning Pipeline
End-to-end fine-tuning of Vision-Language-Action models.
- Supports SmolVLA (450M, fast) and GR00T N1.5 / N1.6 (3B, powerful)
- Handles data prep, LeRobot format conversion, stats validation
- Runs training on Vast.ai with WandB monitoring
- Open-loop evaluation with trajectory visualization
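Open-loop evaluation is sketched below under the assumption that it means replaying recorded observations through the policy and comparing its predicted actions with the recorded ones, without executing anything in the environment. The metric (mean squared error) and the function name are illustrative choices, not necessarily what `scripts/evaluate.py` computes.

```python
from typing import Callable, Sequence


def open_loop_mse(
    policy: Callable[[Sequence[float]], Sequence[float]],
    observations: Sequence[Sequence[float]],
    recorded_actions: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between predicted and recorded actions.

    The policy only sees recorded observations; its outputs are never executed,
    which is what makes the evaluation open-loop.
    """
    if len(observations) != len(recorded_actions) or not observations:
        raise ValueError("need equal, non-zero numbers of observations and actions")
    total, count = 0.0, 0
    for obs, target in zip(observations, recorded_actions):
        predicted = policy(obs)
        for p, t in zip(predicted, target):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("actions must have at least one dimension")
    return total / count
```

With this metric, a lower value means the checkpoint reproduces the recorded actions more closely, so comparing a new checkpoint against the old one reduces to comparing two numbers on the same held-out trajectories.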
robot-loop — Continuous Improvement
The autonomous improvement cycle.
- Runs VLA inference in MuJoCo simulation
- Collects and scores trajectories
- Triggers retraining when enough new data accumulates
- A/B tests new checkpoints against current best
- Promotes winners, logs everything
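The retraining trigger and the A/B promotion rule can be written as two small predicates. This is a sketch under assumptions: the batch size of 50, the success-rate metric, and the 0.05 promotion margin are made-up defaults, and the function names are hypothetical.

```python
from typing import List


def should_retrain(new_trajectories: int, min_batch: int = 50) -> bool:
    """Trigger retraining once enough new trajectories have accumulated."""
    return new_trajectories >= min_batch


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of evaluation episodes that succeeded."""
    if not outcomes:
        raise ValueError("no episodes to score")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def ab_winner(current: List[bool], candidate: List[bool], margin: float = 0.05) -> str:
    """Return "candidate" only if it beats the current best by at least `margin`."""
    if success_rate(candidate) >= success_rate(current) + margin:
        return "candidate"
    return "current"
```

Requiring a margin rather than any improvement is one way to avoid promoting a checkpoint on evaluation noise; with few episodes per arm, a larger margin or more episodes would be the safer choice.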
Quick Start
# Tell Hermes what you want
"Set up a simulation environment for pick-and-place tasks"
# Hermes installs MuJoCo, LeRobot, configures the Franka Panda env
"Train SmolVLA on the pick-and-place demo dataset"
# Hermes provisions a Vast.ai GPU, downloads data, runs fine-tuning
"Deploy the trained model and start the improvement loop"
# Hermes runs inference in sim, collects trajectories, schedules retraining
Hardware Support (Optional)
For physical deployment on SO-ARM101:
- Leader arm (teleoperation/demo recording)
- Follower arm (autonomous execution)
- USB cameras (wrist + global view)
- Any Linux machine with USB ports
Models Supported
| Model | Params | Train Time (A100) | VRAM | Best For |
|---|---|---|---|---|
| SmolVLA | 450M | ~4h / 20k steps | 22GB | Fast iteration, prototyping |
| GR00T N1.5 | 3B | ~4h / 10k steps | 25GB | Production, complex tasks |
| GR00T N1.6 | 3B | ~4h / 10k steps | 25GB | Latest, best performance |
Cost Estimate
- Vast.ai A100 80GB: ~$1/hr → ~$4 per ~4h training run
- Vast.ai A6000 48GB: ~$0.50/hr → ~$2 per training run (assuming the same ~4h; the table above lists A100 times only)
- Simulation: Free (local CPU/GPU)
- Physical arm (optional): ~$200-$440
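The per-run figures are the hourly rate multiplied by the training time. A small worked example, taking the ~4h duration from the model table as an assumption for both GPUs:

```python
def training_run_cost(usd_per_hour: float, hours: float) -> float:
    """Cost of one training run: hourly rate times wall-clock hours."""
    return usd_per_hour * hours


# Rates are the approximate figures quoted above; 4 hours is assumed for both GPUs.
a100_cost = training_run_cost(1.00, 4)    # 4.0
a6000_cost = training_run_cost(0.50, 4)   # 2.0

# A loop that retrains several times scales linearly, e.g. 5 runs on the A100:
five_runs = 5 * a100_cost                 # 20.0
```

Actual spend also depends on setup time, data download, and any idle time before teardown, none of which is included here.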
Project Structure
hermes-embodied/
├── README.md
├── skills/
│   ├── vast-gpu/
│   │   └── SKILL.md
│   ├── vla-trainer/
│   │   └── SKILL.md
│   └── robot-loop/
│       └── SKILL.md
├── scripts/
│   ├── setup_sim.py              # MuJoCo + LeRobot environment setup
│   ├── collect_trajectories.py   # Run VLA in sim, save rollouts
│   ├── train_smolvla.py          # Fine-tuning wrapper
│   ├── evaluate.py               # Open-loop eval + metrics
│   └── improvement_loop.py       # Full autonomous loop
├── configs/
│   ├── sim_env.json              # Simulation environment config
│   ├── training.yaml             # Training hyperparameters
│   └── vast_instance.yaml        # GPU instance specs
└── docs/
    └── ARCHITECTURE.md
Built With
- Hermes Agent — AI agent framework with skills, memory, and RL training
- LeRobot — Open-source robotics framework by Hugging Face
- SmolVLA — 450M parameter Vision-Language-Action model
- Vast.ai — Affordable cloud GPU rental
- MuJoCo — Physics simulation for robotics
- WandB — Experiment tracking
License
MIT