hermes-embodied

Self-improving robotics via Hermes Agent. Three skills that provision cloud GPUs, fine-tune VLA models, run sim evaluation, and autonomously improve robot policies. Built for the Nous Research Hermes Agent Hackathon.


Hermes Embodied: Self-Improving Robotics via Hermes Agent

"Any robot owner can fine-tune a state-of-the-art VLA by talking to their agent. No ML expertise needed."

What Is This?

Hermes Embodied turns Hermes Agent into a self-improving robotics trainer. It adds three Hermes skills that close the loop between robot execution, training data collection, and model improvement β€” all orchestrated through natural language.

The same self-improvement loop that Hermes uses to get better at coding tasks (via Tinker-Atropos RL) now extends to physical robot control via Vision-Language-Action models.

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   HERMES AGENT                       β”‚
β”‚  (Reasoning Layer β€” plans, monitors, orchestrates)   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ vast-gpu  β”‚  β”‚  vla-trainer β”‚  β”‚  robot-loop   β”‚  β”‚
β”‚  β”‚  (skill)  β”‚  β”‚   (skill)    β”‚  β”‚   (skill)     β”‚  β”‚
β”‚  β”‚           β”‚  β”‚              β”‚  β”‚               β”‚  β”‚
β”‚  β”‚ Provision β”‚  β”‚ SmolVLA /    β”‚  β”‚ Deploy model  β”‚  β”‚
β”‚  β”‚ & manage  β”‚  β”‚ GR00T fine-  β”‚  β”‚ Collect traj  β”‚  β”‚
β”‚  β”‚ cloud GPU β”‚  β”‚ tuning on    β”‚  β”‚ Auto-retrain  β”‚  β”‚
β”‚  β”‚ instances β”‚  β”‚ LeRobot data β”‚  β”‚ when improved β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚              SIMULATION / HARDWARE                   β”‚
β”‚                                                      β”‚
β”‚  MuJoCo + LeRobot gym_hil    OR    SO-ARM101 + USB  β”‚
β”‚  (Franka Panda sim tasks)          (Physical arm)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The Self-Improvement Loop

  1. Deploy β€” Hermes loads a VLA checkpoint and runs it in sim (or on hardware)
  2. Collect β€” Every rollout is recorded as a LeRobot trajectory (state, action, camera, reward)
  3. Curate β€” Hermes filters successful trajectories (reward > threshold)
  4. Train β€” Provisions a GPU on Vast.ai and fine-tunes SmolVLA on the new data
  5. Evaluate β€” Runs open-loop eval comparing new checkpoint vs. old
  6. Promote β€” If new model is better, it becomes the active policy
  7. Repeat β€” Scheduled via Hermes cron, runs autonomously
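The loop above can be sketched as a single iteration. `run_rollouts`, `finetune`, and `evaluate` are passed in as hypothetical stand-ins for the actual skill implementations; the threshold and episode count are illustrative defaults, not values from this repo:

```python
def improvement_iteration(active_ckpt, run_rollouts, finetune, evaluate,
                          reward_threshold=0.8, episodes=50):
    """One pass of deploy -> collect -> curate -> train -> evaluate -> promote.

    The three callables are hypothetical stand-ins for the Hermes skills.
    """
    # Deploy the active checkpoint and record every rollout as a trajectory
    trajectories = run_rollouts(active_ckpt, episodes)

    # Curate: keep only rollouts that cleared the reward threshold
    curated = [t for t in trajectories if t["reward"] > reward_threshold]
    if not curated:
        return active_ckpt  # nothing worth training on this round

    # Train: fine-tune on the curated data (GPU provisioning elided here)
    candidate = finetune(active_ckpt, curated)

    # Evaluate both checkpoints; promote only if the candidate improves
    if evaluate(candidate) > evaluate(active_ckpt):
        return candidate
    return active_ckpt
```

In the real system the "Repeat" step would call this under a Hermes cron schedule; the no-improvement branch simply keeps the current policy active.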

Skills

vast-gpu β€” Cloud GPU Infrastructure

Provision, monitor, and teardown GPU instances on Vast.ai through natural language.
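Under the hood this boils down to invocations of the `vastai` CLI. A minimal sketch of the commands involved, with the query string and flags approximating (not quoting) the actual CLI syntax:

```python
def search_cmd(gpu="A100", min_vram_gb=24):
    # Approximate filter syntax; check `vastai search offers --help`
    query = f"gpu_name={gpu} gpu_ram>={min_vram_gb}"
    return ["vastai", "search", "offers", query]

def create_cmd(offer_id, image="pytorch/pytorch:latest", disk_gb=60):
    # Rent a specific offer; image and disk size are illustrative
    return ["vastai", "create", "instance", str(offer_id),
            "--image", image, "--disk", str(disk_gb)]

def destroy_cmd(instance_id):
    # Teardown, so idle instances stop billing
    return ["vastai", "destroy", "instance", str(instance_id)]
```

These would be run via `subprocess` by the skill; Hermes's role is deciding when to provision and, importantly, when to tear down.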

vla-trainer β€” VLA Fine-Tuning Pipeline

End-to-end fine-tuning of Vision-Language-Action models.
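The knobs for a run would live in `configs/training.yaml`; a sketch of what such a config might contain, with every value illustrative (the steps figure mirrors the SmolVLA row of the models table, the dataset id is hypothetical):

```yaml
# configs/training.yaml β€” illustrative values only
model: smolvla                     # 450M-param base
dataset: lerobot/pick_place_demo   # hypothetical LeRobot dataset id
steps: 20000                       # ~4h on a single A100
batch_size: 64
learning_rate: 1.0e-4
output_dir: checkpoints/smolvla_ft
```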

robot-loop β€” Continuous Improvement

The autonomous improvement cycle.
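The open-loop evaluation at the heart of this cycle can be sketched as replaying recorded episodes and scoring how closely a policy reproduces the recorded actions. `policy` is any callable standing in for the loaded VLA; the `(observation, action)` pair format is an assumption about the episode layout:

```python
def open_loop_mse(policy, trajectory):
    """Mean squared error between policy actions and recorded actions.

    `trajectory` is a list of (observation, action) pairs, a stand-in
    for a LeRobot episode; lower is better.
    """
    total, n = 0.0, 0
    for obs, recorded_action in trajectory:
        predicted = policy(obs)
        total += sum((p - a) ** 2 for p, a in zip(predicted, recorded_action))
        n += len(recorded_action)
    return total / n
```

Comparing this score for the new vs. old checkpoint on a held-out set is what gates the "Promote" step.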

Quick Start

# Tell Hermes what you want
"Set up a simulation environment for pick-and-place tasks"

# Hermes installs MuJoCo, LeRobot, configures the Franka Panda env

"Train SmolVLA on the pick-and-place demo dataset"

# Hermes provisions a Vast.ai GPU, downloads data, runs fine-tuning

"Deploy the trained model and start the improvement loop"

# Hermes runs inference in sim, collects trajectories, schedules retraining

Hardware Support (Optional)

For physical deployment, the same skills drive an SO-ARM101 arm connected over USB instead of the MuJoCo simulation.
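The arm enumerates as a USB serial device. A minimal port-detection sketch; the `(device, description)` pairs stand in for pyserial's `list_ports.comports()` output, and the keyword match is an assumption (a real setup would match the device's actual VID/PID):

```python
def find_arm_port(ports, keyword="SO-ARM"):
    """Return the first serial device whose description mentions the arm.

    `ports` is a list of (device, description) pairs, e.g. built from
    pyserial's serial.tools.list_ports.comports().
    """
    for device, description in ports:
        if keyword.lower() in (description or "").lower():
            return device
    return None
```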

Models Supported

Model       Params   Train Time (A100)   VRAM    Best For
SmolVLA     450M     ~4h / 20k steps     22GB    Fast iteration, prototyping
GR00T N1.5  3B       ~4h / 10k steps     25GB    Production, complex tasks
GR00T N1.6  3B       ~4h / 10k steps     25GB    Latest, best performance

Cost Estimate

Project Structure

hermes-embodied/
β”œβ”€β”€ README.md
β”œβ”€β”€ skills/
β”‚   β”œβ”€β”€ vast-gpu/
β”‚   β”‚   └── SKILL.md
β”‚   β”œβ”€β”€ vla-trainer/
β”‚   β”‚   └── SKILL.md
β”‚   └── robot-loop/
β”‚       └── SKILL.md
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ setup_sim.py          # MuJoCo + LeRobot environment setup
β”‚   β”œβ”€β”€ collect_trajectories.py # Run VLA in sim, save rollouts
β”‚   β”œβ”€β”€ train_smolvla.py      # Fine-tuning wrapper
β”‚   β”œβ”€β”€ evaluate.py           # Open-loop eval + metrics
β”‚   └── improvement_loop.py   # Full autonomous loop
β”œβ”€β”€ configs/
β”‚   β”œβ”€β”€ sim_env.json          # Simulation environment config
β”‚   β”œβ”€β”€ training.yaml         # Training hyperparameters
β”‚   └── vast_instance.yaml    # GPU instance specs
└── docs/
    └── ARCHITECTURE.md

Built With

License

MIT