Hermes Embodied: Self-Improving Robotics via Hermes Agent
"Any robot owner can fine-tune a state-of-the-art VLA by talking to their agent. No ML expertise needed."
Built for the Nous Research Hermes Agent Hackathon.
What Is This?
Hermes Embodied turns Hermes Agent into a self-improving robotics trainer. It adds three Hermes skills that close the loop between robot execution, training data collection, and model improvement – all orchestrated through natural language.
The same self-improvement loop that Hermes uses to get better at coding tasks (via Tinker-Atropos RL) now extends to physical robot control via Vision-Language-Action (VLA) models.
Architecture
┌──────────────────────────────────────────────────────┐
│                     HERMES AGENT                     │
│   (Reasoning layer: plans, monitors, orchestrates)   │
├──────────────────────────────────────────────────────┤
│                                                      │
│  ┌───────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │ vast-gpu  │  │ vla-trainer  │  │ robot-loop    │  │
│  │ (skill)   │  │ (skill)      │  │ (skill)       │  │
│  │           │  │              │  │               │  │
│  │ Provision │  │ SmolVLA /    │  │ Deploy model  │  │
│  │ & manage  │  │ GR00T fine-  │  │ Collect traj  │  │
│  │ cloud GPU │  │ tuning on    │  │ Auto-retrain  │  │
│  │ instances │  │ LeRobot data │  │ when improved │  │
│  └───────────┘  └──────────────┘  └───────────────┘  │
│                                                      │
├──────────────────────────────────────────────────────┤
│                SIMULATION / HARDWARE                 │
│                                                      │
│  MuJoCo + LeRobot gym_hil    OR    SO-ARM101 + USB   │
│  (Franka Panda sim tasks)          (Physical arm)    │
└──────────────────────────────────────────────────────┘
The Self-Improvement Loop
- Deploy → Hermes loads a VLA checkpoint and runs it in sim (or on hardware)
- Collect → Every rollout is recorded as a LeRobot trajectory (state, action, camera, reward)
- Curate → Hermes filters successful trajectories (reward > threshold)
- Train → Provisions a GPU on Vast.ai and fine-tunes SmolVLA on the new data
- Evaluate → Runs open-loop eval comparing the new checkpoint vs. the old
- Promote → If the new model is better, it becomes the active policy
- Repeat → Scheduled via Hermes cron, runs autonomously
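Concretely, one cycle of steps 1–6 can be sketched as a single function that the loop calls on each cron tick. This is an illustrative skeleton, not the project's actual API: the threshold, batch size, and helper names are assumptions.

```python
REWARD_THRESHOLD = 0.8   # assumed curation cutoff
RETRAIN_BATCH = 50       # assumed: retrain once this many rollouts accumulate

def curate(trajectories, threshold=REWARD_THRESHOLD):
    """Keep only rollouts whose episode reward clears the threshold."""
    return [t for t in trajectories if t["reward"] > threshold]

def improvement_step(active_policy, rollout_fn, train_fn, eval_fn, buffer):
    """One Deploy→Collect→Curate→Train→Evaluate→Promote cycle."""
    # Deploy + Collect: run the active policy, record its rollouts
    buffer.extend(rollout_fn(active_policy))
    # Curate: filter successes; wait for enough new data before retraining
    good = curate(buffer)
    if len(good) < RETRAIN_BATCH:
        return active_policy
    # Train: fine-tune on curated data (on a provisioned GPU in the real pipeline)
    candidate = train_fn(active_policy, good)
    # Evaluate + Promote: the candidate replaces the active policy only if it wins
    if eval_fn(candidate) > eval_fn(active_policy):
        active_policy = candidate
    buffer.clear()
    return active_policy
```

The buffer persists across cycles, so slow data collection still eventually triggers a retrain.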
Skills
vast-gpu – Cloud GPU Infrastructure
Provision, monitor, and tear down GPU instances on Vast.ai through natural language.
- "Spin up an A100 for training" → finds the cheapest A100, creates an instance, returns SSH access
- "How's my training instance?" → checks status, GPU utilization, and cost so far
- "Tear down the GPU" → destroys the instance, confirms billing has stopped
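Under the hood, a skill like this can shell out to the official Vast.ai CLI (`pip install vastai`). A minimal sketch that only builds the command lines; the exact subcommands mirror the public CLI, but treat the flags as assumptions to verify against `vastai --help`:

```python
def vast_cmd(action, **kw):
    """Build a `vastai` CLI invocation for the vast-gpu skill to execute.

    Returns an argv list suitable for subprocess.run(). The price-ordering
    flag in particular is an assumption, not a confirmed CLI option.
    """
    if action == "provision":
        # Search offers for the requested GPU, cheapest first
        return ["vastai", "search", "offers",
                f'gpu_name={kw["gpu"]}', "--order", "dph_total"]
    if action == "status":
        return ["vastai", "show", "instances"]
    if action == "teardown":
        return ["vastai", "destroy", "instance", str(kw["instance_id"])]
    raise ValueError(f"unknown action: {action!r}")
```

Keeping command construction separate from execution makes the skill easy to dry-run and log before spending money.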
vla-trainer – VLA Fine-Tuning Pipeline
End-to-end fine-tuning of Vision-Language-Action models.
- Supports SmolVLA (450M, fast) and GR00T N1.5 / N1.6 (3B, powerful)
- Handles data prep, LeRobot format conversion, stats validation
- Runs training on Vast.ai with WandB monitoring
- Open-loop evaluation with trajectory visualization
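Open-loop evaluation replays recorded observations through a checkpoint and scores its predicted actions against the demonstrator's, with no simulator in the loop. A pure-Python sketch of the metric; the interfaces are placeholders, not LeRobot's API:

```python
def open_loop_mse(policy, episodes):
    """Mean squared error between predicted and recorded actions.

    `policy(obs)` returns an action vector; `episodes` is a list of
    (observation, action) pairs from a LeRobot-style dataset.
    Lower is better; the loop compares checkpoints on this score.
    """
    total, count = 0.0, 0
    for obs, action in episodes:
        pred = policy(obs)
        total += sum((p - a) ** 2 for p, a in zip(pred, action))
        count += len(action)
    return total / count
```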
robot-loop – Continuous Improvement
The autonomous improvement cycle.
- Runs VLA inference in MuJoCo simulation
- Collects and scores trajectories
- Triggers retraining when enough new data accumulates
- A/B tests new checkpoints against current best
- Promotes winners, logs everything
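The promotion decision in the A/B step can be as simple as a success-rate comparison with a noise margin. A sketch, where `min_episodes` and `margin` are assumed knobs rather than project settings:

```python
def should_promote(cand_results, base_results, min_episodes=20, margin=0.05):
    """Promote the candidate checkpoint only if it clearly beats the current best.

    Each argument is a list of per-episode booleans (task success). The
    margin guards against promoting on evaluation noise.
    """
    if len(cand_results) < min_episodes or len(base_results) < min_episodes:
        return False  # not enough evidence yet
    cand_rate = sum(cand_results) / len(cand_results)
    base_rate = sum(base_results) / len(base_results)
    return cand_rate >= base_rate + margin
```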
Quick Start
# Tell Hermes what you want
"Set up a simulation environment for pick-and-place tasks"
# Hermes installs MuJoCo, LeRobot, configures the Franka Panda env
"Train SmolVLA on the pick-and-place demo dataset"
# Hermes provisions a Vast.ai GPU, downloads data, runs fine-tuning
"Deploy the trained model and start the improvement loop"
# Hermes runs inference in sim, collects trajectories, schedules retraining
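Behind the last command, `collect_trajectories.py` plausibly runs a gymnasium-style rollout loop and records each episode. A stdlib-only sketch; the env and policy interfaces stand in for gym_hil and the VLA checkpoint:

```python
def collect_rollout(env, policy, max_steps=200):
    """Run one episode and record it as a LeRobot-style trajectory dict.

    `env` follows the gymnasium API (reset/step); `policy(obs)` returns
    an action. In the real sim setup, camera frames arrive as part of obs.
    """
    traj = {"states": [], "actions": [], "rewards": []}
    obs, _info = env.reset()
    for _ in range(max_steps):
        action = policy(obs)
        traj["states"].append(obs)
        traj["actions"].append(action)
        obs, reward, terminated, truncated, _info = env.step(action)
        traj["rewards"].append(reward)
        if terminated or truncated:
            break
    traj["reward"] = sum(traj["rewards"])  # episode return, used for curation
    return traj
```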
Hardware Support (Optional)
For physical deployment on SO-ARM101:
- Leader arm (teleoperation/demo recording)
- Follower arm (autonomous execution)
- USB cameras (wrist + global view)
- Any Linux machine with USB ports
Models Supported
| Model | Params | Train Time (A100) | VRAM | Best For |
|---|---|---|---|---|
| SmolVLA | 450M | ~4h / 20k steps | 22GB | Fast iteration, prototyping |
| GR00T N1.5 | 3B | ~4h / 10k steps | 25GB | Production, complex tasks |
| GR00T N1.6 | 3B | ~4h / 10k steps | 25GB | Latest, best performance |
Cost Estimate
- Vast.ai A100 80GB: ~$1/hr → ~$4 per training run
- Vast.ai A6000 48GB: ~$0.50/hr → ~$2 per training run
- Simulation: Free (local CPU/GPU)
- Physical arm (optional): ~$200-$440
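The per-run figures are just hourly rate × wall-clock hours, e.g. SmolVLA's ~4h at ~$1/hr gives the ~$4 above. A tiny helper; the steps-per-hour value is derived from the models table (20k steps / ~4h), not measured:

```python
def training_cost(hourly_rate, steps, steps_per_hour):
    """Estimated dollar cost of one fine-tuning run."""
    return hourly_rate * (steps / steps_per_hour)

# SmolVLA on an A100: 20k steps at ~5k steps/hour is ~4h at ~$1/hr
```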
Project Structure
hermes-embodied/
├── README.md
├── skills/
│   ├── vast-gpu/
│   │   └── SKILL.md
│   ├── vla-trainer/
│   │   └── SKILL.md
│   └── robot-loop/
│       └── SKILL.md
├── scripts/
│   ├── setup_sim.py              # MuJoCo + LeRobot environment setup
│   ├── collect_trajectories.py   # Run VLA in sim, save rollouts
│   ├── train_smolvla.py          # Fine-tuning wrapper
│   ├── evaluate.py               # Open-loop eval + metrics
│   └── improvement_loop.py       # Full autonomous loop
├── configs/
│   ├── sim_env.json              # Simulation environment config
│   ├── training.yaml             # Training hyperparameters
│   └── vast_instance.yaml        # GPU instance specs
└── docs/
    └── ARCHITECTURE.md
Built With
- Hermes Agent – AI agent framework with skills, memory, and RL training
- LeRobot – Open-source robotics framework by Hugging Face
- SmolVLA – 450M-parameter Vision-Language-Action model
- Vast.ai – Affordable cloud GPU rental
- MuJoCo – Physics simulation for robotics
- WandB – Experiment tracking
License
MIT