autocontext

a recursively self-improving harness designed to help your agents (and future iterations of those agents) succeed on any task


turn repeated agent work into validated, reusable execution

autocontext runs LLM agents through structured scenarios, evaluates their outputs, and accumulates the knowledge that improved results β€” so repeated runs get better, not just different. Point the harness at a real task in plain language, let it work the problem, and then inspect the traces, reports, artifacts, datasets, playbooks, and optional distilled model it produces.

What's New

What actually is autocontext?

Most agent systems still start every run cold. They do not reliably preserve what worked, separate signal from noise, or turn repeated success into a reusable asset.

autocontext is built to close that loop:

How People Use It

How It Works

The product model centers on a few stable ideas:

Inside a run, autocontext uses a structured multi-agent loop:

Strategies are then evaluated through scenario execution, staged validation, and gating. Weak changes are rolled back. Successful changes accumulate into reusable knowledge.
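The propose-evaluate-gate-rollback cycle can be sketched in miniature. This is an illustrative toy, not autocontext's actual API: `evaluate` and `improvement_loop` are hypothetical names, and a noisy random score stands in for real scenario execution.

```python
import copy
import random

def evaluate(strategy: dict) -> float:
    """Stand-in scorer; in autocontext this would be scenario execution."""
    return strategy["quality"] + random.uniform(-0.05, 0.05)

def improvement_loop(strategy: dict, generations: int = 3, gate: float = 0.0) -> dict:
    """Propose a change each generation; keep it only if it clears the gate."""
    best = copy.deepcopy(strategy)
    best_score = evaluate(best)
    knowledge: list[str] = []
    for gen in range(generations):
        candidate = copy.deepcopy(best)
        candidate["quality"] += random.uniform(-0.1, 0.2)  # mutated proposal
        score = evaluate(candidate)
        if score > best_score + gate:  # staged validation + gating
            best, best_score = candidate, score
            knowledge.append(f"gen {gen}: kept change, score {score:.2f}")
        # else: the weak change is rolled back (candidate is discarded)
    best["knowledge"] = knowledge
    return best
```

The key property mirrored here is that rejected candidates never touch the accepted state, so repeated runs can only accumulate changes that survived the gate.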

Which Surface Fits Which Job

| Surface | When to use it |
| --- | --- |
| `run` | Improve behavior inside a reusable scenario or task across generations |
| `simulate` | Model a system, explore parameter sweeps, or compare replayable outcomes |
| `investigate` | Evidence-driven diagnosis with hypotheses and confidence scoring |
| `analyze` | Inspect or compare runs, simulations, investigations, or missions after the fact |
| `mission` | Verifier-driven goal advanced step by step with checkpoints and completion criteria |
| `campaign` | Coordinate multiple missions with budget tracking, dependencies, and progress aggregation |
| `train` | Distill stable exported data into a cheaper local runtime |
| `replay` | Inspect what happened before deciding what knowledge should persist |

`campaign` now ships as a TypeScript CLI/API/MCP workflow for multi-mission coordination. The Python package does not yet expose a campaign control-plane surface.

Choose An Entry Point

Scenario Families

All 11 families are executable in both Python and TypeScript. TypeScript uses V8 isolate codegen for secure execution; Python uses subprocess-based executors.
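As a rough illustration of the subprocess-executor idea (not autocontext's actual executor implementation; `run_strategy_code` is a hypothetical helper), strategy code can be run in a child interpreter so a crash or hang cannot take down the harness:

```python
import subprocess
import sys

def run_strategy_code(code: str, timeout: float = 10.0) -> str:
    """Run strategy code in a separate Python process and capture its stdout.

    A non-zero exit raises instead of silently returning partial output,
    and the timeout bounds runaway strategies.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

The isolation guarantee is weaker than a V8 isolate (the child still shares the filesystem), which is one reason the two language stacks use different execution backends.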

| Family | Evaluation | What it tests |
| --- | --- | --- |
| `game` | Tournament with Elo | Turn-based strategy (`grid_ctf`, `othello`) |
| `agent_task` | LLM judge | Prompt-centric tasks with optional improvement loops |
| `simulation` | Trace evaluation | Action-trace scenarios with mock environments and fault injection |
| `artifact_editing` | Artifact validation | File, config, and schema modification with diff tracking |
| `investigation` | Evidence chains | Diagnosis accuracy with red-herring detection |
| `workflow` | Workflow evaluation | Transactional flows with compensation, retry, and side-effect tracking |
| `negotiation` | Negotiation evaluation | Hidden preferences, BATNA constraints, and opponent modeling |
| `schema_evolution` | Schema adaptation | Mid-run state changes where agents must detect stale context |
| `tool_fragility` | Drift adaptation | APIs that drift, requiring agents to adapt to changed tool behavior |
| `operator_loop` | Judgment evaluation | Escalation and clarification judgment in operator-in-the-loop workflows |
| `coordination` | Coordination evaluation | Multi-agent partial context, handoff, merge, and duplication detection |
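For the `game` family, tournament ranking with Elo follows the standard update rule. A minimal sketch (the harness's actual K-factor and pairing scheme may differ):

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Standard Elo update: score_a is 1.0 for an A win, 0.5 draw, 0.0 loss.

    Returns the new ratings for A and B; the points A gains are exactly
    the points B loses, so total rating is conserved.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta
```

Run over a round-robin of matches, this yields a stable ranking even when individual game outcomes are noisy, which is why it suits turn-based strategy scenarios.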

Core Capabilities

Providers

Runtime routing across multiple LLM backends:

Runtimes

Agent runtimes control how agents execute during runs:

Executors

Strategy and code execution backends:

Quick Start From Source

The Python application lives in `autocontext/`, and most `uv`, `pytest`, `ruff`, and `mypy` commands should be run from there.

```shell
cd autocontext
uv venv
source .venv/bin/activate
uv sync --group dev

AUTOCONTEXT_AGENT_PROVIDER=deterministic uv run autoctx solve \
  --description "improve customer-support replies for billing disputes" \
  --gens 3
```

That hands the harness a real task, materializes the working scenario, runs the loop, and writes traces and artifacts under `runs/` and `knowledge/`. With the deterministic provider, no external API keys are needed.

Run with Anthropic:

```shell
cd autocontext
AUTOCONTEXT_AGENT_PROVIDER=anthropic \
ANTHROPIC_API_KEY=your-key \
uv run autoctx solve --description "improve customer-support replies for billing disputes" --gens 3
```

ANTHROPIC_API_KEY is the preferred Anthropic credential env var. AUTOCONTEXT_ANTHROPIC_API_KEY remains supported as a compatibility alias.

Run with Claude CLI:

```shell
cd autocontext
AUTOCONTEXT_AGENT_PROVIDER=claude-cli \
AUTOCONTEXT_CLAUDE_MODEL=sonnet \
uv run autoctx solve --description "improve customer-support replies for billing disputes" --gens 3
```

Run with Codex CLI:

```shell
cd autocontext
AUTOCONTEXT_AGENT_PROVIDER=codex \
AUTOCONTEXT_CODEX_MODEL=o4-mini \
uv run autoctx solve --description "improve customer-support replies for billing disputes" --gens 3
```

Start the API server:

```shell
cd autocontext
uv run autoctx serve --host 127.0.0.1 --port 8000
```

Then open http://127.0.0.1:8000/ for the API index, or run `npx autoctx tui` for the interactive terminal UI.

Use the repo-level `.env.example` as the reference for available `AUTOCONTEXT_*` settings and supported provider-native credential aliases such as `ANTHROPIC_API_KEY`.
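Putting the settings shown in the quick-start commands above into a single file might look like the following. Values are illustrative only; treat the repo's `.env.example` as authoritative:

```
# Provider selection: deterministic, anthropic, claude-cli, or codex
AUTOCONTEXT_AGENT_PROVIDER=anthropic

# Preferred Anthropic credential (AUTOCONTEXT_ANTHROPIC_API_KEY is a compatibility alias)
ANTHROPIC_API_KEY=your-key

# Model overrides for the CLI providers
AUTOCONTEXT_CLAUDE_MODEL=sonnet
AUTOCONTEXT_CODEX_MODEL=o4-mini
```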

Installable Packages

The repo publishes two installable packages with different scopes:

Important:

The Python package exposes the full autoctx control-plane CLI for scenario execution, API serving, exports, training, and operator workflows. The TypeScript package exposes the autoctx CLI and library surface for simulations, investigations, analysis, mission control, MCP serving, and Node integrations.

Which Package Should You Use?

| If you want to... | Start here | Why |
| --- | --- | --- |
| Run the full multi-generation control plane | `autocontext/README.md` | Python has the API server, training loop, scenario scaffolding, export/import, and full CLI surface. |
| Run simulations, investigations, analysis, or missions from Node | `ts/README.md` | The TypeScript package is focused on operator-facing workflows, integrations, mission control, and MCP serving. |
| Embed autocontext in a Node app or operator workflow | `ts/README.md` | The TypeScript package also exposes library surfaces for evaluation, artifacts, publishing, and integrations. |
| Point an external agent at autocontext | `autocontext/docs/agent-integration.md` | It documents the CLI-first contract, JSON output, MCP usage, and SDK options. |
| Grab copy-paste integration snippets | `examples/README.md` | The examples cover Python CLI, Claude Code MCP, Python SDK, and TypeScript library usage. |
| Catch up on recent repo evolution | `CHANGELOG.md` | It summarizes recent public releases and notable changes. |

Common Workflows

Representative TypeScript operator workflows:

The operator-in-the-loop family (`operator_loop`) is fully runnable in both Python and TypeScript. It tests escalation and clarification judgment with real escalation/clarification hooks and behavioral-contract signals across multi-run, sweep, and replay flows.

MLX training is host-only on Apple Silicon macOS. If you want a sandboxed OpenClaw agent to trigger training, use the file-based host watcher flow documented in autocontext/docs/mlx-training.md.

Repository Layout

Where To Look Next

Acknowledgments

Thanks to George for generously donating the autocontext name on PyPI.
