
Atropos - Nous Research's LLM RL Gym


In Greek mythology, Atropos was the eldest of the three Fates. While her sisters spun and measured the threads of mortal lives, Atropos alone held the shears that would cut these threads, determining the final destiny of each soul. Just as Atropos guided souls to their ultimate fate, this system guides language models toward their optimal potential through reinforcement learning.


What is Atropos?

Atropos is an environment microservice framework for async RL with LLMs.

Atropos encompasses both environments, which are set up as services, and a trajectory API for the environments to send data to and for the trainer to pull batches from.

Diagram: how Atropos' components interact with a trainer and an inference server to complete the RL loop (the trainer and inference engine are not included in the atropos package).
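The loop in that diagram can be sketched in a few lines of toy Python. Every name here is a hypothetical stand-in for a real component (environment, trajectory API queue, inference server, trainer), not the atroposlib API:

```python
from collections import deque

# Toy stand-in for the trajectory API's rollout queue.
trajectory_queue = deque()

def inference(prompt: str) -> str:
    """Stand-in for the inference server (vLLM/SGLang in practice)."""
    return "4"

def environment_step(prompt: str) -> dict:
    """An environment asks the model for a completion, scores it,
    and emits a scored rollout."""
    answer = inference(prompt)
    reward = 1.0 if answer.strip() == "4" else 0.0
    return {"tokens": answer, "score": reward}

def trainer_pull_batch(n: int) -> list:
    """The trainer pulls a batch of scored rollouts from the queue."""
    return [trajectory_queue.popleft() for _ in range(min(n, len(trajectory_queue)))]

# One turn of the loop: environments push rollouts, the trainer pulls a batch
# and would then update the policy and refresh the inference server's weights.
for _ in range(4):
    trajectory_queue.append(environment_step("What is 2 + 2?"))
batch = trainer_pull_batch(4)
```

In the real system the queue lives behind the trajectory API service, and environments and trainer talk to it over HTTP rather than sharing memory.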

Atropos is a robust, scalable framework for Reinforcement Learning Environments with LLMs.

The goal: provide a flexible, scalable, and standardized platform to accelerate LLM-based RL research across diverse, interactive settings.

The framework supports collecting, distributing and evaluating LLM trajectories through diverse environments including:

| Environment Type | Examples | Purpose |
| --- | --- | --- |
| 📚 Dataset environments | GSM8K, MMLU, Custom HF Datasets | Evaluate and improve LLM performance on static data |
| 🎮 Online environments | Blackjack, Taxi, Text-based games | Train LLMs through interactive game-based learning |
| 🤖 RLAIF and RLHF | LLM Judge/Reward Models | Fine-tune LLMs using human feedback and alignment |
| 🔄 Multi-Turn RL | deepresearch, internal tool calling | Train LLMs on complex multi-step interactions |
| 💻 Code Execution | MBPP, HumanEval (via coding_server.py) | Train LLMs to generate and execute code |
| 🖼️ Multimodal | OCR VQA, Clevr (via multimodal_dpo/) | Train LLMs on tasks involving vision and language |

Experimental results from models trained using Atropos' environments

We have achieved significant improvements on specific domains and tasks with Atropos. Below are some of the results.

Tool Calling Environment Results:

| Berkeley Function Calling Benchmark Type | Base Model | With Atropos RL | Improvement |
| --- | --- | --- | --- |
| Parallel Tasks | 10% | 46% | 4.6x ⬆️ |
| Simple Tasks | 21% | 51.75% | 2.5x ⬆️ |

Model Artifact: https://huggingface.co/NousResearch/DeepHermes-ToolCalling-Specialist-Atropos

Environment Used: https://github.com/NousResearch/atropos/blob/main/environments/tool_calling_server.py


Financial Fundamentals Prediction Environment Results:

| Metric | Initial Accuracy | With Atropos RL | Improvement |
| --- | --- | --- | --- |
| Directional Prediction Eval Accuracy | 20% | 50% | 2.5x 📈 |

Model Artifact: https://huggingface.co/NousResearch/DeepHermes-Financial-Fundamentals-Prediction-Specialist-Atropos

Environment Used: https://github.com/NousResearch/atropos/blob/main/environments/fundamental_prediction_environment.py


RLAIF Experiment Artifacts

Using the RLAIF environment to change the personality of the model, we have produced several model artifacts with interesting and unusual personalities.

DeepHermes Egregore v1 and v2 8B:

https://huggingface.co/NousResearch/DeepHermes-Egregore-v1-RLAIF-8b-Atropos
https://huggingface.co/NousResearch/DeepHermes-Egregore-v2-RLAIF-8b-Atropos

DeepHermes Ascension Maze 8B:

https://huggingface.co/NousResearch/DeepHermes-AscensionMaze-RLAIF-8b-Atropos

Environment Used: https://github.com/NousResearch/atropos/blob/main/environments/rlaif_server.py


Navigating the Repo

| Category | Description |
| --- | --- |
| 📁 atroposlib/ | Core library containing base classes and utilities |
| 🎮 environments/ | Collection of ready-to-use RL environments; community contributions are typically placed in the environments/community/ subdirectory |
| 📚 example_trainer/ | Example training scripts and configurations |

Key Documents:


Prerequisites

Before installing Atropos, make sure you have a Python 3.10 (or later) environment available.

Note: You do not need a GPU to develop or test environments locally. A GPU is only required for running inference servers locally or for training.


Installation

Get your Python 3.10 (or later) environment ready, then simply pip install:

pip install atroposlib

If you want to develop on the repo or run the bundled environments, install from a clone of the repository instead:

pip install -e .                 # base install
pip install -e ".[dev]"         # development tools
pip install -e ".[examples]"    # dependencies for the examples
pip install -e ".[verifiers]"   # verifiers integration
pip install -e ".[all]"         # everything

Important: If you're committing to the repository, please install the pre-commit hooks:

pre-commit install

Quick Start Guide

  1. Create Your First Environment

  2. Run an Example Environment

Edit the config_init section of the environment file you want to run (for example, the GSM8K environment) so that it points to a running vLLM or SGLang inference server, and make any other configuration changes you'd like, such as the group size. Then:

Note: By default, Atropos uses the OpenAI-compatible API endpoint which works with any provider. For enhanced features, use VLLMServer (atroposlib/envs/server_handling/vllm_server.py) or SGLangServer (atroposlib/envs/server_handling/sglang_server.py) for direct access to native APIs with full token and logprob tracking.

# Start the API server
run-api

In a separate terminal, start the GSM8K environment microservice:

python environments/gsm8k_server.py serve --openai.model_name Qwen/Qwen2.5-1.5B-Instruct --slurm false
# alternatively
# python environments/gsm8k_server.py serve --config environments/configs/example.yaml
# python environments/gsm8k_server.py serve --config environments/configs/example.yaml --env.group_size 8 # cli args override corresponding config settings
  3. Grabbing Rollouts

If you just want to collect rollouts without running a trainer, see the debug section for help getting started with the available tools; we recommend starting with process or view-run.

  4. Training Your Model
    • Follow our training example guide for detailed instructions
    • Monitor progress through our built-in logging and reporting system:
      • Completion lengths
      • Evaluation accuracies
      • Full rollouts and scores

You can run multiple environments at once; just point them all at the same API server.

Environments come with detailed logging and reporting support; runs track completion lengths, eval accuracies, full rollouts and scores, and more.


Trainer Integrations

Axolotl


Axolotl is a powerful tool for fine-tuning a wide range of AI models, supporting techniques like LoRA and QLoRA through simple YAML configurations.

The Atropos plugin for Axolotl seamlessly integrates Atropos' RL environments into Axolotl's training pipelines. This allows you to leverage Atropos for reinforcement learning while utilizing Axolotl's extensive features for model fine-tuning.

To use, follow the README on the plugin repository.

Tinker


The Tinker API is a simple, flexible LoRA training framework that lets researchers and developers quickly build out their ideas without worrying about the complexities of distributed training. Users write a simple loop that runs on their CPU, and Tinker manages the backend computation on its GPUs, while still providing full control over the training and algorithmic details.

The Tinker-Atropos integration layer enables all Atropos environments to leverage the power of Tinker for their RL experiments. This allows users with little or no compute to develop and build Atropos environments with minimal worry about the underlying compute behavior, as well as providing an easy environment integration point for Tinker users.

To get started, check out the README at the project repository.

Atropos' Example Trainer

The Atropos repo contains an example trainer intended primarily as a reference showing how a trainer and an inference provider can be integrated with Atropos to complete the RL training loop.

To use the example trainer, see this page: training example guide

On-Policy Distillation (API + ScoredDataGroup Contract)

Atropos now supports on-policy distillation (OPD) at the transport layer by carrying distillation arrays through ScoredDataGroup and the API queue/batch endpoints.

Scope of this change

Distillation payload fields

Each scored group may include:
- distill_token_ids
- distill_logprobs

These fields are optional, and when present are forwarded from:

Minimal producer example (environment side)

scores["distill_token_ids"] = distill_token_ids
scores["distill_logprobs"] = distill_logprobs
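To make the shape of those fields concrete, here is a small hedged sketch (the helper and the field names other than the two distill keys are illustrative, not part of the library). A natural invariant to enforce in such a producer is one teacher logprob per token id, for every sequence in the group:

```python
def attach_distill_fields(scores: dict, token_ids: list, logprobs: list) -> dict:
    """Illustrative helper: attach optional distillation arrays to a scored group."""
    # Per-sequence alignment: each logprob list matches its token-id list.
    assert all(len(ids) == len(lps) for ids, lps in zip(token_ids, logprobs))
    scores["distill_token_ids"] = token_ids
    scores["distill_logprobs"] = logprobs
    return scores

group = attach_distill_fields(
    {"tokens": [[1, 2, 3]], "scores": [1.0]},
    token_ids=[[1, 2, 3]],
    logprobs=[[-0.1, -0.5, -0.02]],
)
```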

Minimal consumer check (trainer/debug side)

curl -s http://localhost:8002/latest_example | jq '{has_ids:(.distill_token_ids!=null), has_lps:(.distill_logprobs!=null)}'

Notes

TeacherDistillationEnv follow-up

The follow-up teacher environment uses a dedicated teacher server config and attaches teacher prompt logprobs before the group is sent to the API.

Teacher config shape:

TeacherDistillationConfig(
    teacher_enabled=True,
    teacher_top_k=8,
)

Teacher server configs are passed separately at init, just like the primary server_configs:

env = MyTeacherEnv(
    config=env_config,
    server_configs=student_server_configs,
    teacher_server_configs=[
        APIServerConfig(
            base_url="http://localhost:9003/v1",
            model_name="Qwen/Qwen3-30B-A3B-Instruct-2507",
            api_key="",
            server_type="vllm",
            tokenizer_name="Qwen/Qwen3-30B-A3B-Instruct-2507",
        )
    ],
)

You can either:

In both cases, TeacherDistillationEnv still assumes the normal BaseEnv runtime contract: tokenized rollouts, ScoredDataGroup payloads, and the standard handle_send_to_api(...) transport path.

CLI shape:

--env.teacher_enabled true \
--teacher.base_url "http://localhost:9003/v1" \
--teacher.model_name "Qwen/Qwen3-30B-A3B-Instruct-2507" \
--teacher.server_type vllm \
--env.teacher_top_k 8

If --teacher.model_name is a deployment alias rather than a tokenizer identifier, also set --teacher.tokenizer_name ... so the env can validate tokenizer compatibility.

Scope note:

Tokenizer requirement:

Why same-tokenizer is required:


Testing and Debugging Tools

The trajectory-handler provides several debugging tools to help environment developers test and understand their environments locally without requiring the full distributed infrastructure.

After launching the API and your selected environments (e.g., run-api and python environments/gsm8k_server.py serve), you can view them for a quick look, or prepare datasets for offline training:

In-depth Local Environment Analysis with process

For developers looking to inspect and debug a single environment without the overhead of the run-api server or a full training loop, Atropos environments offer a process subcommand. This mode performs inference-only rollouts, meaning it runs your model within the environment to generate interactions, but does not perform any model training or updates.

The process subcommand executes the environment's full data pipeline:

  1. Generation: Produces model responses based on inputs from the environment.
  2. Parsing: Processes these raw model outputs into a structured format.
  3. Scoring: Applies the environment's reward logic to evaluate the quality of the generated responses.
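As a toy illustration of the three stages, here is what the pipeline looks like for a GSM8K-style task, where the final answer follows a `####` marker (the helper names are hypothetical, not the environment's actual methods):

```python
def generate(prompt: str) -> str:
    # Stage 1: stand-in for a model completion.
    return "The total is 3 + 4 = 7.\n#### 7"

def parse(completion: str) -> str:
    # Stage 2: pull the final answer after the GSM8K-style '####' marker.
    return completion.split("####")[-1].strip()

def score(parsed: str, gold: str) -> float:
    # Stage 3: exact-match reward.
    return 1.0 if parsed == gold else 0.0

completion = generate("What is 3 + 4?")
reward = score(parse(completion), "7")  # exact match -> 1.0
```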

Outputs and Visualization:

When you specify a path to save the generated data using the --env.data_path_to_save_groups your_output_file.jsonl argument (or a similar argument defined by the specific environment, check with --help), the process command provides several benefits:

Example Usage:

To run the process subcommand for an environment like gsm8k_server.py and save the outputs:

python environments/gsm8k_server.py process --env.data_path_to_save_groups gsm8k_rollouts.jsonl

This will create gsm8k_rollouts.jsonl and gsm8k_rollouts.html.
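Once you have a rollouts file, you can inspect it with a few lines of Python. This assumes the standard one-JSON-object-per-line layout; the exact keys vary by environment, so print them before relying on any particular field:

```python
import json

def summarize_jsonl(path: str) -> int:
    """Count the saved groups and print the keys of the first one."""
    n = 0
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if n == 0:
                print(sorted(record.keys()))
            n += 1
    return n

# Example: summarize_jsonl("gsm8k_rollouts.jsonl")
```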

Customization:

You can customize the inference endpoint and other parameters for the process subcommand. For example, to use a different model or API endpoint:

python environments/gsm8k_server.py process \
  --env.data_path_to_save_groups gsm8k_rollouts.jsonl \
  --env.my_custom_field "value" \
  --openai.base_url https://your-custom-api-url/v1 \
  --openai.api_key YOUR_API_KEY \
  --openai.model_name your_model_identifier

You can add custom fields to the env namespace by returning a custom subclass of BaseEnvConfig in config_init [example].

Always refer to the specific environment script's help for all available options:

python environments/your_environment_script.py process --help

Environment Evaluation with evaluate

For running evaluation on environments, Atropos provides an evaluate subcommand that calls the environment's evaluate method:

python gsm8k_server.py evaluate \
  --openai.base_url https://openrouter.ai/api/v1 \
  --openai.api_key $OPENROUTER_API_KEY \
  --openai.model_name qwen/qwen3-14b

Offline Data Generation Quick Start

Run the following commands in separate terminals, in this order:

Terminal 1 โ€” Start the API server first (must be running before environments connect):

run-api

Terminal 2 โ€” Start an environment:

python gsm8k_server.py serve --slurm False # or an env of your choice

Terminal 3 โ€” Generate data:

atropos-sft-gen path/to/output.jsonl --tokenizer Qwen/Qwen2.5-1.5B-Instruct # or whichever tokenizer you have in your env config

Rejection sampling can be controlled via --save-top-n-per-group, --allow-negative-scores, and --minimum-score-diff-max-min. See atropos-sft-gen -h for more detailed usage info.
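Those flags map onto a simple per-group filtering rule. The following is a hedged sketch of the kind of selection they control, as an illustration of the idea rather than atropos-sft-gen's actual implementation:

```python
def select_rollouts(scores, save_top_n=1, allow_negative=False, min_score_diff=0.0):
    """Pick indices of the best rollouts in one group, mimicking the spirit of
    --save-top-n-per-group / --allow-negative-scores / --minimum-score-diff-max-min."""
    # Skip near-uniform groups: little preference signal to learn from.
    if max(scores) - min(scores) < min_score_diff:
        return []
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = ranked[:save_top_n]
    if not allow_negative:
        kept = [i for i in kept if scores[i] >= 0]
    return kept

kept = select_rollouts([0.9, -0.2, 0.5, 0.9], save_top_n=2, min_score_diff=0.3)
```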

If you would like to use OpenAI models, please edit your config_init to something like the following:

    # Assumes: import os; from typing import List, Tuple; and that
    # BaseEnvConfig and APIServerConfig are imported from atroposlib.
    @classmethod
    def config_init(cls) -> Tuple[BaseEnvConfig, List[APIServerConfig]]:
        env_config = BaseEnvConfig(
            tokenizer_name="Qwen/Qwen2.5-1.5B-Instruct",
            group_size=8,
            use_wandb=True,
            rollout_server_url="http://localhost:8000",
            total_steps=1000,
            batch_size=12,
            steps_per_eval=100,
            max_token_length=2048,
            wandb_name="gsm8k",
        )
        server_configs = [
            APIServerConfig(
                model_name="gpt-4.1-nano",
                base_url=None,
                api_key=os.environ.get("OPENAI_API_KEY"),
                num_requests_for_eval=256,
            ),
        ]

        return env_config, server_configs

For DPO, replace atropos-sft-gen with atropos-dpo-gen and check atropos-dpo-gen -h for data filtering and saving options.
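The core idea behind DPO data generation can be sketched as building preference pairs from scored groups, for example best-versus-worst completion (an illustration of the concept, not atropos-dpo-gen's actual logic):

```python
def make_dpo_pair(completions, scores):
    """Toy preference-pair construction: best vs. worst completion in a scored group."""
    best = max(range(len(scores)), key=scores.__getitem__)
    worst = min(range(len(scores)), key=scores.__getitem__)
    if scores[best] == scores[worst]:
        return None  # uniform group: no preference signal
    return {"chosen": completions[best], "rejected": completions[worst]}

pair = make_dpo_pair(["a", "b", "c"], [0.2, 0.9, 0.1])
```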


Troubleshooting

Address already in use when running run-api

Port 8000 is already occupied. Either stop the existing process or specify a different port:

# Find and stop the process using port 8000
lsof -ti:8000 | xargs kill -9

# Or use a different port
run-api --port 8001

ModuleNotFoundError or dependency conflicts

Ensure you're using a clean virtual environment with the correct Python version:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e ".[dev]"

OPENAI_API_KEY not set errors

Set your API key as an environment variable, or configure it in the environment's config_init:

export OPENAI_API_KEY="your-key-here"

Out of memory (OOM) when running environments locally

Use a smaller model for local development and testing. For example, configure model_name to a lightweight model like gpt-4.1-nano with an OpenAI API key, or use a quantized local model with vLLM.

Environment not connecting to the API server

Ensure run-api is running before starting any environments. By default, environments connect to http://localhost:8000. If your API is on a different host or port, update rollout_server_url in your environment's config.


Citation

If you have found the library helpful in your work, you can cite this repository as:

@misc{atropos,
  title        = {Atropos: An Async First Environment Rollout Controller},
  author       = {Mahan, Dakota and Jin, Roger and Teknium and Sands, Shannon and Yatsenko, Artem and Suphavadeeprasit, Jai and Malhotra, Karan and Guang, Chen and Li, Joe},
  howpublished = {\url{https://www.github.com/NousResearch/atropos}},
  year         = {2025},
  month        = {apr},
  note         = {Version 0.3.0},
}

Contributing

Atropos is built by the open-source AI community and relies on our amazing contributors! Please see our contributing guide for details on code formatting, testing, and more, and follow the Code of Conduct.


License

Atropos uses the MIT license; see the LICENSE file for more information.