Welcome to AlphaApollo
AlphaApollo is a flexible, efficient, and production-ready RL training framework for LLM post-training. It follows the HybridFlow architecture and adds project-specific extensions.
Why AlphaApollo?
AlphaApollo is designed to make RL training for LLMs accessible, flexible, and scalable:
🚀 Easy to Use
- Simple API: Build complex RL dataflows with just a few lines of code
- Diverse RL Algorithms: Support for PPO, GRPO, and more (via verl integration)
- Ready-to-Use Examples: Out-of-the-box scripts for various environments
🔧 Flexible & Modular
- Seamless Integration: Works with PyTorch FSDP, Megatron-LM, vLLM
- Customizable Components: Easy to extend with custom environments, rewards, and memory systems
- Flexible Device Mapping: Efficient resource utilization across different cluster sizes
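Of the algorithms listed above, GRPO gives a feel for the kind of dataflow involved: it scores a group of rollouts sampled for the same prompt and normalizes each reward against the group's mean and standard deviation to obtain advantages, with no learned value function. The sketch below illustrates just that advantage computation; the function name and the epsilon guard are our own illustration, not AlphaApollo's API:

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (std + eps).

    `rewards` holds scalar rewards for a group of rollouts sampled
    from the same prompt; `eps` guards against zero-variance groups.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Two correct and two incorrect rollouts for one prompt:
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Because the normalization is per group, the advantages always sum to zero within a group, so above-average rollouts are reinforced and below-average ones suppressed.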
Quick Start
Installation
```bash
conda create -n alphaapollo python=3.12 -y
conda activate alphaapollo
git clone https://github.com/tmlr-group/AlphaApollo.git
cd AlphaApollo
bash installation.sh
```
Run Your First Training
Check out our examples for different workflows. Each snippet below assumes you start from the repository root:

```bash
# RL Training (with tool use)
cd examples/rl
bash run_rl_informal_math_tool.sh
```

```bash
# Self-Evolution
cd examples/evo
bash run_evo_informal_math.sh
```

```bash
# SFT (with tool use)
cd examples/sft
bash run_sft_informal_math_tool.sh
```
Architecture
AlphaApollo uses a hybrid architecture that enables:
- Flexible Dataflow: Define complex RL training pipelines
- Efficient Execution: Optimize computation across multiple GPUs
- Modular Design: Easy to customize and extend components
What's Next?
- Installation Guide - Detailed setup instructions
- Quick Start Tutorial - Your first AlphaApollo training job
- Core Modules - Agent system, evolution, data pipeline, and tools
- Algorithms - Explore supported RL algorithms
- Configuration Reference - Detailed API and runtime configuration documentation
- Contribution Guide - Add your own tools, environments, and algorithms
Documentation Structure
This documentation is organized into the following sections:
Getting Started
- Installation - Environment setup, dependencies, and troubleshooting
- Quick Start - Run core workflows and example scripts
- Troubleshooting & FAQ - Common issues and solutions
Core Modules
- Core Modules Overview - High-level architecture map
- Agent System - Multi-turn environment and manager flow
- Self-Evolution - Policy-verifier iterative refinement
- Dataset Pipeline - Data preprocessing and schema normalization
- Tools - Tool registration, execution, and built-ins
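The Self-Evolution module's policy-verifier refinement can be pictured as a generate-check-retry loop: a policy drafts an answer, a verifier judges it, and on failure the verifier's feedback is folded back into the next attempt. The sketch below is illustrative only; the function names, the feedback format, and the round limit are our assumptions, not AlphaApollo's actual interfaces:

```python
def self_evolve(prompt, policy, verifier, max_rounds=3):
    """Iterative policy-verifier refinement (illustrative sketch).

    `policy` maps a prompt to a candidate answer; `verifier` returns
    (ok, feedback). On failure, feedback is appended to the prompt.
    """
    attempt = policy(prompt)
    for _ in range(max_rounds):
        ok, feedback = verifier(prompt, attempt)
        if ok:
            return attempt
        attempt = policy(f"{prompt}\n[verifier feedback] {feedback}")
    return attempt

# Toy stand-ins: this "policy" only answers correctly after feedback.
policy = lambda p: "4" if "feedback" in p else "5"
verifier = lambda p, a: (a == "4", "2 + 2 is 4, not 5")
result = self_evolve("What is 2 + 2?", policy, verifier)
```

In a real run the policy and verifier would both be model calls; the loop structure is what the "iterative refinement" label refers to.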
Configuration
- Configuration Overview - Hydra basics and CLI overrides
- RL Training Config - PPO/GRPO parameter details
- Generation Config - Offline generation settings
- Evolving Config - Self-evolution runtime settings
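Since the configuration stack is Hydra-based, the run scripts typically accept dotted `key=value` overrides appended to the command line, so you can tune a run without editing YAML. The specific keys below are illustrative placeholders, not guaranteed AlphaApollo config names:

```bash
# Hypothetical Hydra-style overrides appended to an example script:
bash run_rl_informal_math_tool.sh \
    trainer.total_epochs=5 \
    data.train_batch_size=256 \
    +trainer.val_before_train=true  # '+' adds a key absent from the base config
```

See the Configuration Overview page for the override keys that actually exist in each workflow's config.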
Algorithms
- Algorithms Overview - End-to-end workflow map
- RL Training - verl integration and RL training flow
- SFT - Supervised fine-tuning pipeline
- Evolving Pipeline - Inference-time self-improvement
Contribution
- Contribution Guide - Extension points and conventions
- Adding a New Tool - Implement and register custom tools
- Adding a New Environment - Add a new task domain
- Adding a New Algorithm - Add a new workflow
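A common shape for a tool extension point like "Adding a New Tool" is a registry keyed by tool name, populated via a decorator. The sketch below shows that generic pattern under our own names (`TOOL_REGISTRY`, `register_tool`); consult the actual guide for AlphaApollo's registration API:

```python
from typing import Callable, Dict

# Hypothetical global registry mapping tool names to callables.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {}

def register_tool(name: str):
    """Decorator that registers a callable under a tool name."""
    def wrap(fn: Callable[..., str]) -> Callable[..., str]:
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

@register_tool("calculator")
def calculator(expression: str) -> str:
    # Evaluate arithmetic with builtins stripped (illustration only;
    # a production tool would use a proper expression parser).
    return str(eval(expression, {"__builtins__": {}}, {}))

result = TOOL_REGISTRY["calculator"]("2 + 3 * 4")
```

Registering by name lets the agent loop dispatch a model-emitted tool call to the right implementation with a single dictionary lookup.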
Recommended Reading Path
If you're new to AlphaApollo, we recommend reading in this order:
1. Start with Installation and Quick Start to run a working baseline
2. Read Core Modules Overview and Agent System to understand runtime flow
3. Choose your path: RL Training / SFT / Evolving Pipeline
4. Use the Configuration pages to tune behavior and scale
5. Follow the Contribution Guide to extend the framework safely
Community & Support
- GitHub: AlphaApollo
- Paper: AlphaApollo on arXiv
Contributing
We welcome contributions! AlphaApollo is open-source under the Apache 2.0 license. Check out our contributing guide to get started.
Ready to get started? Head over to the Installation Guide to begin your journey with AlphaApollo!