Welcome to AlphaApollo
AlphaApollo is a flexible, efficient, and production-ready RL training framework for LLM post-training. It follows the HybridFlow architecture and adds project-specific extensions.
Why AlphaApollo?
AlphaApollo is designed to make RL training for LLMs accessible, flexible, and scalable:
🚀 Easy to Use
- Simple API: Build complex RL dataflows with just a few lines of code
- Diverse RL Algorithms: Support for PPO, GRPO, and more (via verl integration)
- Ready-to-Use Examples: Out-of-the-box scripts for various environments
🔧 Flexible & Modular
- Seamless Integration: Works with PyTorch FSDP, Megatron-LM, vLLM
- Customizable Components: Easy to extend with custom environments, rewards, and memory systems
- Flexible Device Mapping: Efficient resource utilization across different cluster sizes
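Of the algorithms listed above, GRPO gives a feel for the kind of dataflow involved: it scores a group of rollouts sampled for the same prompt and normalizes each reward against the group's mean and standard deviation to obtain advantages, with no learned value function. The sketch below illustrates just that advantage computation; the function name and the epsilon guard are our own illustration, not AlphaApollo's API:

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (std + eps).

    `rewards` holds scalar rewards for a group of rollouts sampled
    from the same prompt; `eps` guards against zero-variance groups.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Two correct and two incorrect rollouts for one prompt:
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Because the normalization is per group, the advantages always sum to zero within a group, so above-average rollouts are reinforced and below-average ones suppressed.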
Quick Start
Installation
```bash
conda create -n alphaapollo python=3.12 -y
conda activate alphaapollo
git clone https://github.com/tmlr-group/AlphaApollo.git
cd AlphaApollo
bash installation.sh
```
Run Your First Training
Check out our examples for different workflows. Each snippet below assumes you start from the repository root:

```bash
# RL Training (with tool use)
cd examples/rl
bash run_rl_informal_math_tool.sh
```

```bash
# Self-Evolution
cd examples/evo
bash run_evo_informal_math.sh
```

```bash
# SFT (with tool use)
cd examples/sft
bash run_sft_informal_math_tool.sh
```
Architecture
AlphaApollo uses a hybrid architecture that enables:
- Flexible Dataflow: Define complex RL training pipelines
- Efficient Execution: Optimize computation across multiple GPUs
- Modular Design: Easy to customize and extend components
What's Next?
- Installation Guide - Detailed setup instructions
- Quick Start Tutorial - Your first AlphaApollo training job
- Core Modules - Agent system, evolution, data pipeline, and tools
- Algorithms - Explore supported RL algorithms
- Configuration Reference - Detailed API and runtime configuration documentation
- Contribution Guide - Add your own tools, environments, and algorithms
Documentation Structure
This documentation is organized into the following sections:
Getting Started
- Installation - Environment setup, dependencies, and troubleshooting
- Quick Start - Run core workflows and example scripts
- Troubleshooting & FAQ - Common issues and solutions
Core Modules
- Core Modules Overview - High-level architecture map
- Agent System - Multi-turn environment and manager flow
- Self-Evolution - Policy-verifier iterative refinement
- Dataset Pipeline - Data preprocessing and schema normalization
- Tools - Tool registration, execution, and built-ins
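The Self-Evolution module's policy-verifier refinement can be pictured as a generate-check-retry loop: a policy drafts an answer, a verifier judges it, and on failure the verifier's feedback is folded back into the next attempt. The sketch below is illustrative only; the function names, the feedback format, and the round limit are our assumptions, not AlphaApollo's actual interfaces:

```python
def self_evolve(prompt, policy, verifier, max_rounds=3):
    """Iterative policy-verifier refinement (illustrative sketch).

    `policy` maps a prompt to a candidate answer; `verifier` returns
    (ok, feedback). On failure, feedback is appended to the prompt.
    """
    attempt = policy(prompt)
    for _ in range(max_rounds):
        ok, feedback = verifier(prompt, attempt)
        if ok:
            return attempt
        attempt = policy(f"{prompt}\n[verifier feedback] {feedback}")
    return attempt

# Toy stand-ins: this "policy" only answers correctly after feedback.
policy = lambda p: "4" if "feedback" in p else "5"
verifier = lambda p, a: (a == "4", "2 + 2 is 4, not 5")
result = self_evolve("What is 2 + 2?", policy, verifier)
```

In a real run the policy and verifier would both be model calls; the loop structure is what the "iterative refinement" label refers to.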
Configuration
- Configuration Overview - Hydra basics and CLI overrides
- RL Training Config - PPO/GRPO parameter details
- Generation Config - Offline generation settings
- Evolving Config - Self-evolution runtime settings
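Since the configuration stack is Hydra-based, the run scripts typically accept dotted `key=value` overrides appended to the command line, so you can tune a run without editing YAML. The specific keys below are illustrative placeholders, not guaranteed AlphaApollo config names:

```bash
# Hypothetical Hydra-style overrides appended to an example script:
bash run_rl_informal_math_tool.sh \
    trainer.total_epochs=5 \
    data.train_batch_size=256 \
    +trainer.val_before_train=true  # '+' adds a key absent from the base config
```

See the Configuration Overview page for the override keys that actually exist in each workflow's config.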
Algorithms
- Algorithms Overview - End-to-end workflow map
- RL Training - verl integration and RL training flow
- SFT - Supervised fine-tuning pipeline
- Evolving Pipeline - Inference-time self-improvement
Contribution
- Contribution Guide - Extension points and conventions
- Adding a New Tool - Implement and register custom tools
- Adding a New Environment - Add a new task domain
- Adding a New Algorithm - Add a new workflow
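A common shape for a tool extension point like "Adding a New Tool" is a registry keyed by tool name, populated via a decorator. The sketch below shows that generic pattern under our own names (`TOOL_REGISTRY`, `register_tool`); consult the actual guide for AlphaApollo's registration API:

```python
from typing import Callable, Dict

# Hypothetical global registry mapping tool names to callables.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {}

def register_tool(name: str):
    """Decorator that registers a callable under a tool name."""
    def wrap(fn: Callable[..., str]) -> Callable[..., str]:
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

@register_tool("calculator")
def calculator(expression: str) -> str:
    # Evaluate arithmetic with builtins stripped (illustration only;
    # a production tool would use a proper expression parser).
    return str(eval(expression, {"__builtins__": {}}, {}))

result = TOOL_REGISTRY["calculator"]("2 + 3 * 4")
```

Registering by name lets the agent loop dispatch a model-emitted tool call to the right implementation with a single dictionary lookup.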
Recommended Reading Path
If you're new to AlphaApollo, we recommend reading in this order:
1. Start with Installation and Quick Start to run a working baseline
2. Read Core Modules Overview and Agent System to understand runtime flow
3. Choose your path: RL Training / SFT / Evolving Pipeline
4. Use the Configuration pages to tune behavior and scale
5. Follow the Contribution Guide to extend the framework safely
Community & Support
- GitHub: AlphaApollo
- Paper: AlphaApollo on arXiv
Contributing
We welcome contributions! AlphaApollo is open-source under the Apache 2.0 license. Check out our contributing guide to get started.
Ready to get started? Head over to the Installation Guide to begin your journey with AlphaApollo!