AlphaApollo is an agentic reasoning framework that integrates multiple models and tools to enable iterative, verifiable, and self-evolving reasoning. It supports a wide range of agentic reasoning paradigms, including tool-integrated reasoning, agentic post-training (multi-turn SFT and reinforcement learning), and agentic self-evolution. AlphaApollo incorporates multiple post-training algorithms such as PPO, GRPO, and DAPO, and provides dataset-backed agentic evaluation pipelines. AlphaApollo also offers flexible and extensible agentic environments and tool-set configurations, allowing users to easily customize, extend, and scale agentic reasoning workflows.
Key Features

Agentic Reasoning
Multi-turn agentic reasoning through an iterative cycle of model reasoning, tool execution, and environment feedback.
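
The cycle described above can be illustrated with a minimal sketch. It is not AlphaApollo's implementation: `run_agent_loop`, `toy_model`, `calc`, and the `CALL`/`OBSERVATION`/`ANSWER` text protocol are all hypothetical names chosen for illustration, and the model is a stand-in function rather than a real LLM.

```python
def run_agent_loop(model, tools, question, max_turns=4):
    """Alternate model reasoning, tool execution, and environment feedback.

    Hypothetical protocol: the model emits "CALL <tool> <arg>" to use a tool
    or "ANSWER: <text>" to stop. Returns the answer, or None if no answer
    is produced within max_turns.
    """
    history = [question]
    for _ in range(max_turns):
        step = model(history)                      # model reasoning
        history.append(step)
        if step.startswith("ANSWER:"):
            return step[len("ANSWER:"):].strip()
        if step.startswith("CALL "):
            name, _, arg = step[len("CALL "):].partition(" ")
            tool = tools.get(name)
            # tool execution; unknown tools are reported back as feedback
            observation = tool(arg) if tool else f"unknown tool: {name}"
            history.append(f"OBSERVATION: {observation}")  # environment feedback
    return None


def toy_model(history):
    """Stand-in for an LLM: call the calculator once, then answer."""
    last = history[-1]
    if last.startswith("OBSERVATION:"):
        return "ANSWER: " + last[len("OBSERVATION:"):].strip()
    return "CALL calc 6*7"


def calc(expr):
    """Toy tool that multiplies two integers written as 'a*b'."""
    a, b = expr.split("*")
    return str(int(a) * int(b))


result = run_agent_loop(toy_model, {"calc": calc}, "What is 6*7?")
print(result)
```

The loop ends as soon as the model commits to an answer, and the `max_turns` cap keeps a model that never answers from running indefinitely.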

Agentic Learning
Stable agentic learning via turn-level optimization that decouples model generations from environment feedback.
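
One common way to keep model generations separate from environment feedback during training is to compute the loss only over model-produced turns and treat tool or environment output as context. The sketch below assumes that interpretation. The function name, the turn dictionary layout, and the per-turn `advantage` field are hypothetical and are not taken from the AlphaApollo codebase.

```python
def masked_turn_loss(turns):
    """Policy-gradient style loss averaged over model-generated tokens only.

    Each turn is a dict (hypothetical layout) with:
      source    -- "model" or "env"
      logprobs  -- per-token log-probabilities under the current policy
      advantage -- one scalar advantage shared by all tokens in the turn
    """
    total, count = 0.0, 0
    for turn in turns:
        if turn["source"] != "model":
            continue  # environment feedback is context only and adds no loss
        for logprob in turn["logprobs"]:
            total += -turn["advantage"] * logprob
            count += 1
    return total / count if count else 0.0


trajectory = [
    {"source": "model", "logprobs": [-1.0, -3.0], "advantage": 0.5},
    {"source": "env", "logprobs": [-10.0], "advantage": 0.0},
    {"source": "model", "logprobs": [-2.0], "advantage": 1.0},
]
loss = masked_turn_loss(trajectory)
print(loss)
```

Because the environment turn is skipped entirely, its tokens cannot change the loss no matter how unlikely they are under the policy.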

Agentic Evolution
Multi-round agentic evolution through a propose-judge-update evolutionary loop with long-term memory.
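
A propose-judge-update loop with memory can be sketched as follows. This is an illustrative toy under stated assumptions, not the framework's code: `evolve`, `toy_propose`, and `toy_judge` are hypothetical names, the "long-term memory" is a plain list of scored candidates, and the task is a number-guessing stand-in for a real reasoning problem.

```python
def evolve(propose, judge, task, rounds=3):
    """Run a propose-judge-update loop and return (best, memory).

    memory holds every (candidate, score) pair from earlier rounds, so each
    new proposal can build on everything tried so far.
    """
    memory = []
    best = None
    for _ in range(rounds):
        candidate = propose(task, memory)       # propose
        score = judge(task, candidate)          # judge
        memory.append((candidate, score))       # update long-term memory
        if best is None or score > best[1]:
            best = (candidate, score)
    return best, memory


def toy_propose(task, memory):
    """Start at 0, then step up from the best candidate seen so far."""
    if not memory:
        return 0
    best_candidate, _ = max(memory, key=lambda item: item[1])
    return best_candidate + 4


def toy_judge(task, candidate):
    """Score is higher (closer to 0) when the candidate is nearer the target."""
    return -abs(task - candidate)


best, memory = evolve(toy_propose, toy_judge, task=10, rounds=3)
print(best, memory)
```

Keeping the full history rather than only the current best lets a proposer also learn from low-scoring attempts, at the cost of memory that grows with the number of rounds.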

Quick Start

Installation
conda create -n alphaapollo python==3.12 -y
conda activate alphaapollo
git clone https://github.com/tmlr-group/AlphaApollo.git
cd AlphaApollo
bash installation.sh

Demo Programs
bash examples/generation/run_generation_informal_math_tool.sh