AlphaApollo

A System for Deep Agentic Reasoning

AlphaApollo is an agentic reasoning framework that integrates multiple models and tools to enable iterative, verifiable, and self-evolving reasoning. It supports a range of agentic reasoning paradigms, including tool-integrated reasoning, agentic post-training (multi-turn SFT and reinforcement learning), and agentic self-evolution. It incorporates post-training algorithms such as PPO, GRPO, and DAPO, and provides dataset-backed agentic evaluation pipelines. Its agentic environments and tool-set configurations are extensible, so users can customize, extend, and scale agentic reasoning workflows.

Key Features

Agentic Reasoning

Multi-turn agentic reasoning through an iterative cycle of model reasoning, tool execution, and environment feedback.
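The cycle described above can be illustrated with a minimal sketch. This is not AlphaApollo's API: `run_agent_loop`, the `CALL`/`ANSWER:` text protocol, `toy_model`, and `add_tool` are all hypothetical names chosen for illustration, and the "model" is a scripted stand-in rather than an LLM.

```python
from typing import Callable, Dict, List, Optional, Tuple

Message = Tuple[str, str]  # (role, content)


def run_agent_loop(
    model: Callable[[List[Message]], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_turns: int = 4,
) -> Tuple[Optional[str], List[Message]]:
    """Alternate model reasoning, tool execution, and feedback (hypothetical protocol)."""
    history: List[Message] = [("user", question)]
    for _ in range(max_turns):
        action = model(history)                 # model reasoning step
        history.append(("assistant", action))
        if action.startswith("ANSWER:"):        # model decided it is done
            return action[len("ANSWER:"):].strip(), history
        # Assumed action format: "CALL <tool_name> <argument>"
        _, name, arg = action.split(" ", 2)
        observation = tools[name](arg)          # tool execution
        history.append(("tool", observation))   # environment feedback
    return None, history                        # turn budget exhausted


def add_tool(arg: str) -> str:
    """Toy tool: sum comma-separated integers."""
    return str(sum(int(x) for x in arg.split(",")))


def toy_model(history: List[Message]) -> str:
    """Scripted stand-in for an LLM: call the tool once, then answer with its output."""
    role, content = history[-1]
    if role == "tool":
        return f"ANSWER: {content}"
    return "CALL add 2,3"


answer, history = run_agent_loop(toy_model, {"add": add_tool}, "What is 2 + 3?")
```

In this toy run the history holds four messages: the question, the tool call, the tool output, and the final answer.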

Agentic Learning

Stable agentic learning via turn-level optimization that decouples model generations from environment feedback.
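One common way to keep model generations separate from environment feedback during training is a per-token loss mask, so that only model-generated tokens contribute to the objective. The sketch below shows that general idea under stated assumptions; whether AlphaApollo implements its turn-level optimization this way is not confirmed here, and `build_loss_mask` and `masked_mean` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple


def build_loss_mask(turns: Sequence[Tuple[str, List[int]]]) -> Tuple[List[int], List[int]]:
    """Flatten (role, token_ids) turns and mark which tokens the model generated."""
    tokens: List[int] = []
    mask: List[int] = []
    for role, token_ids in turns:
        tokens.extend(token_ids)
        # Only assistant tokens are trained on; user and tool tokens are context.
        mask.extend([1 if role == "assistant" else 0] * len(token_ids))
    return tokens, mask


def masked_mean(values: Sequence[float], mask: Sequence[int]) -> float:
    """Average per-token values over the positions where mask == 1."""
    count = sum(mask)
    if count == 0:
        return 0.0
    return sum(v * m for v, m in zip(values, mask)) / count


turns = [
    ("user", [1, 2]),
    ("assistant", [3, 4, 5]),
    ("tool", [6]),          # environment feedback, excluded from the loss
    ("assistant", [7]),
]
tokens, mask = build_loss_mask(turns)
# A large per-token value on the tool token does not affect the masked average.
loss = masked_mean([0.0, 0.0, 2.0, 2.0, 2.0, 100.0, 2.0], mask)
```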

Agentic Evolution

Multi-round agentic evolution through a propose-judge-update evolutionary loop with long-term memory.
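A propose-judge-update loop with a persistent memory can be sketched as follows. This is an illustrative toy, not AlphaApollo's implementation: `evolve`, `propose`, and `judge` are hypothetical names, and the proposer and judge are simple numeric functions standing in for models.

```python
from typing import Callable, List, Tuple

Memory = List[Tuple[int, float]]  # (candidate, score) pairs kept across rounds


def evolve(
    propose: Callable[[int, Memory], int],
    judge: Callable[[int, int], float],
    task: int,
    rounds: int = 3,
) -> Tuple[Tuple[int, float], Memory]:
    """Run propose -> judge -> update for a fixed number of rounds."""
    memory: Memory = []
    for _ in range(rounds):
        candidate = propose(task, memory)   # propose, conditioned on memory
        score = judge(task, candidate)      # judge the proposal
        memory.append((candidate, score))   # update long-term memory
    best = max(memory, key=lambda item: item[1])
    return best, memory


def propose(task: int, memory: Memory) -> int:
    """Toy proposer: start at 0, then step up from the best candidate so far."""
    if not memory:
        return 0
    best_candidate, _ = max(memory, key=lambda item: item[1])
    return best_candidate + 4


def judge(task: int, candidate: int) -> float:
    """Toy judge: higher is better, with 0.0 meaning the target was hit."""
    return -abs(task - candidate)


best, memory = evolve(propose, judge, task=10, rounds=3)
```

Here each round reads the accumulated memory before proposing, so later proposals build on earlier judged attempts instead of starting from scratch.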

Quick Start

Installation

```bash
conda create -n alphaapollo python==3.12 -y
conda activate alphaapollo

git clone https://github.com/tmlr-group/AlphaApollo.git
cd AlphaApollo

bash installation.sh
```

Demo Programs

```bash
bash examples/generation/run_generation_informal_math_tool.sh
```