Core Modules
This section explains how AlphaApollo is built: its internal architecture, its key abstractions, and how its components interact at runtime.
Overview
AlphaApollo is organized around four core modules:
| Module | Description | Key Directory |
|---|---|---|
| Agent System | The environment-driven, multi-turn agentic reasoning loop | alphaapollo/core/environments/ |
| Self-Evolution | Iterative policy-verifier self-improvement at inference time | alphaapollo/core/generation/evolving/ |
| Dataset Pipeline | Data preprocessing scripts for all workflows | alphaapollo/data_preprocess/ |
| Tools | Extensible tool framework for code execution, verification, and RAG | alphaapollo/core/tools/ |
Architecture at a Glance
┌──────────────────────────────────────────────────────┐
│                     AlphaApollo                      │
│                                                      │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────┐ │
│  │    Tools    │  │   Prompts    │  │    Memory    │ │
│  │ (code, RAG) │  │ (templates,  │  │  (simple,    │ │
│  │             │  │  formatting) │  │  score, ND)  │ │
│  └──────┬──────┘  └──────┬───────┘  └──────┬───────┘ │
│         │                │                 │         │
│         └────────────────┼─────────────────┘         │
│                          ▼                           │
│              ┌───────────────────────┐               │
│              │   Environment Loop    │               │
│              │   (Gym-style step/    │               │
│              │    reset interface)   │               │
│              └───────────┬───────────┘               │
│                          │                           │
│              ┌───────────┴───────────┐               │
│              ▼                       ▼               │
│     ┌──────────────────┐    ┌──────────────────┐     │
│     │   RL Training    │    │  Self-Evolution  │     │
│     │   (PPO, GRPO,    │    │ (policy-verifier │     │
│     │    DAPO)         │    │  loops)          │     │
│     └──────────────────┘    └──────────────────┘     │
└──────────────────────────────────────────────────────┘
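To make the "Gym-style step/reset interface" in the diagram concrete, here is a minimal sketch of a multi-turn loop in which tool calls feed observations back to the policy. Every name in it (`ToyEnv`, `TOOLS`, `scripted_policy`, `rollout`, and the `tool:` / `answer:` action prefixes) is hypothetical and chosen only for illustration; it is not AlphaApollo's actual API, and the real environment classes under alphaapollo/core/environments/ may differ in signatures and return values.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> callable taking one string argument.
TOOLS: Dict[str, Callable[[str], str]] = {
    "echo": lambda arg: arg,
    "upper": lambda arg: arg.upper(),
}


@dataclass
class ToyEnv:
    """Illustrative Gym-style environment (assumed shape, not the real class)."""
    question: str
    answer: str
    max_turns: int = 4
    history: List[str] = field(default_factory=list)
    turn: int = 0

    def reset(self) -> str:
        # Start a fresh episode and return the first observation.
        self.history = []
        self.turn = 0
        return self.question

    def step(self, action: str) -> Tuple[str, float, bool]:
        # Returns (observation, reward, done) for one agent turn.
        self.turn += 1
        self.history.append(action)
        if action.startswith("answer:"):
            correct = action[len("answer:"):].strip() == self.answer
            return "", float(correct), True
        if action.startswith("tool:"):
            name, _, arg = action[len("tool:"):].partition(" ")
            tool = TOOLS.get(name)
            obs = tool(arg) if tool else f"unknown tool: {name}"
        else:
            obs = "unrecognized action"
        # The episode also ends when the turn budget is exhausted.
        return obs, 0.0, self.turn >= self.max_turns


def scripted_policy(observation: str, turn: int) -> str:
    # Stand-in for a language model: call one tool, then answer with its output.
    if turn == 0:
        return "tool:upper hello"
    return f"answer: {observation}"


def rollout(env: ToyEnv) -> Tuple[float, int]:
    # Run one episode and return the final reward and the number of turns used.
    obs = env.reset()
    reward, done, turn = 0.0, False, 0
    while not done:
        action = scripted_policy(obs, turn)
        obs, reward, done = env.step(action)
        turn += 1
    return reward, turn


env = ToyEnv(question="Uppercase 'hello'", answer="HELLO")
final_reward, turns = rollout(env)
```

In this sketch, the same `rollout` function could serve either consumer shown at the bottom of the diagram: a training loop would collect the rewards, while an inference-time loop would inspect the trajectory in `env.history`.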
Related Pages
- Algorithms — Training and inference pipelines
- Configuration — YAML config reference
- Adding a New Tool — Extend the tool framework
- Adding a New Environment — Plug in a new domain
- Adding a New Algorithm — Create a new workflow