📄️ Algorithms
Overview of AlphaApollo's algorithms: supervised fine-tuning (SFT), RL training (PPO/GRPO), and the inference-time Evolving Pipeline.
📄️ RL Training
Reinforcement learning algorithms available in AlphaApollo — PPO, GRPO, DAPO, and RLOO — with training architecture and configuration examples.
📄️ Supervised Fine-Tuning
SFT pipeline in AlphaApollo — config reference, LoRA support, multi-turn data format, and the SFT-to-RL handoff.
📄️ Evolving Pipeline
AlphaApollo's Evolving Pipeline — policy-verifier loops, solution memory, and single- or multi-model K-branch setups for inference-time improvement.