Algorithms | AlphaApollo

Algorithms

Overview of AlphaApollo's training algorithms — SFT, RL training (PPO/GRPO), and the Evolving Pipeline.

Reinforcement learning algorithms available in AlphaApollo — PPO, GRPO, DAPO, and RLOO — with training architecture and configuration examples.

SFT pipeline in AlphaApollo — config reference, LoRA support, multi-turn data format, and the SFT-to-RL handoff.

AlphaApollo's Evolving Pipeline — policy-verifier loops, solution memory, and single- or multi-model K-branch setups for inference-time improvement.