Multi-round Agentic Evolution

Multi-round Agentic Evolution is AlphaApollo's core mechanism for enabling multiple models to collaboratively evolve solutions through parallel reasoning and shared state management. This approach addresses the challenge of test-time iteration by providing trustworthy feedback and enabling systematic solution refinement.

Figure: Multi-round Agentic Evolution Framework. Illustration of multi-round agentic evolution in AlphaApollo. The model iteratively refines its strategies through a propose-judge-update evolutionary loop. A long-term memory is introduced to prevent future errors and promote efficient strategies in subsequent rounds.
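
In outline, the loop looks like the sketch below. It is illustrative only: it assumes the helpers detailed in the following sections (generate_candidates, evaluate_candidate, build_refinement_prompt) and a LongTermMemory store, none of which are AlphaApollo's actual API.

```python
# A high-level sketch of the propose-judge-update loop; all names are
# illustrative assumptions, not AlphaApollo's real interfaces.
def evolve(models, problem, memory, num_rounds):
    prompt = problem
    for round_id in range(num_rounds):
        # Propose: all models generate candidates in parallel.
        generate_candidates(models, prompt, memory, round_id)
        # Judge: verify this round's candidates with tools.
        for record in memory.records:
            if record.round_id == round_id:
                evaluate_candidate(record)
        # Update: fold candidates and feedback into the next round's prompt.
        prompt = build_refinement_prompt(problem, memory, round_id)
    # Return the most recent candidate that passed verification, if any.
    passed = [r for r in memory.records if r.passed]
    return passed[-1] if passed else None
```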

Long-term Memory

At the heart of multi-round evolution is a long-term memory, sketched below, that records:

  • Candidate solutions: All proposed solutions from different models
  • Feedback signals: Fine-grained evaluation outcomes
  • Evolution history: How solutions have been refined over rounds
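
A minimal in-process sketch of such a store is shown below; the record fields mirror the three categories above, and the names MemoryRecord and LongTermMemory are assumptions rather than AlphaApollo's actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    round_id: int         # evolution round in which the candidate was proposed
    model_id: str         # which model proposed it
    solution: str         # the candidate solution itself
    feedback: str = ""    # fine-grained evaluation outcome, filled in later
    passed: bool = False  # whether tool-based verification succeeded


@dataclass
class LongTermMemory:
    records: list[MemoryRecord] = field(default_factory=list)

    def add(self, record: MemoryRecord) -> None:
        self.records.append(record)

    def history(self, up_to_round: int) -> list[MemoryRecord]:
        # Evolution history: every candidate and its feedback so far.
        return [r for r in self.records if r.round_id <= up_to_round]
```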

How Multi-Round Evolution Works

1. Parallel Candidate Generation

Multiple models operate in parallel, each with full access to the toolset (see the sketch after this list):

  • Each model proposes candidate solutions independently
  • Models can use computation and retrieval tools as needed
  • All candidates are recorded in the shared memory
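
One way to realize this step, reusing the MemoryRecord and LongTermMemory sketch above: each model is assumed to be exposed as a callable that maps a prompt to a candidate solution, which is a placeholder interface rather than AlphaApollo's own.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_candidates(models, prompt, memory, round_id):
    """Run all models in parallel and record every candidate in shared memory."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {
            model_id: pool.submit(propose, prompt)
            for model_id, propose in models.items()
        }
    # The executor has shut down here, so every future is complete.
    for model_id, future in futures.items():
        memory.add(MemoryRecord(round_id=round_id,
                                model_id=model_id,
                                solution=future.result()))
```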

2. Tool-Based Evaluation

Each candidate solution undergoes verification (a sketch of the code-execution case follows this list):

  • Code solutions are executed to produce verifiable results
  • Mathematical derivations are checked for correctness
  • Logical consistency is validated
  • Evaluation results are stored in the memory
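
A hedged sketch of the code-execution case: the candidate is assumed to be a standalone Python snippet run in a subprocess, so its exit status and output become the verifiable feedback stored in memory. Checking mathematical derivations and logical consistency would use other tools.

```python
import subprocess
import sys


def evaluate_candidate(record, timeout=30):
    """Execute a code candidate and store the outcome on its memory record."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", record.solution],
            capture_output=True, text=True, timeout=timeout,
        )
        record.passed = result.returncode == 0
        # Fine-grained feedback: stdout on success, the traceback on failure.
        record.feedback = result.stdout if record.passed else result.stderr
    except subprocess.TimeoutExpired:
        record.passed = False
        record.feedback = f"execution timed out after {timeout}s"
```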

3. Refinement

Models reference the shared memory to generate improved solutions (sketched after this list):

  • Review successful patterns from previous rounds
  • Learn from failed attempts and their feedback
  • Combine insights from multiple candidate solutions
  • Generate refined solutions that build upon prior candidates
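
An illustrative way to surface the memory to the models is a refinement prompt built from prior rounds, as sketched below; the prompt wording is an assumption, not AlphaApollo's actual template.

```python
def build_refinement_prompt(problem, memory, round_id):
    """Summarize prior rounds so models can build on earlier candidates."""
    lines = [f"Problem: {problem}", "", "Prior candidates and feedback:"]
    for r in memory.history(round_id):
        status = "PASSED" if r.passed else "FAILED"
        lines.append(f"- round {r.round_id}, {r.model_id} [{status}]: "
                     f"{r.solution[:200]} | feedback: {r.feedback[:200]}")
    lines.append("")
    lines.append("Propose an improved solution that keeps what worked and "
                 "avoids the failures above.")
    return "\n".join(lines)
```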
