
Generation Config

This page documents the `generation.yaml` configuration used by `verl.trainer.main_generation`. The generation pipeline runs offline inference on a given dataset: it can produce standalone model responses, or interact with environments for multi-turn tasks such as tool-augmented math reasoning.

Overview

The generation pipeline is used to:

  1. Collect training data — generate solutions that can be used for SFT or further analysis.
  2. Evaluate models — produce responses for benchmarks and compute metrics.
  3. Run multi-turn inference — interact with environments (e.g., informal math with Python code execution).

Launch with:

```bash
python3 -m verl.trainer.main_generation \
    model.path=Qwen/Qwen2.5-1.5B-Instruct \
    data.path=~/data/test.parquet \
    rollout.temperature=0.6 \
    ...
```

Config Structure

```yaml
trainer:
  nnodes: 1
  n_gpus_per_node: 8

data:
  path: ~/data/rlhf/math/test.parquet
  prompt_key: prompt
  n_samples: 5
  output_path: /opt/tiger/math_output.parquet
  batch_size: 128
  return_raw_chat: True
  max_prompt_length: ${rollout.prompt_length}
  max_response_length: ${rollout.response_length}
  truncation: error
  save2json: False
  json_output_path: ???

model:
  path: ~/models/Qwen2-7B-Instruct
  external_lib: null

rollout:
  name: vllm
  mode: sync
  temperature: 1.0
  top_k: -1
  top_p: 0.7
  prompt_length: 1536
  response_length: 512
  dtype: bfloat16
  gpu_memory_utilization: 0.5
  tensor_model_parallel_size: 1
  max_num_batched_tokens: 8192
  max_num_seqs: 1024
  n: 1
  enforce_eager: True
  free_cache_engine: True
  load_format: dummy_dtensor
  enable_chunked_prefill: True

env:
  env_name: informal_math_training
  seed: 0
  max_steps: 1
  history_length: 2
  ...
```

Data

| Parameter | Type | Default | Description |
|---|---|---|---|
| `path` | str | — | Path to the input dataset (parquet format). |
| `prompt_key` | str | `prompt` | Column name for prompts in the dataset. |
| `n_samples` | int | 5 | Number of responses to generate per prompt. |
| `output_path` | str | — | Path to save the output parquet file. |
| `batch_size` | int | 128 | Batch size for generation. Also sets `train_batch_size` and `val_batch_size`. |
| `return_raw_chat` | bool | True | Return the raw chat without applying the chat template. Useful for multi-turn environments. |
| `max_prompt_length` | int | `${rollout.prompt_length}` | Maximum prompt length (defaults to `rollout.prompt_length`). |
| `max_response_length` | int | `${rollout.response_length}` | Maximum response length (defaults to `rollout.response_length`). |
| `truncation` | str | `error` | Truncation strategy: `error`, `left`, `right`, or `middle`. |
| `save2json` | bool | False | Also save outputs in JSON format (in addition to parquet). |
| `json_output_path` | str | — | Path for the JSON output (required when `save2json=True`). |

Model

| Parameter | Type | Description |
|---|---|---|
| `path` | str | HuggingFace model path or local checkpoint path. |
| `external_lib` | str | Additional Python packages to import for model registration. |

Rollout

The rollout section controls the inference engine and sampling parameters.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | str | `vllm` | Inference engine: `vllm`, `sglang`, or `hf`. |
| `mode` | str | `sync` | `sync` for the standard LLM engine; `async` for AsyncLLM. |
| `temperature` | float | 1.0 | Sampling temperature. Higher values produce more random output. |
| `top_k` | int | -1 | Top-k sampling. Use -1 to disable it in vLLM; 0 disables it in HF. |
| `top_p` | float | 0.7 | Nucleus (top-p) sampling threshold. |
| `prompt_length` | int | 1536 | Maximum prompt token length. |
| `response_length` | int | 512 | Maximum response token length. |
| `dtype` | str | `bfloat16` | Model precision. Should match the training precision. |
| `gpu_memory_utilization` | float | 0.5 | Fraction of GPU memory allocated to vLLM. Increase for larger models. |
| `tensor_model_parallel_size` | int | 1 | Tensor parallelism degree. Increase for larger models. |
| `max_num_batched_tokens` | int | 8192 | Maximum number of batched tokens for the vLLM scheduler. |
| `n` | int | 1 | Number of responses per prompt per batch. |
| `enable_chunked_prefill` | bool | True | Enable chunked prefill for better throughput. |
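
To make the sampling parameters concrete, here is an illustrative sketch (not verl or vLLM code; the `filter_probs` helper is hypothetical) of how `top_k` and `top_p` restrict the candidate token set before sampling:

```python
# Illustrative sketch of how top_k and top_p restrict sampling:
# top_k keeps the k most likely tokens; top_p keeps the smallest set whose
# cumulative probability reaches the threshold. top_k=-1 means "disabled".
import numpy as np

def filter_probs(probs, top_k=-1, top_p=1.0):
    """Zero out tokens excluded by top-k / top-p, then renormalize."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]          # tokens from most to least likely
    keep = np.zeros_like(probs, dtype=bool)

    cum = 0.0
    for rank, idx in enumerate(order):
        if top_k != -1 and rank >= top_k:    # top-k cutoff reached
            break
        keep[idx] = True
        cum += probs[idx]
        if cum >= top_p:                     # nucleus threshold reached
            break

    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

probs = [0.5, 0.3, 0.15, 0.05]
print(filter_probs(probs, top_p=0.7))   # keeps only the two most likely tokens
```

Lower `top_p` (e.g. the 0.7 default) narrows generation to high-probability tokens; `temperature` reshapes the distribution before this filtering step.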

Environment

The generation pipeline supports environment interaction for multi-turn tasks.

```yaml
env:
  env_name: informal_math_training
  seed: 0
  max_steps: 8
  history_length: 8
  resources_per_worker:
    num_cpus: 0.1
    num_gpus: 0
  informal_math:
    memory_type: simple
    enable_python_code: true
    enable_local_rag: true
```
| Parameter | Type | Description |
|---|---|---|
| `env_name` | str | Environment to use. Set to the desired environment name. |
| `max_steps` | int | Maximum interaction steps. For single-turn generation, set to 1. |
| `history_length` | int | Number of past turns included in observations. |
| `informal_math.enable_python_code` | bool | Enable the Python code execution tool. |
| `informal_math.enable_local_rag` | bool | Enable the local RAG retrieval tool. |
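
The interaction loop these settings control can be sketched as follows. The `EchoEnv` class and the `reset()`/`step()` signatures are hypothetical and do not reproduce verl's actual environment API; they only illustrate how `max_steps` bounds the number of turns and `history_length` bounds the observation window.

```python
# Hypothetical sketch of a multi-turn rollout loop (not verl's real env API).
from collections import deque

class EchoEnv:
    """Toy stand-in environment: finishes when the response contains 'done'."""
    def reset(self, prompt):
        return prompt                       # initial observation

    def step(self, response):
        done = "done" in response
        return f"observed: {response}", done

def rollout(env, policy, prompt, max_steps=8, history_length=8):
    """Interact for up to max_steps turns, keeping the last history_length turns."""
    history = deque(maxlen=history_length)  # mirrors env.history_length
    obs = env.reset(prompt)
    for _ in range(max_steps):              # mirrors env.max_steps
        response = policy(obs, list(history))
        history.append((obs, response))
        obs, done = env.step(response)
        if done:
            break
    return list(history)

# A trivial "policy" that finishes on its second turn.
turns = rollout(EchoEnv(), lambda obs, h: "done" if len(h) >= 1 else "try", "solve")
print(len(turns))  # 2
```

With `max_steps=1` the loop runs exactly one turn, which is the single-turn generation case described below.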

Example: Informal Math Generation

```bash
python3 -m verl.trainer.main_generation \
    trainer.nnodes=1 \
    trainer.n_gpus_per_node=1 \
    data.path=~/data/HuggingFaceH4/MATH-500/test.parquet \
    data.prompt_key=prompt \
    data.n_samples=2 \
    data.batch_size=32 \
    data.return_raw_chat=True \
    data.output_path=~/data/output.parquet \
    data.save2json=true \
    data.json_output_path=~/data/output.json \
    model.path=Qwen/Qwen2.5-1.5B-Instruct \
    rollout.temperature=0.6 \
    rollout.top_k=20 \
    rollout.top_p=0.95 \
    rollout.prompt_length=2048 \
    rollout.response_length=8192 \
    rollout.tensor_model_parallel_size=1 \
    rollout.gpu_memory_utilization=0.75 \
    rollout.name=vllm \
    env.env_name=informal_math_training \
    env.max_steps=8 \
    env.history_length=8 \
    env.informal_math.enable_python_code=true \
    env.informal_math.enable_local_rag=true
```

Example: Single-Turn Generation (No Tool)

For generation without environment interaction:

```bash
python3 -m verl.trainer.main_generation \
    trainer.n_gpus_per_node=1 \
    data.path=~/data/test.parquet \
    data.n_samples=5 \
    model.path=Qwen/Qwen2.5-1.5B-Instruct \
    rollout.temperature=0.7 \
    rollout.response_length=4096 \
    env.max_steps=1
```
> **Single-turn generation:** Set `env.max_steps=1` to disable environment interaction and run single-turn generation only.

Differences from RL Training Config

| Aspect | Generation | RL Training |
|---|---|---|
| Entry point | `verl.trainer.main_generation` | `verl.trainer.main_ppo` |
| Actor training | No | Yes |
| Critic model | No | Yes (PPO only) |
| Reward computation | No | Yes |
| Environment interaction | Optional | Yes |
| Output | Parquet / JSON file | Model checkpoints |