How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance

TL;DR

Guidance is usually framed as sampling a reward-tilted distribution $\tilde\rho(x)\propto e^{r(x)}\rho(x)$, which requires stochastic, multi-particle dynamics that are expensive even for simple rewards. We instead pose guidance as a deterministic optimal control problem, yielding a hierarchy of algorithms in which the flow map emerges naturally and popular denoiser-based methods like DPS are recovered as the coarsest approximation. The result is Flow Map Reward Guidance (FMRG): highly efficient, training-free alignment that uses the flow map to both integrate and guide the flow along a single trajectory — matching or beating baselines with as few as 3 NFEs, up to a 70× speedup.

Motivation

In generative modeling, we rarely want just any sample — we want to tailor it to a user-specified reward $r$ that captures what we actually care about.

That reward might score, for example, aesthetic quality, agreement with a measurement, physical plausibility, or alignment with human intent. Guidance steers a pre-trained model toward high reward at inference time, with no extra training.

How do we formalize this? The dominant theoretical framework casts guidance as sampling from a reward-tilted distribution:

The standard framing — reward tilting

$\tilde\rho(x) \;\propto\; e^{r(x)}\,\rho(x)$

However, this target proves remarkably difficult to sample from efficiently. Practical methods often rely on many integration steps, stochastic dynamics, and costly test-time search. The framework also sits poorly with modern flow-based generative models, which increasingly favor deterministic samplers requiring few function evaluations, enabled by recent work on flow maps.

Problem 01

Hard to solve

Sampling $\tilde\rho$ is computationally expensive even for simple rewards. Multi-particle schemes like SMC need many particles and many integration steps; single-particle shortcuts like DPS rely on heuristic approximations, leaving their output poorly characterized.
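To make the particle cost concrete, here is a minimal sketch of the simplest multi-particle scheme, self-normalized importance reweighting, on a toy we chose for illustration: a 1D Gaussian base $\rho = \mathcal N(m, s^2)$ with a linear reward $r(x) = a x$, for which the tilt is known in closed form, $\tilde\rho = \mathcal N(m + a s^2, s^2)$. The function names and the toy itself are assumptions, not part of the method described here.

```python
import math
import random


def tilted_mean_importance(num_particles, a, mean=0.0, std=1.0, seed=0):
    """Estimate the mean of rho_tilde(x) ∝ exp(a * x) * N(x; mean, std^2).

    Draws particles from the base Gaussian and reweights them by exp(r(x)),
    with r(x) = a * x (an illustrative reward, not one from the source).
    Returns the estimate and the effective sample size (ESS).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(mean, std) for _ in range(num_particles)]
    log_w = [a * x for x in xs]
    shift = max(log_w)  # subtract the max log-weight for numerical stability
    ws = [math.exp(lw - shift) for lw in log_w]
    total = sum(ws)
    estimate = sum(w * x for w, x in zip(ws, xs)) / total
    ess = total ** 2 / sum(w * w for w in ws)
    return estimate, ess


def tilted_mean_exact(a, mean=0.0, std=1.0):
    """Closed-form mean of the tilt for a Gaussian base and linear reward."""
    return mean + a * std ** 2


if __name__ == "__main__":
    for a in (1.0, 3.0):
        est, ess = tilted_mean_importance(20000, a)
        print(f"a={a}: estimate={est:.3f}, exact={tilted_mean_exact(a):.3f}, ESS={ess:.1f}")
```

Even in this one-dimensional case the effective sample size shrinks quickly as the reward gets stronger, which is the kind of cost the multi-particle route incurs.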

Problem 02

A poor fit for modern samplers

Reward tilting is rooted in stochastic processes, but modern flow models such as FLUX and Stable Diffusion 3 are deterministic, and flow maps push them to just a few steps. Porting tilt-based alignment to this setting requires simulating stochasticity inside an ODE sampler, which is at odds with how these samplers run.

We would like guidance native to modern flows: few-step, deterministic, single-trajectory.

Is there a principled framework for guiding flow-based generative models with very few function evaluations?

Key idea

Rather than sample the reward tilt, we pose guidance directly as deterministic optimal control: steer a single trajectory to maximize reward while staying close to the base flow.

Standard framing

$\tilde\rho(x)\propto e^{r(x)}\rho(x)$

sample the reward tilt — stochastic, many particles

→ reframe

Our perspective

$\displaystyle\min_u \int_0^1 \tfrac{\lVert u_t\rVert^2}{2\lambda}\,dt - r(x_1^u)$

steer one trajectory, $\dot x_t^u = b_t(x_t^u)+u_t$
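As a rough illustration of this objective, the sketch below evaluates $\int_0^1 \lVert u_t\rVert^2/(2\lambda)\,dt - r(x_1^u)$ for a given control on a 1D toy, using forward Euler for $\dot x_t^u = b_t(x_t^u) + u_t$. The function name, the Euler discretization, and the toy drift and reward in the usage are assumptions made for the example.

```python
def controlled_cost(x0, drift, control, reward, lam, num_steps=100):
    """Evaluate the control objective along one Euler-discretized trajectory.

    drift(t, x)   -> base velocity b_t(x)
    control(t, x) -> control u_t applied at state x
    reward(x)     -> terminal reward r(x_1)
    Returns (cost, x_1), where cost = integral of u^2 / (2 * lam) minus r(x_1).
    """
    dt = 1.0 / num_steps
    x, energy = x0, 0.0
    for k in range(num_steps):
        t = k * dt
        u = control(t, x)
        energy += u * u / (2.0 * lam) * dt  # running control penalty
        x += (drift(t, x) + u) * dt         # controlled ODE step
    return energy - reward(x), x


if __name__ == "__main__":
    target = 1.0
    reward = lambda x: -0.5 * (x - target) ** 2  # illustrative quadratic reward
    drift = lambda t, x: 0.0                      # trivial base flow, for illustration
    for u_const in (0.0, 0.4, 0.5, 0.6):
        cost, x1 = controlled_cost(0.0, drift, lambda t, x: u_const, reward, lam=1.0)
        print(f"u={u_const}: cost={cost:.4f}, x_1={x1:.3f}")
```

In this toy, doing nothing pays the full reward penalty and pushing too hard pays in control energy; the minimizer sits in between, which is the trade-off that $\lambda$ sets.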

Exact optimal control

$u_t^* = \lambda\,\nabla X_{t,1}^{u^*}(x_t^*)^\top \nabla r\!\big(X_{t,1}^{u^*}(x_t^*)\big)$

The optimal control is characterized in closed form, with the flow map $X_{t,1}^{u^*}$ appearing explicitly. But this is the controlled flow map under the optimally-guided dynamics — exactly what we are trying to construct.

The closed form is circular, and not directly usable at inference.

A tractable approximation

To break this circularity, we study the optimal control in the small-$\lambda$ limit — analytically tractable, and practically meaningful since aggressive guidance tends to promote reward hacking and mode collapse.

In this limit, the HJB equation reduces to a transport equation that we solve exactly. The result depends only on the uncontrolled flow map $X_{t,1}$:

$u_t^J(x) = \lambda\,\nabla X_{t,1}(x)^\top \nabla r\!\big(X_{t,1}(x)\big), \qquad \|u_t^* - u_t^J\| = O(\lambda^2).$

This same expression admits a second, complementary interpretation. For any $\lambda$, it is the optimal greedy correction: the best single-step intervention at the current time, assuming the uncontrolled flow runs at every other step. Two complementary arguments — small-$\lambda$ and greedy — converge on the same signal.
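The $O(\lambda^2)$ gap can be checked by hand in a toy where everything is available in closed form. Assume, purely for illustration, a trivial base flow $b_t = 0$ (so $X_{t,1}(x) = x$) and a quadratic reward $r(x) = -\tfrac12 (x - y)^2$. Minimizing the control objective directly then gives $u_t^*(x) = \lambda (y - x) / (1 + \lambda (1 - t))$, while the flow-map signal is $u_t^J(x) = \lambda (y - x)$. This closed form is our own derivation for the toy, not a result quoted from the source.

```python
def u_exact(x, t, lam, y):
    """Exact optimal control for b_t = 0 and r(x) = -0.5 * (x - y)^2 (toy assumption)."""
    return lam * (y - x) / (1.0 + lam * (1.0 - t))


def u_flow_map(x, t, lam, y):
    """Flow-map signal lam * dX/dx * r'(X(x)) with X_{t,1}(x) = x, so dX/dx = 1."""
    return lam * (y - x)


def gap(lam, x=0.0, t=0.25, y=1.0):
    """Absolute difference between the exact and the approximate control."""
    return abs(u_exact(x, t, lam, y) - u_flow_map(x, t, lam, y))


if __name__ == "__main__":
    for lam in (0.2, 0.1, 0.05, 0.025):
        print(f"lambda={lam}: gap={gap(lam):.6f}, gap / lambda^2={gap(lam) / lam ** 2:.4f}")
```

Halving $\lambda$ shrinks the gap by roughly a factor of four in this toy, consistent with the stated $O(\lambda^2)$ rate.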

A hierarchy of approximations

This slots into a hierarchy that also subsumes prior work.

Exact optimal (intractable)

$u_t^* = \lambda\,\nabla X_{t,1}^{u^*}(x_t^*)^\top \nabla r\!\big(X_{t,1}^{u^*}(x_t^*)\big)$

↓ uncontrolled flow map (small-$\lambda$ / greedy)

FMRG (ours)

$u_t(x) = \lambda\,\nabla X_{t,1}(x)^\top \nabla r\!\big(X_{t,1}(x)\big)$

↓ substitute one-step denoiser estimate

DPS (prior work)

$u_t^{\mathrm{DPS}}(x) = \lambda\,\nabla_x r\!\big(\hat x_1(x)\big),\;\;\hat x_1 = \mathbb{E}[x_1\mid x_t]$

DPS and many of its derivatives (e.g., MPGD, FlowDPS, FlowChef) are best understood as coarse approximations of optimal control, not of the reward tilt.
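To see how much the last substitution changes, consider a 1D Gaussian toy of our own choosing: noise $x_0 \sim \mathcal N(0, 1)$, data $x_1 \sim \mathcal N(0, \sigma^2)$, and the linear interpolant $x_t = (1 - t)\,x_0 + t\,x_1$. With $\sigma_t^2 = (1-t)^2 + t^2\sigma^2$, the flow map is $X_{t,1}(x) = \sigma x / \sigma_t$ and the denoiser is $\mathbb E[x_1 \mid x_t = x] = t \sigma^2 x / \sigma_t^2$. The sketch below compares the two guidance signals for a quadratic reward; the setup, function names, and reward are illustrative assumptions.

```python
import math


def sigma_t(t, data_std):
    """Std of x_t = (1 - t) * x_0 + t * x_1 with x_0 ~ N(0, 1), x_1 ~ N(0, data_std^2)."""
    return math.sqrt((1.0 - t) ** 2 + (t * data_std) ** 2)


def flow_map_to_data(x, t, data_std):
    """Flow map X_{t,1}(x) of the Gaussian toy: a rescaling by data_std / sigma_t."""
    return data_std * x / sigma_t(t, data_std)


def denoiser(x, t, data_std):
    """Posterior mean E[x_1 | x_t = x] of the Gaussian toy."""
    return t * data_std ** 2 * x / sigma_t(t, data_std) ** 2


def fmrg_signal(x, t, lam, y, data_std):
    """lam * dX/dx * r'(X(x)) for r(z) = -0.5 * (z - y)^2."""
    jac = data_std / sigma_t(t, data_std)
    return lam * jac * (y - flow_map_to_data(x, t, data_std))


def dps_signal(x, t, lam, y, data_std):
    """lam * d/dx r(x_hat_1(x)): the chain rule through the denoiser."""
    jac = t * data_std ** 2 / sigma_t(t, data_std) ** 2
    return lam * jac * (y - denoiser(x, t, data_std))


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0):
        print(t, fmrg_signal(0.5, t, 1.0, 3.0, 2.0), dps_signal(0.5, t, 1.0, 3.0, 2.0))
```

In this toy the two signals agree at $t = 1$, but at $t = 0$ the denoiser collapses to the data mean and its guidance vanishes, whereas the flow map still maps the noise to a definite endpoint and yields a nonzero signal.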

We've replaced the controlled flow map with the pretrained one — but is this approximation well-behaved? In an analytically tractable Gaussian setting, we can answer exactly.

Flow Map Reward Guidance

For efficient inference, we apply operator splitting: at each step, we integrate the base flow exactly via the flow map, then apply a gradient correction toward reward.

FMRG — one trajectory, vs. multi-particle reward tilt

FMRG alternates exact flow-map steps with reward-gradient steps along a single trajectory (top), in contrast to reward-tilt sampling (e.g. SMC), which needs many particles, resampling, and many steps (bottom).

FMRG — the algorithm

for $k = 0, \ldots, N{-}1$:

$\tilde x_{t_{k+1}} = X_{t_k,\,t_{k+1}}(x_{t_k})$ (flow-map step)

$x_{t_{k+1}} = \tilde x_{t_{k+1}} + \Delta t_k\,\lambda\, u(\tilde x_{t_{k+1}})$ (guidance step)

return $x_{t_N}$

Notably simple — each step is one flow-map call plus one gradient correction.
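The loop can be sketched on a 1D Gaussian toy where the flow map is known in closed form. Everything specific here is an assumption for illustration: noise $\mathcal N(0,1)$, data $\mathcal N(0, \sigma^2)$ with $\sigma = 2$, the linear interpolant, a uniform time grid, and a quadratic reward. In a real model the flow map would be a learned network. We also read $u$ in the guidance step as the unscaled signal $\nabla X_{t,1}^\top \nabla r(X_{t,1})$, so that $\lambda$ enters once.

```python
import math

DATA_STD = 2.0  # assumed data standard deviation for the toy


def sigma(t):
    """Std of x_t = (1 - t) * x_0 + t * x_1 with x_0 ~ N(0, 1), x_1 ~ N(0, DATA_STD^2)."""
    return math.sqrt((1.0 - t) ** 2 + (t * DATA_STD) ** 2)


def flow_map(x, s, t):
    """Closed-form flow map X_{s,t}(x) of the Gaussian toy."""
    return x * sigma(t) / sigma(s)


def fmrg_sample(x0, reward_grad, lam, num_steps):
    """Alternate exact flow-map steps with reward-gradient corrections."""
    ts = [k / num_steps for k in range(num_steps + 1)]
    x = x0
    for t_k, t_next in zip(ts[:-1], ts[1:]):
        # Flow-map step: integrate the base flow exactly over [t_k, t_next].
        x_tilde = flow_map(x, t_k, t_next)
        # Guidance step: pull the reward gradient back through X_{t_next,1}.
        endpoint = flow_map(x_tilde, t_next, 1.0)
        jac = sigma(1.0) / sigma(t_next)  # dX_{t_next,1}/dx in this toy
        u = jac * reward_grad(endpoint)
        x = x_tilde + (t_next - t_k) * lam * u
    return x


if __name__ == "__main__":
    target = 3.0
    grad = lambda z: target - z  # gradient of r(z) = -0.5 * (z - target)^2
    print("unguided:", fmrg_sample(0.5, grad, lam=0.0, num_steps=3))
    print("guided:  ", fmrg_sample(0.5, grad, lam=0.5, num_steps=3))
```

With $\lambda = 0$ the loop reduces to plain few-step flow-map sampling; with $\lambda > 0$ the same three steps land closer to the reward's target.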

In practice, several design choices shape FMRG, such as the guidance strength $\lambda$ and the number of gradient steps per interval. We turn next to the choice of gradient, which has a particularly significant role.

The role of the flow map Jacobian

Left: the Jacobian projects $\nabla r$ onto the manifold tangent space, keeping FMRG-J on-manifold; FMRG-E follows $\nabla r$ off-manifold. Right: FMRG-J preserves data features; FMRG-E achieves higher reward but introduces off-manifold artifacts.

Geometrically, the flow map Jacobian $\nabla X_{t,1}(x)^\top$ acts as a projection. We prove that it maps the reward gradient onto the data-manifold tangent space, so each guided step stays on-manifold by construction.

For complex reward landscapes, where neural-network gradients often point far off the data manifold, this projection annihilates the off-manifold component and keeps FMRG-J on-manifold.

FMRG-E drops the Jacobian. It avoids backpropagation through the flow map, saving memory and optimizing the reward directly, but the trajectory can drift off-manifold. Prior single-trajectory methods (FlowDPS, FlowChef, MPGD) similarly drop the Jacobian, which effectively amounts to a rescaling of the guidance weight.
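A small linear toy shows the projection effect. Assume, for illustration only, 2D data supported on the first axis (per-coordinate data std $(1, 0)$), Gaussian noise, and the linear interpolant, so the flow map acts coordinate-wise and its Jacobian is diagonal. The reward target is deliberately placed off the data manifold. All names and the setup are hypothetical.

```python
import math

DATA_STD = (1.0, 0.0)  # assumed: data lives on the first axis only


def sigma(t, s):
    """Per-coordinate std of x_t = (1 - t) * x_0 + t * x_1."""
    return math.sqrt((1.0 - t) ** 2 + (t * s) ** 2)


def flow_map_to_data(x, t):
    """Coordinate-wise flow map X_{t,1}(x); valid for t < 1 in this degenerate toy."""
    return tuple(s * xi / sigma(t, s) for xi, s in zip(x, DATA_STD))


def flow_map_jacobian_diag(t):
    """Diagonal of the Jacobian of X_{t,1}; zero along the off-manifold axis."""
    return tuple(s / sigma(t, s) for s in DATA_STD)


def reward_grad(z, target=(2.0, 1.0)):
    """Gradient of r(z) = -0.5 * |z - target|^2, with an off-manifold target."""
    return tuple(ti - zi for ti, zi in zip(target, z))


def guidance_j(x, t, lam):
    """Jacobian-weighted signal (FMRG-J style): lam * J^T grad r(X(x))."""
    g = reward_grad(flow_map_to_data(x, t))
    return tuple(lam * j * gi for j, gi in zip(flow_map_jacobian_diag(t), g))


def guidance_e(x, t, lam):
    """Jacobian-free signal (FMRG-E style): lam * grad r(X(x))."""
    g = reward_grad(flow_map_to_data(x, t))
    return tuple(lam * gi for gi in g)


if __name__ == "__main__":
    x, t = (0.3, 0.4), 0.5
    print("with Jacobian:   ", guidance_j(x, t, 1.0))
    print("without Jacobian:", guidance_e(x, t, 1.0))
```

In this toy the Jacobian-weighted signal has no component along the second axis, while the Jacobian-free signal keeps pushing toward the off-manifold target.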

Results

A single FLUX flow map, evaluated across four reward families — from simple $\ell_2$ reconstruction to a 7B vision-language model.

Inverse problems

FMRG outperforms DPS, FlowChef, and FlowDPS on super-resolution, deblurring, and inpainting — across both AFHQ and FFHQ. FMRG-E matches baselines that use 2–10× more NFEs.
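For reference, an $\ell_2$ reconstruction reward for box inpainting might look like the sketch below. The source only states that the reward is a simple $\ell_2$ reconstruction; the exact form $r(x) = -\tfrac12 \sum_i m_i (x_i - y_i)^2$, the flat-list image representation, and the function names are assumptions.

```python
def inpainting_reward(x, y, mask):
    """r(x) = -0.5 * sum of squared errors over observed pixels (mask == 1)."""
    return -0.5 * sum(m * (xi - yi) ** 2 for xi, yi, m in zip(x, y, mask))


def inpainting_reward_grad(x, y, mask):
    """Gradient of the reward: pulls observed pixels toward the measurement."""
    return [m * (yi - xi) for xi, yi, m in zip(x, y, mask)]


if __name__ == "__main__":
    x = [0.2, 0.8, 0.5, 0.1]     # current sample (flattened)
    y = [0.0, 1.0, 0.0, 0.0]     # measurement; masked entries are ignored
    mask = [1.0, 1.0, 0.0, 0.0]  # 1 = observed, 0 = inside the missing box
    print(inpainting_reward(x, y, mask), inpainting_reward_grad(x, y, mask))
```

The gradient is zero inside the missing box, so only the generative prior (the flow) determines what gets filled in there.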

Reward-guided generation

FMRG dominates the GenEval–NFE Pareto frontier across budgets, reaching reward-tilt baseline quality with up to 70× fewer NFEs.

Style transfer

FMRG transfers the reference style while preserving content; denoiser-based approaches like DPS miss the style or show artifacts — exactly as the approximation hierarchy predicts.

VLM rewards

With a 7B vision-language reward, FMRG follows complex compositional prompts — object attributes, spatial relations, text — that unguided FLUX consistently misses.

Qualitative results by reward family

“Close up of an eye with the Earth inside the pupil.”

“A Japanese garden at dawn, ink illustration, minimal lines, sumi-e style, misty atmosphere.”

“Infinite library stretching beyond horizon.”

“Giant moon resting on the ocean, glowing softly, dreamlike.”

“Girl under floating lanterns, warm gold light, nostalgic haze, cinematic softness.”

“Portrait of a woman, oil painting, chiaroscuro lighting, deep shadows, renaissance style, rich brushstrokes.”

“Ocean suspended in the sky, clouds below.”

“Lonely figure in a vast foggy landscape, muted earth tones, melancholic atmosphere, oil painting.”

“Forest landscape, watercolor wash, soft edges, light bleeding into paper, autumn colors.”

“A jazz musician playing saxophone, smoky bar, low key lighting, warm amber tones, candid moment.”

Reference → FLUX → DPS → FlowChef → FMRG-E → FMRG-J. FMRG captures the target style most faithfully.

Style hierarchy — second reference.

Style hierarchy — third reference.

Style hierarchy — fourth reference.

4× super-resolution (AFHQ): FMRG recovers sharp detail from just 3 NFEs. Panels: measurement, FMRG · 3 NFEs, FMRG · 12 NFEs.

Motion deblurring (AFHQ): FMRG removes the blur from just 3 NFEs. Panels: measurement, FMRG · 3 NFEs, FMRG · 12 NFEs.

Box inpainting (FFHQ): FMRG fills the masked region from just 3 NFEs. Panels: measurement, FMRG · 3 NFEs, FMRG · 12 NFEs.

“A cool raccoon in mirrored sunglasses, with a neon pizza sign reflected in the lenses.”

“A stop sign that says FMRG!”

“Two tall white candles in matching brass candlestick holders, the left has a bright flame, the right is extinguished.”

“A cardboard moving box labeled books in thick black marker.”

“A photo of an analog clock below a TV.”

“A 3×3 grid of square ceramic tiles on a clean wall, exactly one tile missing in the center.”

Quantitative results — tables & plots

NFE–performance trade-off (AFHQ): LPIPS and FID vs. NFE on AFHQ inverse problems. FMRG-E achieves notably better performance in the low-NFE regime (up to a 10× reduction in NFE).

NFE–performance trade-off (FFHQ): LPIPS and FID vs. NFE on FFHQ inverse problems. FMRG-E achieves notably better performance in the low-NFE regime (up to a 10× reduction in NFE).

Dataset	Method	Super-Resolution				Motion Deblur				Inpainting			
		PSNR↑	SSIM↑	LPIPS↓	FID↓	PSNR↑	SSIM↑	LPIPS↓	FID↓	PSNR↑	SSIM↑	LPIPS↓	FID↓
AFHQ	DPS	18.06	.443	.503	59.99	16.68	.407	.547	58.59	17.15	.512	.540	95.30
AFHQ	FlowChef	26.87	.767	.243	43.29	26.75	.749	.250	38.58	24.12	.828	.160	35.10
AFHQ	FlowDPS	27.02	.778	.250	34.58	26.07	.739	.279	36.77	26.11	.798	.239	44.10
AFHQ	FMRG-E	27.12	.772	.180	25.48	27.26	.771	.177	23.12	26.12	.851	.126	24.73
AFHQ	FMRG-J	27.39	.795	.193	29.51	27.36	.788	.204	24.43	26.71	.842	.158	31.20
FFHQ	DPS	18.74	.570	.530	119.30	20.20	.618	.483	127.91	18.93	.623	.486	137.36
FFHQ	FlowChef	26.71	.782	.237	117.04	24.53	.713	.296	109.30	25.41	.830	.164	76.52
FFHQ	FlowDPS	27.70	.818	.205	61.85	26.85	.789	.233	64.55	27.90	.844	.195	73.30
FFHQ	FMRG-E	27.53	.799	.171	62.63	28.10	.807	.153	38.52	28.66	.883	.103	35.15
FFHQ	FMRG-J	28.23	.832	.154	55.99	28.62	.834	.155	41.33	29.48	.893	.112	43.99

Latent-space inverse problems: Tuned to each method's best hyperparameters, FMRG outperforms DPS, FlowChef, and FlowDPS across the board (super-resolution, motion deblur, and inpainting) on both AFHQ and FFHQ. Bold is best, underline second.

GenEval accuracy vs. NFE: FMRG-J dominates the Pareto frontier across all NFE budgets, matching FMTT's quality (0.77) at NFE 20, a 70× reduction in compute.

Method	Overall↑	Single	Two	Count	Colors	Position	Color Attr.	NFE↓
FLUX	0.662	.991	.795	.697	.801	.212	.475	50
Flow Map	0.668	.975	.848	.637	.777	.210	.560	8
Flow Map + Best-of-N	0.758	1.00	.909	.838	.872	.260	.670	128
ReNO	0.716	1.00	.881	.769	.875	.172	.608	58
FMTT	0.771	.988	.929	.850	.862	.310	.690	1400
FMRG-E	0.766	1.00	.927	.828	.856	.297	.685	100
FMRG-J	0.770	.997	.922	.863	.854	.295	.690	20
FMRG-J	0.800	1.00	.947	.884	.902	.292	.772	100

GenEval accuracy: On a shared FLUX flow-map backbone, FMRG attains the best GenEval score overall (0.80) and the best score at every fixed NFE budget.
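The 70× figure follows directly from the NFE column of the table above; the short check below uses only the FMTT row and the 20-NFE FMRG-J row. The variable names are ours.

```python
# Values copied from the GenEval table: overall score and NFE per method.
nfe = {"FMTT": 1400, "FMRG-J": 20}
score = {"FMTT": 0.771, "FMRG-J": 0.770}

speedup = nfe["FMTT"] / nfe["FMRG-J"]        # ratio of function evaluations
score_gap = score["FMTT"] - score["FMRG-J"]  # difference in overall GenEval score

print(f"speedup: {speedup:.0f}x, score gap: {score_gap:.3f}")
```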
