How to Guide Your Flow
ICML 2026

Few-Step Alignment via Flow Map Reward Guidance

Jerry Y. Huang*, Justin Lin*, Sheel Shah, Kartik Nair, Nicholas M. Boffi
Carnegie Mellon University  ·  *Equal contribution
Flow Map Reward Guidance: an unguided FLUX sample (left) and the same trajectory guided by FMRG (right) toward a human-preference aesthetic reward.
TL;DR

Guidance is usually framed as sampling a reward-tilted distribution $\tilde\rho(x)\propto e^{r(x)}\rho(x)$, which requires stochastic, multi-particle dynamics that are expensive even for simple rewards. We instead pose guidance as a deterministic optimal control problem, yielding a hierarchy of algorithms in which the flow map emerges naturally and popular denoiser-based methods like DPS are recovered as the coarsest approximation. The result is Flow Map Reward Guidance (FMRG): highly efficient, training-free alignment that uses the flow map to both integrate and guide the flow along a single trajectory — matching or beating baselines with as few as 3 NFEs, up to a 70× speedup.

Motivation

In generative modeling, we rarely want just any sample — we want to tailor it to a user-specified reward $r$ that captures what we actually care about.

That reward might score, for example, aesthetic quality, agreement with a measurement, physical plausibility, or alignment with human intent. Guidance steers a pre-trained model toward high reward at inference time, with no extra training.

How do we formalize this? The dominant theoretical framework casts guidance as sampling from a reward-tilted distribution:

The standard framing — reward tilting
$\tilde\rho(x) \;\propto\; e^{r(x)}\,\rho(x)$
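As a quick sanity check on what the tilt does, consider a one-dimensional illustration that is not taken from the paper: a standard Gaussian base density and a linear reward $r(x) = a x$ with an assumed slope $a$. Completing the square gives

$$
\tilde\rho(x) \;\propto\; e^{a x}\, e^{-x^2/2} \;=\; e^{a^2/2}\, e^{-(x-a)^2/2} \;\propto\; e^{-(x-a)^2/2},
$$

so the tilted density is $\mathcal N(a, 1)$: the tilt shifts the mean by $a$ and leaves the variance unchanged. Closed forms like this are the exception; for a general reward the tilt has to be sampled.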

However, this target proves remarkably difficult to sample from efficiently. Practical methods often rely on many integration steps, stochastic dynamics, and costly test-time search. The framework also sits poorly with modern flow-based generative models, which increasingly favor deterministic samplers requiring few function evaluations, enabled by recent work on flow maps.

Problem 01: Hard to solve
Sampling $\tilde\rho$ is computationally expensive even for simple rewards. Multi-particle schemes like SMC need many particles and integration steps; single-particle shortcuts like DPS rely on heuristic approximations, leaving their output poorly characterized.
Problem 02: A poor fit for modern samplers
Reward tilting is rooted in stochastic processes, but modern flow models such as FLUX and Stable Diffusion 3 sample deterministically, and flow maps push them to just a few steps. Porting tilt-based alignment to this setting requires simulating stochasticity inside an ODE sampler, which is at odds with how these samplers run.

We would like guidance native to modern flows: few-step, deterministic, single-trajectory.

Is there a principled framework for guiding flow-based generative models with very few function evaluations?

Contributions

01. Guidance as optimal control
We reframe guidance from reward-tilt sampling to a deterministic optimal control problem, and the flow map emerges in its closed-form solution.
02. A unifying theory
We characterize FMRG analytically and elucidate the role of design choices like early stopping and the Jacobian, yielding a framework that subsumes DPS and prior single-trajectory methods as special cases.
03. Few-step alignment
On FLUX-scale text-to-image models, FMRG matches or surpasses baselines across inverse problems and reward-guided generation with as few as 3 NFEs: at least an order-of-magnitude speedup, and up to 70×.

Key idea

Rather than sample the reward tilt, we pose guidance directly as deterministic optimal control: steer a single trajectory to maximize reward while staying close to the base flow.

Standard framing: sample the reward tilt (stochastic, many particles)
$\tilde\rho(x)\propto e^{r(x)}\rho(x)$
Our perspective: steer one trajectory, $\dot x_t^u = b_t(x_t^u)+u_t$
$\displaystyle\min_u \int_0^1 \tfrac{\lVert u_t\rVert^2}{2\lambda}\,dt - r(x_1^u)$
Exact optimal control
$u_t^* = \lambda\,\nabla X_{t,1}^{u^*}(x_t^*)^\top \nabla r\!\big(X_{t,1}^{u^*}(x_t^*)\big)$

The optimal control is characterized in closed form, with the flow map $X_{t,1}^{u^*}$ appearing explicitly. But this is the controlled flow map under the optimally-guided dynamics — exactly what we are trying to construct.

The closed form is circular, and not directly usable at inference.

A tractable approximation

To break this circularity, we study the optimal control in the small-$\lambda$ limit — analytically tractable, and practically meaningful since aggressive guidance tends to promote reward hacking and mode collapse.

In this limit, the HJB equation reduces to a transport equation that we solve exactly. The result depends only on the uncontrolled flow map $X_{t,1}$:

$u_t^J(x) = \lambda\,\nabla X_{t,1}(x)^\top \nabla r\!\big(X_{t,1}(x)\big), \qquad \|u_t^* - u_t^J\| = O(\lambda^2).$

This same expression admits a second interpretation. For any $\lambda$, it is the optimal greedy correction: the best single-step intervention at the current time, assuming the uncontrolled flow runs at every other step. The two arguments, small-$\lambda$ and greedy, converge on the same signal.
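By the chain rule, $\nabla X_{t,1}(x)^\top \nabla r(X_{t,1}(x))$ is the gradient of the look-ahead reward $x \mapsto r(X_{t,1}(x))$, so $u_t^J$ is $\lambda$ times that gradient. The sketch below checks this numerically on a toy problem. The linear base flow $\dot x = A x$, the quadratic reward, and all function names are illustrative assumptions, not the paper's setup.

```python
import math

A = 0.5  # assumed rate of the toy base flow dx/dt = A * x


def flow_map(x, s, t):
    """Exact flow map X_{s,t} of the toy ODE dx/dt = A * x."""
    return x * math.exp(A * (t - s))


def flow_map_jacobian(s, t):
    """dX_{s,t}/dx; constant in x because the toy flow is linear."""
    return math.exp(A * (t - s))


def reward_grad(z, target):
    """Gradient of the toy reward r(z) = -0.5 * (z - target)**2."""
    return target - z


def guidance(x, t, lam, target):
    """u_t^J(x) = lam * (dX_{t,1}/dx) * r'(X_{t,1}(x)) in one dimension."""
    z = flow_map(x, t, 1.0)
    return lam * flow_map_jacobian(t, 1.0) * reward_grad(z, target)


def guidance_fd(x, t, lam, target, eps=1e-6):
    """Same quantity from a central finite difference of x -> r(X_{t,1}(x))."""
    def terminal_reward(v):
        return -0.5 * (flow_map(v, t, 1.0) - target) ** 2
    return lam * (terminal_reward(x + eps) - terminal_reward(x - eps)) / (2 * eps)
```

Calling `guidance(0.3, 0.25, 0.1, 2.0)` and `guidance_fd(0.3, 0.25, 0.1, 2.0)` should agree up to finite-difference error.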

A hierarchy of approximations

This slots into a hierarchy that also subsumes prior work.

Exact-optimal (intractable)
$u_t^* = \lambda\,\nabla X_{t,1}^{u^*}(x_t^*)^\top \nabla r\!\big(X_{t,1}^{u^*}(x_t^*)\big)$
Approximation: use the uncontrolled flow map (small-$\lambda$ / greedy)
FMRG (ours)
$u_t(x) = \lambda\,\nabla X_{t,1}(x)^\top \nabla r\!\big(X_{t,1}(x)\big)$
Approximation: substitute the one-step denoiser estimate
DPS (prior work)
$u_t^{\mathrm{DPS}}(x) = \lambda\,\nabla_x r\!\big(\hat x_1(x)\big),\;\;\hat x_1 = \mathbb{E}[x_1\mid x_t]$

DPS and many of its derivatives (e.g., MPGD, FlowDPS, FlowChef) are best understood as coarse approximations of optimal control, not of the reward tilt.
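To make the gap between the last two levels concrete, the following sketch compares the flow-map signal with the denoiser-based one in a one-dimensional Gaussian toy. Everything here is an assumption made for illustration: data $x_1 \sim \mathcal N(M, S^2)$, the linear interpolant $x_t = (1-t)x_0 + t x_1$ with $x_0 \sim \mathcal N(0,1)$, a quadratic reward, and the function names. In this toy both the flow map and the posterior mean are affine in $x$, so their Jacobians are available in closed form.

```python
import math

M, S = 2.0, 0.5  # assumed data mean and standard deviation: x_1 ~ N(M, S**2)


def sigma(t):
    """Std of x_t = (1 - t) * x_0 + t * x_1 with x_0 ~ N(0, 1) independent of x_1."""
    return math.sqrt((1.0 - t) ** 2 + (t * S) ** 2)


def flow_map_to_1(x, t):
    """Flow map X_{t,1}(x) of the toy Gaussian flow, together with its Jacobian."""
    jac = S / sigma(t)
    return M + jac * (x - t * M), jac


def denoiser(x, t):
    """Posterior mean E[x_1 | x_t = x], together with its Jacobian."""
    jac = t * S**2 / sigma(t) ** 2
    return M + jac * (x - t * M), jac


def reward_grad(z, target=3.0):
    """Gradient of the toy reward r(z) = -0.5 * (z - target)**2."""
    return target - z


def fmrg_guidance(x, t, lam=1.0):
    """lam * (grad X_{t,1})^T grad r(X_{t,1}(x))."""
    z, jac = flow_map_to_1(x, t)
    return lam * jac * reward_grad(z)


def dps_guidance(x, t, lam=1.0):
    """lam * grad_x r(x1_hat(x)), differentiating through the denoiser."""
    z, jac = denoiser(x, t)
    return lam * jac * reward_grad(z)
```

In this toy the two signals coincide at $t = 1$, while at $t = 0$ the posterior mean is constant in $x$, so the denoiser-based signal vanishes and the flow-map signal does not.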

We've replaced the controlled flow map with the pretrained one — but is this approximation well-behaved? In an analytically tractable Gaussian setting, we can answer exactly.

Characterizing the output

In this Gaussian setting, FMRG's terminal distribution can be computed exactly, which quantifies how its greedy guidance trades off reward against diversity.

Terminal distribution under a sweep of $t_{\text{stop}}$ (animation): the Gaussian terminal distribution shows how early stopping curbs over-optimization.

Without early stopping, greedy guidance can over-optimize: the terminal distribution collapses onto the reward maximum, losing diversity. Early stopping — guiding only on an initial window, then letting the uncontrolled flow finish — gives a principled way to tune the tradeoff. The distribution stays diverse while shifting toward high reward.

The analysis extends qualitatively to real rewards, where early stopping consistently improves reward-guided generation.
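The effect of early stopping can be reproduced in a small one-dimensional toy. This is a sketch under assumed choices and does not reproduce the paper's analysis: data $x_1 \sim \mathcal N(M, S^2)$, the linear interpolant with $x_0 \sim \mathcal N(0,1)$, a quadratic reward centered at `TARGET`, and a uniform time grid. Every map in the split scheme is then affine, so the terminal sample is $a\,x_0 + c$ and only the two coefficients need to be tracked.

```python
import math

M, S = 0.0, 1.0   # assumed data distribution x_1 ~ N(M, S**2)
TARGET = 2.0      # assumed reward r(z) = -0.5 * (z - TARGET)**2


def sigma(t):
    """Std of x_t = (1 - t) * x_0 + t * x_1 with x_0 ~ N(0, 1)."""
    return math.sqrt((1.0 - t) ** 2 + (t * S) ** 2)


def terminal_gaussian(n_steps, lam, t_stop):
    """Return (mean, std) of the terminal sample x_1 = a * x_0 + c."""
    a, c = 1.0, 0.0
    dt = 1.0 / n_steps
    for k in range(n_steps):
        s, t = k * dt, (k + 1) * dt
        # Flow-map step: X_{s,t}(x) = t*M + (sigma(t)/sigma(s)) * (x - s*M).
        scale = sigma(t) / sigma(s)
        a, c = scale * a, t * M + scale * (c - s * M)
        if t <= t_stop:
            # Guidance step: x += dt * lam * J * (TARGET - X_{t,1}(x)),
            # with X_{t,1}(x) = M + J * (x - t*M) and J = S / sigma(t).
            jac = S / sigma(t)
            shrink = 1.0 - dt * lam * jac * jac
            c = shrink * c + dt * lam * jac * (TARGET - M + jac * t * M)
            a = shrink * a
    return c, abs(a)


unguided = terminal_gaussian(8, 1.0, t_stop=0.0)  # no guided steps
early = terminal_gaussian(8, 1.0, t_stop=0.5)     # guide on [0, 0.5] only
full = terminal_gaussian(8, 1.0, t_stop=1.0)      # guide at every step
```

With these settings, guiding at every step gives the smallest terminal standard deviation and the mean closest to `TARGET`, while stopping at $t_{\text{stop}} = 0.5$ lands between the fully guided and unguided cases on both counts.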

Flow Map Reward Guidance

For efficient inference, we apply operator splitting: at each step, we integrate the base flow exactly via the flow map, then apply a gradient correction toward reward.

FMRG — one trajectory, vs. multi-particle reward tilt
FMRG alternates exact flow-map steps with reward-gradient steps along a single trajectory (top), in contrast to reward-tilt sampling (e.g. SMC), which needs many particles, resampling, and many steps (bottom).
FMRG: the algorithm
for $k = 0, \ldots, N{-}1$:
    $\tilde x_{t_{k+1}} = X_{t_k,\,t_{k+1}}(x_{t_k})$    (flow-map step)
    $x_{t_{k+1}} = \tilde x_{t_{k+1}} + \Delta t_k\,\lambda\, u(\tilde x_{t_{k+1}})$    (guidance step)
return $x_{t_N}$
The scheme is notably simple: each step is one flow-map call plus one gradient correction.
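The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the released implementation: the toy base flow $\dot x = A x$ (whose flow map is known exactly), the quadratic reward, and the names `flow_map`, `guidance`, and `fmrg_sample` are all hypothetical. Here `guidance` returns the Jacobian-weighted reward gradient without the factor $\lambda$, which is applied in the update as in the algorithm above.

```python
import math

A = 0.5  # assumed rate of the toy base flow dx/dt = A * x


def flow_map(x, s, t):
    """Exact flow map X_{s,t} of the toy ODE, applied coordinate-wise."""
    return [xi * math.exp(A * (t - s)) for xi in x]


def guidance(x, t, target):
    """Jacobian-weighted reward gradient for r(z) = -0.5 * ||z - target||^2."""
    jac = math.exp(A * (1.0 - t))  # dX_{t,1}/dx, a multiple of the identity here
    return [jac * (yi - jac * xi) for xi, yi in zip(x, target)]


def fmrg_sample(x0, times, lam, target):
    """Alternate a flow-map step with a reward-gradient step on one trajectory."""
    x = list(x0)
    for s, t in zip(times[:-1], times[1:]):
        x = flow_map(x, s, t)            # flow-map step
        u = guidance(x, t, target)       # evaluated at the post-flow state
        x = [xi + (t - s) * lam * ui for xi, ui in zip(x, u)]  # guidance step
    return x


def reward(z, target):
    return -0.5 * sum((zi - yi) ** 2 for zi, yi in zip(z, target))


times = [0.0, 0.25, 0.5, 0.75, 1.0]
x0, target = [1.0, -1.0], [0.0, 2.0]
unguided = fmrg_sample(x0, times, lam=0.0, target=target)
guided = fmrg_sample(x0, times, lam=1.0, target=target)
```

With `lam=0.0` the loop reduces to composing flow-map steps, so `unguided` matches `flow_map(x0, 0.0, 1.0)`; with `lam=1.0` the terminal reward is higher than in the unguided case.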

In practice, several design choices shape FMRG, such as the guidance strength $\lambda$ and the number of gradient steps per interval. We turn next to the choice of gradient, which has a particularly significant role.

The role of the flow map Jacobian

Left: the Jacobian projects $\nabla r$ onto the manifold tangent space, keeping FMRG-J on-manifold; FMRG-E follows $\nabla r$ off-manifold. Right: FMRG-J preserves data features; FMRG-E achieves higher reward but introduces off-manifold artifacts.

Geometrically, the flow map Jacobian $\nabla X_{t,1}(x)^\top$ acts as a projection. We prove that it maps the reward gradient onto the data-manifold tangent space, so each guided step stays on-manifold by construction.

For complex reward landscapes, where neural-network gradients often point far off the data manifold, this projection annihilates the off-manifold component and keeps FMRG-J on-manifold.

FMRG-E drops the Jacobian. It avoids backpropagation through the flow map, saving memory and optimizing the reward directly, but the trajectory can drift off-manifold. Prior single-trajectory methods (FlowDPS, FlowChef, MPGD) similarly drop the Jacobian, which effectively amounts to a rescaling of the guidance weight.
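The projection effect is easy to see in a degenerate toy where the data lie on a line. The setup below is an assumption made for illustration, not the paper's construction: $x_1 = z\,V$ with $z \sim \mathcal N(0,1)$ and unit vector $V$, the linear interpolant with $x_0 \sim \mathcal N(0, I)$, and an arbitrary reward gradient. For $t < 1$ the toy flow map is $X_{t,1}(x) = (V \cdot x)\,V/\sigma_t$, so its Jacobian is $V V^\top/\sigma_t$.

```python
import math

THETA = math.pi / 6                     # assumed orientation of the data line
V = (math.cos(THETA), math.sin(THETA))  # unit tangent of the data manifold


def jacobian_transpose_apply(g, t):
    """Apply (grad X_{t,1})^T = V V^T / sigma_t to a vector g, for t < 1."""
    sigma_t = math.sqrt((1.0 - t) ** 2 + t ** 2)
    coeff = (V[0] * g[0] + V[1] * g[1]) / sigma_t
    return (coeff * V[0], coeff * V[1])


def normal_component(w):
    """Signed length of w along the unit normal of the data line."""
    return -V[1] * w[0] + V[0] * w[1]


reward_grad = (0.3, 1.0)  # arbitrary gradient with an off-manifold part
fmrg_j = jacobian_transpose_apply(reward_grad, t=0.5)  # Jacobian-weighted
fmrg_e = reward_grad                                   # Euclidean, no Jacobian
```

In this toy, `normal_component(fmrg_j)` is zero up to rounding, while `normal_component(fmrg_e)` is not: the Jacobian-weighted direction is tangent to the data line, and the Euclidean one keeps its off-manifold part.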

Results

A single FLUX flow map, evaluated across four reward families — from simple $\ell_2$ reconstruction to a 7B vision-language model.

Inverse problems
FMRG outperforms DPS, FlowChef, and FlowDPS on super-resolution, deblurring, and inpainting — across both AFHQ and FFHQ. FMRG-E matches baselines that use 2–10× more NFEs.
Reward-guided generation
FMRG dominates the GenEval–NFE Pareto frontier across budgets, reaching reward-tilt baseline quality with up to 70× fewer NFEs.
Style transfer
FMRG transfers the reference style while preserving content; denoiser-based approaches like DPS miss the style or show artifacts — exactly as the approximation hierarchy predicts.
VLM rewards
With a 7B vision-language reward, FMRG follows complex compositional prompts — object attributes, spatial relations, text — that unguided FLUX consistently misses.
Qualitative results — browse by reward family
“Close up of an eye with the Earth inside the pupil.”
“A Japanese garden at dawn, ink illustration, minimal lines, sumi-e style, misty atmosphere.”
“Infinite library stretching beyond horizon.”
“Giant moon resting on the ocean, glowing softly, dreamlike.”
“Girl under floating lanterns, warm gold light, nostalgic haze, cinematic softness.”
“Portrait of a woman, oil painting, chiaroscuro lighting, deep shadows, renaissance style, rich brushstrokes.”
“Ocean suspended in the sky, clouds below.”
“Lonely figure in a vast foggy landscape, muted earth tones, melancholic atmosphere, oil painting.”
“Forest landscape, watercolor wash, soft edges, light bleeding into paper, autumn colors.”
“A jazz musician playing saxophone, smoky bar, low key lighting, warm amber tones, candid moment.”
Style hierarchy: Reference → FLUX → DPS → FlowChef → FMRG-E → FMRG-J. FMRG captures the target style most faithfully.
Style hierarchy: second, third, and fourth references.
4× super-resolution (AFHQ): measurement, FMRG at 3 NFEs, FMRG at 12 NFEs. FMRG recovers sharp detail from just 3 NFEs.
Motion deblurring (AFHQ): measurement, FMRG at 3 NFEs, FMRG at 12 NFEs. FMRG removes the blur from just 3 NFEs.
Box inpainting (FFHQ): measurement, FMRG at 3 NFEs, FMRG at 12 NFEs. FMRG fills the masked region from just 3 NFEs.
VLM rewards:
“A cool raccoon in mirrored sunglasses, with a neon pizza sign reflected in the lenses.”
“A stop sign that says FMRG!”
“Two tall white candles in matching brass candlestick holders, the left has a bright flame, the right is extinguished.”
“A cardboard moving box labeled books in thick black marker.”
“A photo of an analog clock below a TV.”
“A 3×3 grid of square ceramic tiles on a clean wall, exactly one tile missing in the center.”
Quantitative results — tables & plots
NFE–performance trade-off (AFHQ): LPIPS and FID vs. NFE on AFHQ inverse problems. FMRG-E achieves notably better performance in the low-NFE regime (up to 10× reduction in NFE).
NFE–performance trade-off (FFHQ): LPIPS and FID vs. NFE on FFHQ inverse problems. FMRG-E achieves notably better performance in the low-NFE regime (up to 10× reduction in NFE).
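| Dataset | Method | SR PSNR↑ | SR SSIM↑ | SR LPIPS↓ | SR FID↓ | MD PSNR↑ | MD SSIM↑ | MD LPIPS↓ | MD FID↓ | Inp. PSNR↑ | Inp. SSIM↑ | Inp. LPIPS↓ | Inp. FID↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AFHQ | DPS | 18.06 | .443 | .503 | 59.99 | 16.68 | .407 | .547 | 58.59 | 17.15 | .512 | .540 | 95.30 |
| AFHQ | FlowChef | 26.87 | .767 | .243 | 43.29 | 26.75 | .749 | .250 | 38.58 | 24.12 | .828 | .160 | 35.10 |
| AFHQ | FlowDPS | 27.02 | .778 | .250 | 34.58 | 26.07 | .739 | .279 | 36.77 | 26.11 | .798 | .239 | 44.10 |
| AFHQ | FMRG-E | 27.12 | .772 | **.180** | **25.48** | 27.26 | .771 | **.177** | **23.12** | 26.12 | **.851** | **.126** | **24.73** |
| AFHQ | FMRG-J | **27.39** | **.795** | .193 | 29.51 | **27.36** | **.788** | .204 | 24.43 | **26.71** | .842 | .158 | 31.20 |
| FFHQ | DPS | 18.74 | .570 | .530 | 119.30 | 20.20 | .618 | .483 | 127.91 | 18.93 | .623 | .486 | 137.36 |
| FFHQ | FlowChef | 26.71 | .782 | .237 | 117.04 | 24.53 | .713 | .296 | 109.30 | 25.41 | .830 | .164 | 76.52 |
| FFHQ | FlowDPS | 27.70 | .818 | .205 | 61.85 | 26.85 | .789 | .233 | 64.55 | 27.90 | .844 | .195 | 73.30 |
| FFHQ | FMRG-E | 27.53 | .799 | .171 | 62.63 | 28.10 | .807 | **.153** | **38.52** | 28.66 | .883 | **.103** | **35.15** |
| FFHQ | FMRG-J | **28.23** | **.832** | **.154** | **55.99** | **28.62** | **.834** | .155 | 41.33 | **29.48** | **.893** | .112 | 43.99 |

Latent-space inverse problems. SR = super-resolution, MD = motion deblur, Inp. = inpainting. Tuned to each method's best hyperparameters, FMRG outperforms DPS, FlowChef, and FlowDPS across the board: super-resolution, motion deblur, and inpainting, on both AFHQ and FFHQ. Bold marks the best value in each column for each dataset.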
GenEval accuracy vs. NFE (Pareto frontier): FMRG-J dominates the Pareto frontier across all NFE budgets, matching FMTT's quality (0.77) at NFE 20, a 70× reduction in compute.
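| Method | Overall↑ | Single | Two | Count | Colors | Position | Color Attr. | NFE↓ |
|---|---|---|---|---|---|---|---|---|
| FLUX | 0.662 | .991 | .795 | .697 | .801 | .212 | .475 | 50 |
| Flow Map | 0.668 | .975 | .848 | .637 | .777 | .210 | .560 | 8 |
| Flow Map + Best-of-N | 0.758 | 1.00 | .909 | .838 | .872 | .260 | .670 | 128 |
| ReNO | 0.716 | 1.00 | .881 | .769 | .875 | .172 | .608 | 58 |
| FMTT | 0.771 | .988 | .929 | .850 | .862 | .310 | .690 | 1400 |
| FMRG-E | 0.766 | 1.00 | .927 | .828 | .856 | .297 | .685 | 100 |
| FMRG-J | 0.770 | .997 | .922 | .863 | .854 | .295 | .690 | 20 |
| FMRG-J | 0.800 | 1.00 | .947 | .884 | .902 | .292 | .772 | 100 |

GenEval accuracy. On a shared FLUX flow-map backbone, FMRG attains the best GenEval score overall (0.80), and the best score at every fixed NFE budget.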
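The 70× figure follows directly from the NFE column. The short check below copies the overall score and NFE count for FMTT and for FMRG-J at 20 NFEs from the table; the variable names are illustrative.

```python
# (overall GenEval score, NFEs) copied from the FMTT and FMRG-J (20 NFE) rows
fmtt = (0.771, 1400)
fmrg_j_20 = (0.770, 20)

nfe_ratio = fmtt[1] / fmrg_j_20[1]            # 1400 / 20 = 70.0
score_gap = round(fmtt[0] - fmrg_j_20[0], 3)  # 0.001 in overall score
```

So the two methods differ by 0.001 in overall GenEval score at a 70× difference in function evaluations.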

BibTeX

@article{huang2026fmrg,
  title={How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance},
  author={Huang, Jerry Y. and Lin, Justin and Shah, Sheel and Nair, Kartik and Boffi, Nicholas M.},
  journal={arXiv preprint},
  year={2026}
}