blog | Yonghoon Dong

Trust Region Q-Adjoint Matching: Stable Off-Policy RL for Flow Policies

A new, stable off-policy fine-tuning algorithm for pretrained flow-based policies that combines trust-region principles with stochastic optimal control.

26 min read · May 25, 2026

2026 · off-policy flow-matching trust-region robotics · rl