TRQAM

Released Trust Region Q Adjoint Matching (TRQAM), a stable off-policy RL algorithm for pretrained flow policies. arXiv · blog · code