We propose an efficient reinforcement learning (RL) framework for fast adaptation of pretrained generative policies. Our method, residual flow steering (RFS), quickly adapts a pretrained flow-matching model by optimizing a steering policy that jointly selects a latent noise distribution and a residual action. This lets the policy perform global exploration through the latent noise and local exploration through the residual action, enabling data-efficient adaptation. We demonstrate that this technique is effective for dexterous manipulation, serving both as a tool to pretrain behaviors in simulation and to efficiently fine-tune them in the real world.
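To make the steering mechanism concrete, here is a minimal sketch of the idea in PyTorch: an RL policy outputs a latent noise distribution and a residual correction, the (frozen) flow-matching model integrates from the sampled latent to a base action, and the executed action is the base action plus the residual. All class and parameter names (`FlowPolicy`, `SteeringPolicy`, the Euler integration, the residual scale) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class FlowPolicy(nn.Module):
    """Stand-in for a pretrained flow-matching action generator (weights assumed frozen)."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def velocity(self, obs, a_t, t):
        return self.net(torch.cat([obs, a_t, t], dim=-1))

    def generate(self, obs, z, num_steps=10):
        # Euler integration of the learned flow from latent noise z to an action.
        a, dt = z, 1.0 / num_steps
        for i in range(num_steps):
            t = torch.full_like(a[..., :1], i * dt)
            a = a + dt * self.velocity(obs, a, t)
        return a


class SteeringPolicy(nn.Module):
    """RL policy that picks the latent noise (global exploration)
    and a residual action correction (local exploration)."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.latent_mean = nn.Linear(hidden, act_dim)    # mean of latent-noise distribution
        self.latent_logstd = nn.Linear(hidden, act_dim)  # log-std of latent-noise distribution
        self.residual = nn.Linear(hidden, act_dim)       # residual action head

    def forward(self, obs):
        h = self.trunk(obs)
        z_dist = torch.distributions.Normal(
            self.latent_mean(h), self.latent_logstd(h).exp()
        )
        return z_dist, torch.tanh(self.residual(h))


# Action selection: sample steered latent noise, run the frozen flow model, add the residual.
obs_dim, act_dim = 32, 7
flow = FlowPolicy(obs_dim, act_dim)    # pretrained weights would be loaded here
steer = SteeringPolicy(obs_dim, act_dim)
obs = torch.randn(1, obs_dim)

z_dist, residual = steer(obs)
z = z_dist.rsample()                   # global exploration via the latent
base_action = flow.generate(obs, z)    # frozen flow-matching rollout
action = base_action + 0.1 * residual  # local exploration via a small residual (scale is an assumption)
```

In an actual training loop, only the steering policy's parameters would be updated by the RL objective, while the pretrained flow model stays fixed.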
@misc{su2026rfsreinforcementlearningresidual,
      title={RFS: Reinforcement learning with Residual flow steering for dexterous manipulation},
      author={Entong Su and Tyler Westenbroek and Anusha Nagabandi and Abhishek Gupta},
      year={2026},
      eprint={2602.01789},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2602.01789},
}