Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning

1University of Washington, 2Microsoft Research
*Equal Contribution
ICLR 2025


Learned with less than 5 minutes of real-world fine-tuning data

Abstract

Robot learning requires a considerable amount of data to realize the promise of generalization. However, collecting the volume of high-quality data necessary for generalization entirely in the real world is challenging. Simulation can serve as a source of plentiful data, where techniques such as reinforcement learning can obtain broad coverage over states and actions. However, even high-fidelity physics simulators are fundamentally misspecified approximations of reality, making direct zero-shot transfer challenging, especially in tasks where precise and forceful manipulation is necessary. This makes real-world fine-tuning of policies pretrained in simulation an attractive approach to robot learning. However, exploring real-world dynamics with standard RL fine-tuning techniques is too inefficient for many real-world applications. This paper introduces Simulation-Guided Fine-Tuning (SGFT), a general framework which leverages the structure of the simulator to guide exploration, substantially accelerating adaptation to the real world. We demonstrate our approach across several manipulation tasks in the real world, learning successful policies for problems that are challenging to learn using purely real-world data. We further provide theoretical backing for the paradigm.

Progression Video of Hammering

Method


How do we solve contact-rich manipulation in situations where sim2real transfer fails?

Problem:

  1. Simulation provides extensive data coverage, but misspecified physics prevents zero-shot transfer
  2. RL fine-tuning with tabula rasa exploration is sample-inefficient because the search space grows exponentially with the time horizon

Main Idea:

  1. We demonstrate theoretically that value functions define an ordering of states which is robust to low-level dynamics gaps
  2. Using a simulation-learned value function (Vsim) for potential-based reward shaping provides dense rewards to guide real-world exploration (see the sketch after this list)
  3. SGFT also integrates nicely with Model-Based RL (MBRL) by making short-horizon predictions with a dynamics model and bootstrapping with Vsim to shorten the horizon of the search problem*. This side-steps the core challenge of compounding errors faced by MBRL, allowing us to use MBRL to speed up fine-tuning even more.

*Note: The RL objective is now biased. Theoretical analysis in the paper shows this bias is acceptable.
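
To make point 2 above concrete, here is a minimal sketch of potential-based reward shaping with a simulation-learned value function. The names (`v_sim`, `shaped_reward`) and the exact interface are illustrative assumptions, not the released implementation.

```python
def shaped_reward(v_sim, s, s_next, env_reward, gamma=0.99):
    """Potential-based reward shaping using V_sim as the potential.

    The shaping term gamma * V_sim(s') - V_sim(s) is positive when a
    transition moves toward states that the simulation-learned value
    function rates highly, giving a dense signal for real-world RL.
    """
    return env_reward + gamma * v_sim(s_next) - v_sim(s)

# Example usage (hypothetical): v_sim is a frozen value network trained
# with RL in simulation, queried on real-world transitions (s, s_next).
# r_tilde = shaped_reward(v_sim, s, s_next, r)
```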


SGFT guides future policies to produce trajectories which move in the directions suggested by the simulation-learned value function and short hallucinated rollouts.
The value function provides a strong reward signal for real-world RL fine-tuning, while hallucinated rollouts enable an agent to train on transitions not seen in the dataset.
Hallucinated states with high value estimates (green) are labeled with high reward during RL fine-tuning, and therefore more likely to be explored than states with low value estimates (red).
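
Below is a rough sketch of how hallucinated rollouts can be generated and labeled. The dynamics-model interface (`dyn_model(s, a) -> (s_next, r)`) and all names here are assumptions for illustration, not the exact API of our code.

```python
def hallucinated_transitions(policy, dyn_model, v_sim, s0, horizon=5, gamma=0.99):
    """Roll the current policy out for a few steps inside a learned dynamics
    model, labeling each imagined transition with the shaped reward.

    Imagined states that V_sim rates highly receive larger rewards, so the
    fine-tuning agent is steered toward them in the real world.
    """
    transitions, s = [], s0
    for _ in range(horizon):
        a = policy(s)
        s_next, r = dyn_model(s, a)                      # one-step model prediction
        r_shaped = r + gamma * v_sim(s_next) - v_sim(s)  # same shaping as above
        transitions.append((s, a, r_shaped, s_next))
        s = s_next
    return transitions
```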


SGFT shortens the horizon of the search problem by changing the infinite horizon RL objective to a finite H-step RL objective with a terminal simulation-learned value function.
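
Schematically (our notation here, not copied verbatim from the paper), the fine-tuning objective becomes an H-step return bootstrapped with the simulation-learned value function:

$$ J_H(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{H-1} \gamma^{t}\, r(s_t, a_t) \;+\; \gamma^{H}\, V_{\mathrm{sim}}(s_H)\right] $$

compared with the standard infinite-horizon objective $J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]$.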

Real World Experiments

Standard fine-tuning methods typically have an unlearning phase where the policy gets worse before it gets better. Our method makes consistent, rapid progress during fine-tuning.

Hammering

Simulation-Trained Policy

Pretrained Policy

Fine-tuned Policy

Insertion

Pushing

*For the fine-tuned insertion policy video, we roll in with the pretrained policy to grasp, then switch to the fine-tuned insertion policy.


Above is a comparison of the time to learn each task with our method versus existing baselines that use sim2real transfer, RL fine-tuning, and/or model-based RL. In each case, our method outperforms the baselines in sample efficiency by at least 2x!

BibTeX

@inproceedings{yin2025sgft,
  author    = {Yin, Patrick and Westenbroek, Tyler and Bagaria, Simran and Huang, Kevin and Cheng, Ching-An and Kolobov, Andrey and Gupta, Abhishek},
  title     = {Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2025},
}