OmniReset
Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning
Patrick Yin1*, Tyler Westenbroek1*, Octi Zhang2, Joshua Tran1, Ignacio Dagnino1,
Eeshani Shilamkar1, Numfor Mbiziwo-Tiapo1, Simran Bagaria3, Xinlei Liu1,
Galen Mullins3, Andrey Kolobov3, Abhishek Gupta1
*Equal contribution 1University of Washington, 2NVIDIA, 3Microsoft Research
ICLR 2026
Try It Yourself!


Press ‘Play’ to start a new rollout.

Press ‘Disturb’ to open the gripper and apply a force perturbation to the object.
Now drag and scroll to rotate and zoom.
Press ‘Reset’ to reset the scene and begin a fresh rollout.
Switch tasks using the tabs above!

Interactive demo by Joshua Tran

The OmniReset Pipeline

OmniReset overcomes the exploration bottleneck by automatically generating diverse reset distributions and scaling up RL training. No demos, no reward shaping, just RL. The resulting policies are distilled to RGB and transferred to the real world zero-shot.

Step 1 — Generate Diverse Resets

Automatically generate diverse reset states for PPO.

Step 2 — Large-Scale State-Based RL Training

Scale PPO to 64K+ environments.

Step 3 — Distill to Perception

Distill to RGB with extensive visual randomizations.

Step 4 — Deploy

Zero-shot sim2real transfer from RGB.
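Read end to end, the four steps compose into a simple pipeline. The toy loop below sketches that flow; every function here is a hypothetical stub standing in for a stage of the pipeline, not the authors' actual code.

```python
def generate_resets(goal_pose, n):
    """Step 1 (stub): diverse reset states covering robot-object configurations."""
    return [{"goal": goal_pose, "seed": i} for i in range(n)]

def train_state_ppo(resets, num_envs=65536):
    """Step 2 (stub): large-scale state-based PPO over the reset mixture (64K+ envs)."""
    return {"type": "state_policy", "envs": num_envs, "resets": len(resets)}

def distill_to_rgb(teacher):
    """Step 3 (stub): distill the state policy to an RGB student under visual randomization."""
    return {"type": "rgb_policy", "teacher": teacher["type"]}

def deploy(policy):
    """Step 4 (stub): zero-shot sim2real -- the RGB student runs unchanged on hardware."""
    return policy["type"] == "rgb_policy"

resets = generate_resets(goal_pose=(0.5, 0.0, 0.2), n=1024)
teacher = train_state_ppo(resets)
student = distill_to_rgb(teacher)
assert deploy(student)
```

The key structural point the sketch captures: only Step 1 consumes the user-specified goal pose, and only the distilled RGB policy ever touches the real robot.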

Emergent Dexterity

Robustness and Emergent Retrying Behavior

OmniReset policies are robust to perturbations and succeed across the entire workspace, learning over ranges of initial conditions more than 300× wider than baselines handle (see comparisons below).

Robustness to perturbations

Difficult initial conditions at the edges of the workspace

Non-Prehensile Behaviors

OmniReset discovers non-prehensile skills that exploit environment dynamics. Watch the robot reorient the peg using the hole!

Diverse Long-Horizon Stitching

OmniReset stitches together diverse, emergent skills into long-horizon behaviors, with no task-specific priors.

avoid obstacle → flip → push → wiggle in

reach → flip → insert

reach → pick → insert → release → twist

Digging into the Reset Distributions

OmniReset aims to cover all contact-rich states the robot might encounter and all potential paths to the goal. Large-scale RL (64K+ environments) then sorts through these options to find successful behaviors for each task.

The user only specifies the pose of the object at success. From that, we automatically generate the following distributions to cover all reasonable robot-object configurations.

Near Goal

The object near the goal with the gripper close to the object.

Grasped

The robot gripper grasping the object.

Near Object

The robot gripper close to the object.

Reaching

The robot gripper randomly positioned.
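The four distributions above can be thought of as a mixture that each environment samples from at reset time. The sketch below shows one way such a sampler might look; the workspace bounds, noise scales, and position-only (3-D) state are illustrative assumptions, not the paper's actual reset generator.

```python
import numpy as np

# User-specified object pose at success (toy: position only).
GOAL_POSE = np.array([0.5, 0.0, 0.2])
OBJ_LO, OBJ_HI = [0.2, -0.3, 0.0], [0.8, 0.3, 0.4]     # assumed object workspace
GRIP_LO, GRIP_HI = [0.1, -0.4, 0.0], [0.9, 0.4, 0.6]   # assumed gripper workspace

def sample_reset(rng):
    """Draw one reset state from a uniform mixture of the four distributions."""
    kind = rng.choice(["near_goal", "grasped", "near_object", "reaching"])
    if kind == "near_goal":
        obj = GOAL_POSE + rng.normal(scale=0.02, size=3)   # object near the goal
        grip = obj + rng.normal(scale=0.03, size=3)        # gripper close to object
    elif kind == "grasped":
        obj = rng.uniform(OBJ_LO, OBJ_HI)
        grip = obj.copy()                                  # gripper at the object
    elif kind == "near_object":
        obj = rng.uniform(OBJ_LO, OBJ_HI)
        grip = obj + rng.normal(scale=0.05, size=3)        # gripper close to object
    else:  # reaching
        obj = rng.uniform(OBJ_LO, OBJ_HI)
        grip = rng.uniform(GRIP_LO, GRIP_HI)               # gripper anywhere

    return kind, obj, grip

rng = np.random.default_rng(0)
resets = [sample_reset(rng) for _ in range(64)]
```

Because the mixture is fixed and sampled independently per environment, no curriculum schedule is needed: near-goal resets are always present for early learning signal, and reaching resets always cover the full workspace.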

Our reset distributions, zoomed in on the table leg to show the diversity of object-robot configurations.

Near Goal

Grasped

Visualizing Learning Over Time

OmniReset requires no curricula or reward shaping, just diverse resets and large-scale compute. PPO naturally learns backwards from near-goal states to the full workspace.

Scalable Simulation Training

OmniReset scales gracefully to a large number of tasks.

Leg Twisting

Screw the leg into the table.

Drawer Assembly

Insert the drawer into the drawer box.

Peg Insertion

Insert the peg into the hole.

Rectangle on Wall

Reorient the rectangular block to the target position on the wall.

Cube Stacking

Stack the cubes on top of each other.

Birthday Party

Pick up the cupcake and place it on the plate.

Leg Twisting (×4)

Demo only: the leg-twisting policy run four times from a fixed initial state (no coverage of the full reset distribution).

Sim2Real Transfer

OmniReset policies, distilled to RGB, transfer to the real world zero-shot.

Simulation
Real World
Simulation
Real World
Simulation
Real World
Baseline Comparisons

We evaluate on easy (restricted initial conditions) and hard (full workspace) variants of each task. Baselines rarely succeed on hard settings. The scatter plot shows successful initial conditions for OmniReset (left) vs. the strongest baseline, Demo Curriculum (right).

Baselines scatter plot

Here we plot learning curves for each method across variations of the task. OmniReset consistently reaches high success rates where baselines struggle to make progress.

Baselines 3x3 grid
Real-World Evaluations

The three stacked views on the right are policy inputs; the main view is for visualization only.

A supercut of the most impressive moments from our evaluations.

Peg insertion

Table assembly

Drawer assembly

Policies recover from perturbations and retry after mistakes.

Peg Insertion

Table Assembly

Policies succeed under randomized target positions.

Peg Insertion

Table Assembly

A supercut of policy failures from our evaluations.

Peg insertion

Table assembly

Drawer assembly

Full, uncut, continuous evaluations.

Peg insertion

Table assembly

Drawer assembly

BibTeX
@inproceedings{yin2026emergent,
  title={Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning},
  author={Patrick Yin and Tyler Westenbroek and Zhengyu Zhang and Joshua Tran and Ignacio Dagnino and Eeshani Shilamkar and Numfor Mbiziwo-Tiapo and Simran Bagaria and Xinlei Liu and Galen Mullins and Andrey Kolobov and Abhishek Gupta},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://arxiv.org/abs/2603.15789},
}