OmniReset
Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning
Patrick Yin1*, Tyler Westenbroek1*, Octi Zhang2, Joshua Tran1, Ignacio Dagnino1,
Eeshani Shilamkar1, Numfor Mbiziwo-Tiapo1, Simran Bagaria3, Xinlei Liu1,
Galen Mullins3, Andrey Kolobov3, Abhishek Gupta1
*Equal contribution 1University of Washington, 2NVIDIA, 3Microsoft Research
ICLR 2026
Try It Yourself!


Press ‘Play’ to start a new rollout.

Press ‘Disturb’ to open the gripper and apply a force perturbation to the object.
Now drag and scroll to rotate and zoom.
Press ‘Reset’ to reset the scene and begin a fresh rollout.
Switch tasks using the tabs above!

Interactive demo by Joshua Tran

The OmniReset Pipeline

OmniReset overcomes the exploration bottleneck by automatically generating diverse reset distributions and scaling up RL training. No demos, no reward shaping, just RL. The resulting policies are distilled to RGB and transferred to the real world zero-shot.

Step 1 — Generate Diverse Resets

Automatically generate diverse reset states for PPO.

Step 2 — Large-Scale State-Based RL Training

Scale PPO to 64K+ environments.

Step 3 — Distill to Perception

Distill to RGB with extensive visual randomizations.

Step 4 — Deploy

Zero-shot sim2real transfer from RGB.
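Read end to end, the four steps compose into a simple pipeline. The toy loop below sketches that flow; every function here is a hypothetical stub standing in for a stage of the pipeline, not the authors' actual code.

```python
def generate_resets(goal_pose, n):
    """Step 1 (stub): diverse reset states covering robot-object configurations."""
    return [{"goal": goal_pose, "seed": i} for i in range(n)]

def train_state_ppo(resets, num_envs=65536):
    """Step 2 (stub): large-scale state-based PPO over the reset mixture (64K+ envs)."""
    return {"type": "state_policy", "envs": num_envs, "resets": len(resets)}

def distill_to_rgb(teacher):
    """Step 3 (stub): distill the state policy to an RGB student under visual randomization."""
    return {"type": "rgb_policy", "teacher": teacher["type"]}

def deploy(policy):
    """Step 4 (stub): zero-shot sim2real -- the RGB student runs unchanged on hardware."""
    return policy["type"] == "rgb_policy"

resets = generate_resets(goal_pose=(0.5, 0.0, 0.2), n=1024)
teacher = train_state_ppo(resets)
student = distill_to_rgb(teacher)
assert deploy(student)
```

The key structural point the sketch captures: only Step 1 consumes the user-specified goal pose, and only the distilled RGB policy ever touches the real robot.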

Emergent Dexterity

Robustness and Emergent Retrying Behavior

OmniReset policies are robust to perturbations and succeed across the entire workspace, learning over ranges of initial conditions more than 300× wider than baselines handle (see comparisons below).

Robustness to perturbations

Difficult initial conditions at the edges of the workspace

Non-Prehensile Behaviors

OmniReset discovers non-prehensile skills that exploit environment dynamics. Watch the robot reorient the peg using the hole!

Diverse Long-Horizon Stitching

OmniReset stitches together diverse, emergent skills into long-horizon behaviors, with no task-specific priors.

avoid obstacle → flip → push → wiggle in

reach → flip → insert

reach → pick → insert → release → twist

Digging into the Reset Distributions

OmniReset aims to cover all contact-rich states the robot might encounter and all potential paths to the goal. Large-scale RL (64K+ environments) then sorts through these options to find successful behaviors for each task.

The user only specifies the pose of the object at success. From that, we automatically generate the following distributions to cover all reasonable robot-object configurations.

Near Goal

The object near the goal with the gripper close to the object.

Grasped

The robot gripper grasping the object.

Near Object

The robot gripper close to the object.

Reaching

The robot gripper randomly positioned.
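The four distributions above can be thought of as a mixture that each environment samples from at reset time. The sketch below shows one way such a sampler might look; the workspace bounds, noise scales, and position-only (3-D) state are illustrative assumptions, not the paper's actual reset generator.

```python
import numpy as np

# User-specified object pose at success (toy: position only).
GOAL_POSE = np.array([0.5, 0.0, 0.2])
OBJ_LO, OBJ_HI = [0.2, -0.3, 0.0], [0.8, 0.3, 0.4]     # assumed object workspace
GRIP_LO, GRIP_HI = [0.1, -0.4, 0.0], [0.9, 0.4, 0.6]   # assumed gripper workspace

def sample_reset(rng):
    """Draw one reset state from a uniform mixture of the four distributions."""
    kind = rng.choice(["near_goal", "grasped", "near_object", "reaching"])
    if kind == "near_goal":
        obj = GOAL_POSE + rng.normal(scale=0.02, size=3)   # object near the goal
        grip = obj + rng.normal(scale=0.03, size=3)        # gripper close to object
    elif kind == "grasped":
        obj = rng.uniform(OBJ_LO, OBJ_HI)
        grip = obj.copy()                                  # gripper at the object
    elif kind == "near_object":
        obj = rng.uniform(OBJ_LO, OBJ_HI)
        grip = obj + rng.normal(scale=0.05, size=3)        # gripper close to object
    else:  # reaching
        obj = rng.uniform(OBJ_LO, OBJ_HI)
        grip = rng.uniform(GRIP_LO, GRIP_HI)               # gripper anywhere

    return kind, obj, grip

rng = np.random.default_rng(0)
resets = [sample_reset(rng) for _ in range(64)]
```

Because the mixture is fixed and sampled independently per environment, no curriculum schedule is needed: near-goal resets are always present for early learning signal, and reaching resets always cover the full workspace.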

Our reset distributions, zoomed in on the table leg to show the diversity of object-robot configurations.

Near Goal

Grasped

Visualizing Learning Over Time

OmniReset requires no curricula or reward shaping, just diverse resets and large-scale compute. PPO naturally learns backwards from near-goal states to the full workspace.

Scalable Simulation Training

OmniReset scales gracefully to a large number of tasks.

Leg Twisting

Screw the leg into the table.

Drawer Assembly

Insert the drawer into the drawer box.

Peg Insertion

Insert the peg into the hole.

Rectangle on Wall

Reorient the rectangular block to the target position on the wall.

Cube Stacking

Stack the cubes on top of each other.

Birthday Party

Pick up the cupcake and place it on the plate.

Leg Twisting (×4)

Demo only: the leg-twisting policy run four times from a fixed initial state (no coverage of the full reset distribution).

Sim2Real Transfer

OmniReset policies, distilled to RGB, transfer to the real world zero-shot.

Simulation
Real World
Simulation
Real World
Simulation
Real World
Baseline Comparisons

We evaluate on easy (restricted initial conditions) and hard (full workspace) variants of each task. Baselines rarely succeed on hard settings. The scatter plot shows successful initial conditions for OmniReset (left) vs. the strongest baseline, Demo Curriculum (right).

Baselines scatter plot

Here we plot learning curves for each method across variations of the task. OmniReset consistently reaches high success rates where baselines struggle to make progress.

Baselines 3x3 grid
Real-World Evaluations

The three stacked views on the right are policy inputs; the main view is for visualization only.

A supercut of the most impressive moments from our evaluations.

Peg insertion

Table assembly

Drawer assembly

Policies recover from perturbations and retry after mistakes.

Peg Insertion

Table Assembly

Policies succeed under randomized target positions.

Peg Insertion

Table Assembly

A supercut of policy failures from our evaluations.

Peg insertion

Table assembly

Drawer assembly

Full, uncut, continuous evaluations.

Peg insertion

Table assembly

Drawer assembly

BibTeX
@inproceedings{yin2026emergent,
  title={Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning},
  author={Patrick Yin and Tyler Westenbroek and Zhengyu Zhang and Joshua Tran and Ignacio Dagnino and Eeshani Shilamkar and Numfor Mbiziwo-Tiapo and Simran Bagaria and Xinlei Liu and Galen Mullins and Andrey Kolobov and Abhishek Gupta},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://arxiv.org/abs/2603.15789},
}