DARP
Difference-Aware Retrieval Policies for Imitation Learning
Quinn Pfeifer1, Ethan Pronovost1, Paarth Shah2, Khimya Khetarpal3, 4, Siddhartha Srinivasa1, Abhishek Gupta1, 2
1Paul G. Allen School of Computer Science & Engineering, University of Washington, 2Toyota Research Institute, 3Google DeepMind, 4Mila
DARP Architecture

Hover over the state space to preview neighbors, then click to lock in a query state and see how DARP processes it: neighbor retrieval, difference-vector computation, and aggregation. Watch how the predicted action (blue) emerges from averaging the neighbor actions. Click again to unlock the query point and return to hover mode.

1. Retrieve Neighbors & Visualize Inputs
Query State (sq)
Neighbor State (s*)
Action (a*)
Difference (Δ)
Predicted Action (âq)
2. Process Each Neighbor: fθ(s*, a*, Δ) → a'


3. Aggregate: âq = gψ({a'1, ..., a'k})


How DARP Works

1
Retrieve Neighbors & Compute Differences

For a query state sq, retrieve the k closest states from the expert demonstrations. For each neighbor, compute the difference vector Δi = s*i - sq and form the input tuple (s*i, a*i, Δi).

2
Predict Action Per Neighbor

Pass each tuple through the network fθ to obtain an action candidate: a'i = fθ(s*i, a*i, Δi). Each neighbor is processed independently.

3
Aggregate Predictions

Combine all action candidates with a permutation-invariant function: âq = gψ({a'1, ..., a'k}). In the simplest case, the final prediction is the average of the individual candidates (see the sketch below).
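The three steps above fit in a short sketch. The following is a minimal illustration rather than the paper's implementation: the module name DARPPolicy, the layer sizes, and the Euclidean nearest-neighbor retrieval are assumptions; only the overall structure (retrieve neighbors and compute Δ, predict per neighbor with fθ, aggregate with a permutation-invariant gψ, here a simple mean) follows the description above.

```python
# Minimal sketch of DARP's forward pass (hypothetical names and sizes).
import torch
import torch.nn as nn


class DARPPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, k=8, hidden=256):
        super().__init__()
        self.k = k
        # f_theta: maps (s*, a*, Δ) to a per-neighbor action candidate a'.
        self.f_theta = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s_q, demo_states, demo_actions):
        # 1. Retrieve the k nearest demonstration states and compute differences.
        dists = torch.cdist(s_q.unsqueeze(0), demo_states).squeeze(0)   # (N,)
        idx = dists.topk(self.k, largest=False).indices                 # (k,)
        s_star, a_star = demo_states[idx], demo_actions[idx]            # (k, ...)
        delta = s_star - s_q                                            # Δ_i = s*_i - s_q

        # 2. Predict one action candidate per neighbor, independently.
        a_prime = self.f_theta(torch.cat([s_star, a_star, delta], dim=-1))  # (k, action_dim)

        # 3. Aggregate with a permutation-invariant g_psi (here: a plain mean).
        return a_prime.mean(dim=0)                                      # â_q
```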

Note: This is a simplified simulation for illustration purposes and does not use an actual trained DARP model.

Results

Performance (Return)

Task         | BC                | DARP (Ours)
Hopper       | 2313.65 ± 203.75  | 3545.57 ± 3.54
Ant          | 2376.20 ± 339.43  | 4383.28 ± 266.37
Walker       | 2658.40 ± 274.08  | 4894.01 ± 75.12
HalfCheetah  | 1063.23 ± 371.08  | 5515.41 ± 841.33

Returns averaged over 100 trials, reported with 95% confidence intervals.

Success Rate (%) w/ Image Embeddings

Task           | BC   | DARP (Ours)
Stack          | 44%  | 75%
Threading      | 38%  | 76%
Peg Insertion  | 17%  | 52%

Success Rate (%) w/ Low-Dimensional State

Task           | BC   | DARP (Ours)
Stack          | 47%  | 72%
Threading      | 37%  | 63%
Peg Insertion  | 46%  | 62%

Success rate over 100 trials.

Success Rate (%)

Task           | BC   | DARP (Ours)
Close Drawer   | 54%  | 85%
Close Door     | 29%  | 45%
Turn on Stove  | 28%  | 43%

Success rate over 100 trials.

Divergence Analysis

Watch how BC and DARP behave when they drift out of distribution. While both agents' states become unlikely under the expert distribution, DARP's difference vectors to its nearest neighbors remain in-distribution, enabling stable recovery.

Legend: Expert Demonstrations · BC Agent (OOD State) · DARP Agent (OOD State, ID Δ)
Live readouts: BC State Likelihood · DARP State Likelihood · DARP Δ Likelihood

Key Insight: As the animation progresses, BC (red) drifts away from the expert demonstrations and its state likelihood drops sharply. DARP (green) experiences similar perturbations but recovers and stays near the expert manifold because its difference vectors Δ = s* - sq to the nearest neighbors remain in-distribution (~90%). These difference vectors resemble the differences between nearby expert demonstration states, so DARP's inputs stay familiar and it continues to make reliable, stable predictions.

Note: This is a simplified simulation for illustration purposes and does not use an actual trained DARP model. See section 3.4 in the paper for plots generated with trained DARP models.
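As a rough way to probe the same intuition numerically (distinct from the trained-model analysis in section 3.4), one can compare how far visited states and their Δ vectors lie from the expert data, using nearest-neighbor distance as a crude stand-in for likelihood. Everything below (the array names and the divergence_report helper) is hypothetical.

```python
# Rough diagnostic: how "in-distribution" are visited states vs. their Δ vectors?
import numpy as np


def nn_distance(query, reference):
    """Distance from each query vector (M, d) to its nearest reference vector (N, d)."""
    d = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=-1)  # (M, N)
    return d.min(axis=1)


def divergence_report(rollout_states, rollout_deltas, expert_states, expert_deltas):
    # rollout_states: states visited by the agent; rollout_deltas: the Δ = s* - s_q it used.
    # expert_states: demonstration states; expert_deltas: differences between nearby expert states.
    state_ood = nn_distance(rollout_states, expert_states).mean()
    delta_ood = nn_distance(rollout_deltas, expert_deltas).mean()
    print(f"mean NN distance of visited states: {state_ood:.3f}")
    print(f"mean NN distance of Δ vectors:      {delta_ood:.3f}")
```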

Handling Multimodal Action Distributions

While simple averaging works well for unimodal action distributions, many robotic tasks require representing multimodal behaviors. DARP can be extended with richer permutation-invariant aggregation functions, such as Set Transformers or Deep Sets, to capture these multimodal action distributions.

Simple Averaging

Averaging neighbor predictions collapses a multimodal distribution into a single averaged action, which may fall between modes and correspond to no useful behavior.

Set Transformer / DeepSets

More expressive set-based aggregation preserves multimodality, enabling the model to represent multimodal expert behaviors (a sketch of such an aggregator follows).
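As an illustration of what such an aggregator might look like, here is a minimal Deep Sets-style gψ that could replace plain averaging. The layer sizes and choices below are assumptions and do not reflect the paper's actual Set Transformer or Deep Sets configuration.

```python
# Illustrative Deep Sets-style g_psi over the k per-neighbor candidates (hypothetical sizes).
import torch
import torch.nn as nn


class DeepSetAggregator(nn.Module):
    """Permutation-invariant aggregation over per-neighbor action candidates."""

    def __init__(self, action_dim, hidden=128):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(action_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, action_dim))

    def forward(self, a_prime):            # a_prime: (k, action_dim)
        pooled = self.phi(a_prime).sum(0)  # sum-pooling keeps permutation invariance
        return self.rho(pooled)            # â_q: not forced to be the neighbors' mean
```

Because pooling happens in a learned feature space rather than directly over the action candidates, the output is not constrained to be the neighbors' average and can commit to a single mode.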