DARP
Difference-Aware Retrieval Policies for Imitation Learning
Quinn Pfeifer1, Ethan Pronovost1, Paarth Shah2, Khimya Khetarpal3, 4, Siddhartha Srinivasa1, Abhishek Gupta1, 2
1Paul G. Allen School of Computer Science & Engineering, University of Washington, 2Toyota Research Institute, 3Google DeepMind, 4Mila
DARP Architecture

Hover over the state space to preview neighbors, then click to lock in a query state and see how DARP processes it: neighbor retrieval, difference-vector computation, and aggregation. Watch how the predicted action (blue) emerges from averaging the neighbor actions. Click again to unlock the query point and return to hover mode.

1. Retrieve Neighbors & Visualize Inputs
Query State (sq)
Neighbor State (s*)
Action (a*)
Difference (Δ)
Predicted Action (âq)
2. Process Each Neighbor: fθ(s*, a*, Δ) → a'


3. Aggregate: âq = gψ({a'1, ..., a'k})


How DARP Works

1
Retrieve Neighbors & Compute Differences

For a query state sq, retrieve the k closest states from the expert demonstrations. For each neighbor, compute the difference vector Δi = s*i - sq and form the input tuple (s*i, a*i, Δi).

2
Predict Action Per Neighbor

Pass each tuple through the network fθ to obtain an action candidate: a'i = fθ(s*i, a*i, Δi). Each neighbor is processed independently.

3
Aggregate Predictions

Combine all action candidates with a permutation-invariant function: âq = gψ({a'1, ..., a'k}). In the simplest case, the final prediction is the average of the individual candidates (see the sketch below).
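The three steps above fit in a short sketch. The following is a minimal illustration rather than the paper's implementation: the module name DARPPolicy, the layer sizes, and the Euclidean nearest-neighbor retrieval are assumptions; only the overall structure (retrieve neighbors and compute Δ, predict per neighbor with fθ, aggregate with a permutation-invariant gψ, here a simple mean) follows the description above.

```python
# Minimal sketch of DARP's forward pass (hypothetical names and sizes).
import torch
import torch.nn as nn


class DARPPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, k=8, hidden=256):
        super().__init__()
        self.k = k
        # f_theta: maps (s*, a*, Δ) to a per-neighbor action candidate a'.
        self.f_theta = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s_q, demo_states, demo_actions):
        # 1. Retrieve the k nearest demonstration states and compute differences.
        dists = torch.cdist(s_q.unsqueeze(0), demo_states).squeeze(0)   # (N,)
        idx = dists.topk(self.k, largest=False).indices                 # (k,)
        s_star, a_star = demo_states[idx], demo_actions[idx]            # (k, ...)
        delta = s_star - s_q                                            # Δ_i = s*_i - s_q

        # 2. Predict one action candidate per neighbor, independently.
        a_prime = self.f_theta(torch.cat([s_star, a_star, delta], dim=-1))  # (k, action_dim)

        # 3. Aggregate with a permutation-invariant g_psi (here: a plain mean).
        return a_prime.mean(dim=0)                                      # â_q
```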

Note: This is a simplified simulation for illustration purposes and does not use an actual trained DARP model.

Results

Performance (Return)

Task         | BC                | DARP (Ours)
Hopper       | 2313.65 ± 203.75  | 3545.57 ± 3.54
Ant          | 2376.20 ± 339.43  | 4383.28 ± 266.37
Walker       | 2658.40 ± 274.08  | 4894.01 ± 75.12
HalfCheetah  | 1063.23 ± 371.08  | 5515.41 ± 841.33

Returns averaged over 100 trials, reported with 95% confidence intervals.

Success Rate (%) w/ Image Embeddings

Task           | BC   | DARP (Ours)
Stack          | 44%  | 75%
Threading      | 38%  | 76%
Peg Insertion  | 17%  | 52%

Success Rate (%) w/ Low-Dimensional State

Task           | BC   | DARP (Ours)
Stack          | 47%  | 72%
Threading      | 37%  | 63%
Peg Insertion  | 46%  | 62%

Success rate over 100 trials.

Success Rate (%)

Task           | BC   | DARP (Ours)
Close Drawer   | 54%  | 85%
Close Door     | 29%  | 45%
Turn on Stove  | 28%  | 43%

Success rate over 100 trials.

Divergence Analysis

Watch how BC and DARP behave when they drift out of distribution. While both agents' states become unlikely under the expert distribution, DARP's difference vectors to its nearest neighbors remain in-distribution, enabling stable recovery.

Legend: Expert Demonstrations · BC Agent (OOD State) · DARP Agent (OOD State, ID Δ)
Live readouts: BC State Likelihood · DARP State Likelihood · DARP Δ Likelihood

Key Insight: As the animation progresses, BC (red) drifts away from the expert demonstrations and its state likelihood drops sharply. DARP (green) experiences similar perturbations but recovers and stays near the expert manifold because its difference vectors Δ = s* - sq to the nearest neighbors remain in-distribution (~90%). These difference vectors resemble the differences between nearby expert demonstration states, so DARP's inputs stay familiar and it continues to make reliable, stable predictions.

Note: This is a simplified simulation for illustration purposes and does not use an actual trained DARP model. See section 3.4 in the paper for plots generated with trained DARP models.
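As a rough way to probe the same intuition numerically (distinct from the trained-model analysis in section 3.4), one can compare how far visited states and their Δ vectors lie from the expert data, using nearest-neighbor distance as a crude stand-in for likelihood. Everything below (the array names and the divergence_report helper) is hypothetical.

```python
# Rough diagnostic: how "in-distribution" are visited states vs. their Δ vectors?
import numpy as np


def nn_distance(query, reference):
    """Distance from each query vector (M, d) to its nearest reference vector (N, d)."""
    d = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=-1)  # (M, N)
    return d.min(axis=1)


def divergence_report(rollout_states, rollout_deltas, expert_states, expert_deltas):
    # rollout_states: states visited by the agent; rollout_deltas: the Δ = s* - s_q it used.
    # expert_states: demonstration states; expert_deltas: differences between nearby expert states.
    state_ood = nn_distance(rollout_states, expert_states).mean()
    delta_ood = nn_distance(rollout_deltas, expert_deltas).mean()
    print(f"mean NN distance of visited states: {state_ood:.3f}")
    print(f"mean NN distance of Δ vectors:      {delta_ood:.3f}")
```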

Handling Multimodal Action Distributions

While simple averaging works well for unimodal action distributions, many robotic tasks require representing multimodal behaviors. DARP can be extended with richer permutation-invariant aggregation functions, such as Set Transformers or Deep Sets, to capture these multimodal action distributions.

Simple Averaging

Averaging neighbor predictions collapses a multimodal distribution into a single averaged action, which may fall between modes and correspond to no useful behavior.

Set Transformer / DeepSets

More expressive set-based aggregation preserves multimodality, enabling the model to represent multimodal expert behaviors (a sketch of such an aggregator follows).
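As an illustration of what such an aggregator might look like, here is a minimal Deep Sets-style gψ that could replace plain averaging. The layer sizes and choices below are assumptions and do not reflect the paper's actual Set Transformer or Deep Sets configuration.

```python
# Illustrative Deep Sets-style g_psi over the k per-neighbor candidates (hypothetical sizes).
import torch
import torch.nn as nn


class DeepSetAggregator(nn.Module):
    """Permutation-invariant aggregation over per-neighbor action candidates."""

    def __init__(self, action_dim, hidden=128):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(action_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, action_dim))

    def forward(self, a_prime):            # a_prime: (k, action_dim)
        pooled = self.phi(a_prime).sum(0)  # sum-pooling keeps permutation invariance
        return self.rho(pooled)            # â_q: not forced to be the neighbors' mean
```

Because pooling happens in a learned feature space rather than directly over the action candidates, the output is not constrained to be the neighbors' average and can commit to a single mode.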