Hover over the state space to preview neighbors, then click to lock in a query state and see how DARP processes it through neighbor retrieval, difference-vector calculation, and aggregation. Watch how the predicted action (blue) emerges from averaging the neighbor actions. Click again to unlock the query point and return to hover mode.
How DARP Works
1. Neighbor retrieval: for a query state s_q, find the k closest states from the expert demonstrations. For each neighbor i, compute the difference vector Δ_i = s*_i − s_q and form the input tuple (s*_i, a*_i, Δ_i).
2. Independent processing: pass each tuple through a network f_θ to get an action candidate a'_i = f_θ(s*_i, a*_i, Δ_i); each neighbor is processed independently.
3. Aggregation: combine all action candidates via a permutation-invariant function, â_q = g_ψ({a'_1, ..., a'_k}). The final prediction is typically an average of the individual candidates.
Note: This is a simplified simulation for illustration purposes and does not use an actual trained DARP model.
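To make these three steps concrete, here is a minimal PyTorch sketch of the retrieve-process-aggregate pipeline. The Euclidean nearest-neighbor search, the MLP sizes, and the mean aggregation are illustrative assumptions rather than the paper's exact architecture, and the training loop (regressing â_q onto the expert action at s_q) is omitted.

```python
import torch
import torch.nn as nn

class DARPSketch(nn.Module):
    """Illustrative retrieve-process-aggregate policy (not the paper's exact model)."""

    def __init__(self, expert_states, expert_actions, k=8, hidden=256):
        super().__init__()
        self.expert_states = expert_states    # (N, state_dim) demonstration states s*
        self.expert_actions = expert_actions  # (N, action_dim) demonstration actions a*
        self.k = k
        state_dim = expert_states.shape[1]
        action_dim = expert_actions.shape[1]
        # f_θ consumes one (s*_i, a*_i, Δ_i) tuple and outputs an action candidate a'_i.
        self.f_theta = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s_q):                               # s_q: (state_dim,)
        # 1) Retrieve the k closest expert states (Euclidean distance, as an assumption).
        dists = torch.norm(self.expert_states - s_q, dim=1)
        idx = torch.topk(dists, self.k, largest=False).indices
        s_star, a_star = self.expert_states[idx], self.expert_actions[idx]

        # 2) Difference vectors Δ_i = s*_i - s_q, processed independently by f_θ.
        delta = s_star - s_q
        candidates = self.f_theta(torch.cat([s_star, a_star, delta], dim=1))  # (k, action_dim)

        # 3) Permutation-invariant aggregation g_ψ; here simply the mean.
        return candidates.mean(dim=0)
```

Because each (s*_i, a*_i, Δ_i) tuple is handled independently before aggregation, the prediction does not depend on the order in which neighbors are retrieved.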
Performance (Return)
| Task | BC | DARP (Ours) |
|---|---|---|
| Hopper | 2313.65 ± 203.75 | 3545.57 ± 3.54 |
| Ant | 2376.20 ± 339.43 | 4383.28 ± 266.37 |
| Walker | 2658.40 ± 274.08 | 4894.01 ± 75.12 |
| HalfCheetah | 1063.23 ± 371.08 | 5515.41 ± 841.33 |
Scores averaged across 100 trials with 95% confidence intervals.
Success Rate (%) w/ Image Embeddings
| Task | BC | DARP (Ours) |
|---|---|---|
| Stack | 44% | 75% |
| Threading | 38% | 76% |
| Peg Insertion | 17% | 52% |
Success Rate (%) w/ Low-Dimensional State
| Task | BC | DARP (Ours) |
|---|---|---|
| Stack | 47% | 72% |
| Threading | 37% | 63% |
| Peg Insertion | 46% | 62% |
Success rate over 100 trials.
Success Rate (%)
| Task | BC | DARP (Ours) |
|---|---|---|
| Close Drawer | 54% | 85% |
| Close Door | 29% | 45% |
| Turn on Stove | 28% | 43% |
Success rate over 100 trials.
Watch how BC and DARP behave when they drift out of distribution. While both agents' states become unlikely under the expert distribution, DARP's difference vectors to its nearest neighbors remain in-distribution, enabling stable recovery.
Key Insight: As the animation progresses, BC (red) drifts away from the expert demonstrations and its state likelihood drops sharply. DARP (green) experiences similar perturbations but recovers and stays near the expert manifold, because its difference vectors Δ = s* − s_q to its nearest neighbors remain in-distribution (around 90%). These difference vectors resemble the differences between expert demonstration states, so DARP can still make reliable predictions and maintain stability.
Note: This is a simplified simulation for illustration purposes and does not use an actual trained DARP model. See section 3.4 in the paper for plots generated with trained DARP models.
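One crude way to quantify this effect is sketched below, under the assumption that distance to the nearest training example can stand in for likelihood: each rollout state is scored twice, once by how far the raw state s_q has drifted from the expert states, and once by how unusual its difference vector Δ is relative to the difference vectors available at training time. The function and the distance-based proxy are illustrative, not the paper's evaluation protocol.

```python
import torch

def ood_scores(expert_states, rollout_states):
    """Crude stand-ins for the two curves in the animation (an assumption, not the
    paper's metric): distance-to-nearest-expert-state for the raw states, and
    distance-to-nearest-training-Δ for the difference vectors DARP conditions on."""
    # Difference vectors available at training time: each expert state to its
    # nearest *other* expert state (self-matches removed).
    d = torch.cdist(expert_states, expert_states)
    d.fill_diagonal_(float("inf"))
    train_deltas = expert_states[d.argmin(dim=1)] - expert_states

    # Difference vectors for the rollout: each visited state to its nearest expert state.
    d_roll = torch.cdist(rollout_states, expert_states)
    roll_deltas = expert_states[d_roll.argmin(dim=1)] - rollout_states

    state_ood = d_roll.min(dim=1).values                                  # how far s_q drifted
    delta_ood = torch.cdist(roll_deltas, train_deltas).min(dim=1).values  # how unusual Δ is
    return state_ood, delta_ood
```

In the spirit of the animation, the claim above is that under mild drift the Δ score grows much more slowly than the raw-state score, which is what keeps DARP's inputs familiar to f_θ.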
While simple averaging works well for unimodal action distributions, many robotic tasks require representing multimodal behaviors. DARP can be extended with more expressive permutation-invariant aggregation functions, such as Deep Sets or Set Transformers, to handle multimodal action distributions (a sketch of one such aggregator follows below).
Averaging neighbor predictions collapses a multimodal distribution into a single intermediate action, which is often not useful.
More expressive set-based aggregation preserves the separate modes, enabling the model to represent multimodal expert behaviors.
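As a hedged sketch of what such an aggregator could look like, the mean in the earlier pipeline could be swapped for a Deep Sets-style g_ψ: encode each action candidate, sum over the set, then decode. The module below is an assumption about one possible form, not the paper's implementation; summation keeps the operation permutation-invariant, while the nonlinear decoder can commit to one mode of the candidates rather than averaging across them.

```python
import torch
import torch.nn as nn

class DeepSetAggregator(nn.Module):
    """Permutation-invariant g_ψ in the Deep Sets style: decode(sum(encode(a'_i)))."""

    def __init__(self, action_dim, hidden=128):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(action_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, hidden))
        self.decode = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                    nn.Linear(hidden, action_dim))

    def forward(self, candidates):  # candidates: (k, action_dim) from f_θ
        # Sum pooling makes the output independent of neighbor order; the decoder
        # can then place the prediction on a single mode instead of their average.
        return self.decode(self.encode(candidates).sum(dim=0))
```

Used in place of `candidates.mean(dim=0)` in the earlier sketch, this recovers simple averaging only if the networks learn it, and can otherwise express mode-seeking behavior.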