UWTRI · Jun 8, 2026 · arXiv:2606.09758

Difference-Aware Retrieval Policies for Imitation Learning

Paarth Shah, Khimya Khetarpal, Abhishek Gupta

AI Summary

This paper introduces Difference-Aware Retrieval Policies for Imitation Learning (DARP), a semi-parametric approach that improves generalization in imitation learning by conditioning on local neighborhood structure rather than relying solely on direct state-to-action mappings. DARP predicts actions from the $k$-nearest neighbors in the expert demonstrations, which is intended to reduce the compounding errors that behavior cloning can suffer during deployment. The authors report consistent improvements of 15-46% over standard behavior cloning across domains including continuous control and robotic manipulation, without additional data collection or expert feedback.

Key Contribution

Reusing the training data at inference time through retrieval improves imitation learning performance by 15-46% over standard behavior cloning.

Abstract

Parametric imitation learning via behavior cloning can suffer from poor generalization to out-of-distribution states due to compounding errors during deployment. We show that reusing the training data during inference via a semi-parametric retrieval-based imitation learning approach can alleviate this challenge. We present Difference-Aware Retrieval Policies for Imitation Learning (DARP), a semi-parametric retrieval-based imitation learning approach that addresses this limitation by reparameterizing the imitation learning problem in terms of local neighborhood structure rather than direct state-to-action mappings. Instead of learning a global policy, DARP trains a model to predict actions based on $k$-nearest neighbors from expert demonstrations, their corresponding actions, and the relative distance vectors between neighbor states and query states. DARP requires no additional assumptions beyond those made for standard behavior cloning -- it does not require additional data collection, online expert feedback, or task-specific knowledge. We demonstrate consistent performance improvements of 15-46% over standard behavior cloning across diverse domains, including continuous control and robotic manipulation, and across different representations, including high-dimensional visual features. Code and demos are available at https://weirdlabuw.github.io/darp-site/.
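
The retrieval step described in the abstract can be illustrated with a minimal sketch. It only shows how the inputs might be assembled: the $k$ nearest expert states, their actions, and the difference vectors to the query state. The function names, the Euclidean metric, the sign convention of the difference (query minus neighbor), and the toy data are all assumptions made for illustration; the learned model that consumes these inputs is not shown, and this is not the authors' implementation.

```python
import numpy as np

def retrieve_neighbors(query, demo_states, demo_actions, k=3):
    """Return the k nearest expert states, their actions, and difference vectors.

    Hypothetical helper: the metric (Euclidean) and the difference direction
    (query - neighbor) are assumptions, not taken from the paper.
    """
    # Distance from the query state to every demonstration state.
    dists = np.linalg.norm(demo_states - query, axis=1)
    # Indices of the k closest demonstration states, nearest first.
    idx = np.argsort(dists)[:k]
    neighbor_states = demo_states[idx]
    neighbor_actions = demo_actions[idx]
    # Relative vectors from each neighbor state to the query state.
    diffs = query - neighbor_states
    return neighbor_states, neighbor_actions, diffs

def build_model_input(neighbor_actions, diffs):
    """Pair each neighbor's action with its difference vector.

    Output shape: (k, action_dim + state_dim). A trained model would map
    this array to a predicted action; that model is omitted here.
    """
    return np.concatenate([neighbor_actions, diffs], axis=1)

# Toy expert demonstrations: four 2-D states with 1-D actions (made-up values).
demo_states = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
demo_actions = np.array([[0.1], [0.2], [0.3], [0.9]])
query = np.array([0.2, 0.1])

_, acts, diffs = retrieve_neighbors(query, demo_states, demo_actions, k=3)
features = build_model_input(acts, diffs)
print(features.shape)
```

Keeping the difference vectors alongside the neighbor actions is what lets a downstream model reason about how far, and in which direction, the query state lies from each retrieved expert state, rather than mapping the raw state to an action directly.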

Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
