The paper introduces Latent-Aligned Routing for Mixture of Experts (LAR-MoE), a two-stage imitation learning framework that decouples skill discovery from policy learning. In the first stage, student-teacher co-training learns a joint latent representation between observations and future actions. In the second stage, expert routing is regularized to follow the structure of that latent space, which prevents expert collapse while maintaining parameter efficiency. On the LIBERO benchmark, LAR-MoE reaches a 95.2% average success rate with 150M parameters, and on a surgical bowel grasping and retraction task it matches a supervised MoE baseline without phase annotations, suggesting it is a viable alternative to supervised skill decomposition.
LAR-MoE learns structured expert specialization from unlabeled robot demonstrations, matching a supervised MoE baseline without manual phase annotations.
Imitation learning enables robots to acquire manipulation skills from demonstrations, yet deploying a policy across tasks with heterogeneous dynamics remains challenging, as models tend to average over distinct behavioral modes present in the demonstrations. Mixture-of-Experts (MoE) architectures address this by activating specialized subnetworks, but they require meaningful skill decompositions for expert routing. We introduce Latent-Aligned Routing for Mixture of Experts (LAR-MoE), a two-stage framework that decouples unsupervised skill discovery from policy learning. In pre-training, we learn a joint latent representation between observations and future actions through student-teacher co-training. In a post-training stage, the expert routing is regularized to follow the structure of the learned latent space, preventing expert collapse while maintaining parameter efficiency. We evaluate LAR-MoE in simulation and on hardware. On the LIBERO benchmark, our method achieves a 95.2% average success rate with 150M parameters. On a surgical bowel grasping and retraction task, LAR-MoE matches a supervised MoE baseline without requiring any phase annotations, and transfers zero-shot to ex vivo porcine tissue. Our findings suggest that latent-aligned routing provides a principled alternative to supervised skill decomposition, enabling structured expert specialization from unlabeled demonstrations.
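
The abstract does not state the exact form of the routing regularizer, so the following is only a rough sketch of one way the idea could look, not the authors' implementation. It assumes the latent space is summarized by a set of prototype vectors, that a latent embedding `z` is softly assigned to those prototypes by distance, and that the router is penalized with a KL divergence when its expert probabilities disagree with that assignment. The prototypes, the temperature, the KL penalty, and every function name here are hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def latent_targets(z, prototypes, temperature=1.0):
    """Soft assignment of latent vector z to each prototype.

    Assumption: one prototype per expert; closer prototypes get more mass.
    """
    neg_sq_dists = [
        -sum((zi - pi) ** 2 for zi, pi in zip(z, p)) / temperature
        for p in prototypes
    ]
    return softmax(neg_sq_dists)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def routing_alignment_loss(router_logits, z, prototypes):
    """Hypothetical regularizer: penalize routing that ignores latent structure."""
    target = latent_targets(z, prototypes)
    probs = softmax(router_logits)
    return kl_divergence(target, probs)


def moe_output(router_probs, expert_outputs):
    """Combine expert outputs as a routing-weighted sum (dense soft routing)."""
    dim = len(expert_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(router_probs, expert_outputs))
        for d in range(dim)
    ]


# Two illustrative prototypes in a 2-D latent space, and a latent near the first.
prototypes = [[0.0, 0.0], [4.0, 4.0]]
z = [0.1, -0.1]

# A router that prefers expert 0 agrees with the latent structure ...
aligned = routing_alignment_loss([5.0, -5.0], z, prototypes)
# ... while a router that prefers expert 1 is penalized much more heavily.
misaligned = routing_alignment_loss([-5.0, 5.0], z, prototypes)
```

In this sketch the penalty would be added to the imitation loss during post-training. Because every region of the latent space pulls routing mass toward its own expert, a router that sends all inputs to a single expert incurs a large penalty, which is one plausible reading of how latent alignment could discourage expert collapse.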