This paper investigates using training-time instrumentation (specifically, an instrumented button that reports its own state) to improve contact-rich manipulation policies for button pressing, without requiring that instrumentation at deployment. A microphone fingertip captures contact-relevant audio, and the authors fine-tune an audio encoder on the privileged button-state signal to create a contact event detector. This detector is then integrated into imitation learning policies that use only vision and audio at inference. The key result is that these instrumentation-guided audio representations consistently reduce contact force during button presses, while success rates remain similar across methods.
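As a rough illustration of the privileged-supervision step, the sketch below derives per-window contact labels from a button-state log and fits a tiny logistic-regression detector on audio energy. Everything here is an assumption. The functions `contact_labels`, `rms_energy`, `train_detector`, and `predict` are hypothetical. The single RMS-energy feature stands in for the learned audio encoder the paper fine-tunes. The labelling rule (a window is positive if the button reads pressed at any sample) is illustrative rather than the authors' definition.

```python
import math
from typing import List, Sequence, Tuple


def contact_labels(button_state: Sequence[int], window: int) -> List[int]:
    """Hypothetical labelling rule: a window is a contact event (1) if the
    instrumented button reports 'pressed' at any sample inside it."""
    return [
        int(any(button_state[start:start + window]))
        for start in range(0, len(button_state) - window + 1, window)
    ]


def rms_energy(audio: Sequence[float], window: int) -> List[float]:
    """Stand-in audio feature: root-mean-square energy per non-overlapping window."""
    return [
        math.sqrt(sum(x * x for x in audio[start:start + window]) / window)
        for start in range(0, len(audio) - window + 1, window)
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_detector(features: Sequence[float], labels: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float]:
    """Fit a 1-D logistic-regression contact detector by full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(w * x + b) - y  # derivative of BCE w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w: float, b: float, x: float) -> float:
    """Contact probability for one audio feature; needs no button state."""
    return sigmoid(w * x + b)


# Toy episode: quiet, then a press (loud, button reads 1), then quiet again.
WINDOW = 4
button = [0] * 8 + [1] * 8 + [0] * 8
audio = [0.01] * 8 + [1.0] * 8 + [0.01] * 8

labels = contact_labels(button, WINDOW)    # training-time only
features = rms_energy(audio, WINDOW)
w, b = train_detector(features, labels)
```

At deployment only `rms_energy` and `predict` would run; the button-state log is needed solely to produce training labels, which is the sense in which the instrumentation creates no deployment-time dependency.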
Training robots with a temporary "superpower", such as a button that reports when it is pressed, can lead to gentler, more controlled touch, even after that extra signal is taken away in the real world.
Learning contact-rich manipulation is difficult from cameras and proprioception alone because contact events are only partially observed. We test whether training-time instrumentation, i.e., object sensorisation, can improve policy performance without creating deployment-time dependencies. Specifically, we study button pressing as a testbed and use a microphone fingertip to capture contact-relevant audio. We use an instrumented button-state signal as privileged supervision to fine-tune an audio encoder into a contact event detector. We combine the resulting representation with imitation learning using three strategies, so that the policy uses only vision and audio during inference. Button press success rates are similar across methods, but instrumentation-guided audio representations consistently reduce contact force. These results support instrumentation as a practical training-time auxiliary objective for learning contact-rich manipulation policies.
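The abstract frames instrumentation as a training-time auxiliary objective but does not detail the three integration strategies. The following is therefore only a hypothetical sketch of one generic pattern: a behaviour-cloning loss plus a weighted binary cross-entropy term that asks the policy's audio branch to predict the button state. The names `bc_loss`, `contact_bce`, `training_loss`, and the weight `lambda_aux` are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Optional, Sequence


def bc_loss(pred_action: Sequence[float], expert_action: Sequence[float]) -> float:
    """Behaviour-cloning term: mean squared error to the demonstrated action."""
    return sum((p - e) ** 2 for p, e in zip(pred_action, expert_action)) / len(expert_action)


def contact_bce(p_contact: float, button_pressed: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between the predicted contact probability and the
    privileged button-state label (1 = pressed, 0 = released)."""
    p = min(max(p_contact, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(button_pressed * math.log(p) + (1 - button_pressed) * math.log(1.0 - p))


def training_loss(pred_action: Sequence[float], expert_action: Sequence[float],
                  p_contact: float, button_pressed: Optional[int],
                  lambda_aux: float = 0.1) -> float:
    """BC loss plus an auxiliary contact term that is added only when the
    instrumented button label is available (i.e. during training)."""
    loss = bc_loss(pred_action, expert_action)
    if button_pressed is not None:
        loss += lambda_aux * contact_bce(p_contact, button_pressed)
    return loss


# The same sample scored with and without a button-state label.
with_label = training_loss([0.1, 0.2], [0.0, 0.2], p_contact=0.5, button_pressed=1)
without_label = training_loss([0.1, 0.2], [0.0, 0.2], p_contact=0.5, button_pressed=None)
```

When no label is available (`button_pressed=None`), the auxiliary term drops out and only the behaviour-cloning objective remains. The policy's forward pass never reads the button state, so whatever the auxiliary term teaches the audio branch must be recoverable from sound alone at inference.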