This paper introduces a method for training risk-aware visuomotor policies for mobile manipulation that combines Distributional Reinforcement Learning (DRL) with Imitation Learning (IL). A risk-neutral distributional critic is trained, and distortion risk metrics are applied to its predicted return distribution to compute risk-adjusted advantage estimates for policy updates. The resulting risk-aware teacher policies are then distilled via IL into student policies conditioned on egocentric depth observations, which show improved worst-case performance in unmapped environments.
Mobile robots can now learn to be risk-averse (or risk-seeking) in unstructured environments, adapting their behavior based on live depth camera feeds.
For robots to successfully transition from lab settings to everyday environments, they must begin to reason about the risks associated with their actions and make informed, risk-aware decisions. This is particularly true for robots performing mobile manipulation tasks, which involve both interacting with and navigating within dynamic, unstructured spaces. However, existing whole-body controllers for mobile manipulators typically lack explicit mechanisms for risk-sensitive decision-making under uncertainty. To our knowledge, we are the first to (i) learn risk-aware visuomotor policies for mobile manipulation conditioned on egocentric depth observations with runtime-adjustable risk sensitivity, and (ii) show that risk-aware behaviours can be transferred through Imitation Learning (IL) to such a visuomotor policy. Our method first trains a privileged teacher policy using Distributional Reinforcement Learning (DRL) with a risk-neutral distributional critic. Distortion risk metrics are then applied to the critic's predicted return distribution to calculate the risk-adjusted advantage estimates used in policy updates, which yields a range of risk-aware behaviours. We then distil the teacher policies with IL to obtain risk-aware student policies conditioned on egocentric depth observations. Extensive evaluations demonstrate that the trained visuomotor policies exhibit risk-aware behaviour (specifically, better worst-case performance) while performing reactive whole-body motions in unmapped environments, using live depth observations for perception.
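As a rough illustration of how a distortion risk metric can turn a predicted return distribution into a risk-adjusted advantage, the sketch below makes several assumptions that the abstract does not state: the critic outputs evenly spaced quantile estimates of the return, the distortion is Conditional Value-at-Risk (CVaR) with a hypothetical risk level `alpha`, and the advantage is a simple one-step estimate. The function names are illustrative only and are not taken from the paper.

```python
import numpy as np

def cvar_distortion(tau, alpha):
    """CVaR distortion g(tau) = min(tau / alpha, 1).

    alpha = 1 gives g(tau) = tau, i.e. the risk-neutral expectation;
    smaller alpha puts more weight on the worst outcomes.
    """
    return np.minimum(tau / alpha, 1.0)

def distorted_value(quantiles, alpha):
    """Risk-adjusted value from quantile estimates of the return.

    quantiles: array of shape (..., N), assumed to be the critic's
    estimates at N evenly spaced quantile levels (an assumption).
    """
    q = np.sort(quantiles, axis=-1)            # ascending: worst returns first
    n = q.shape[-1]
    edges = np.linspace(0.0, 1.0, n + 1)       # boundaries of the N quantile bins
    weights = np.diff(cvar_distortion(edges, alpha))  # non-negative, sums to 1
    return np.sum(q * weights, axis=-1)

def risk_adjusted_advantage(reward, q_s, q_next, alpha, gamma=0.99):
    """One-step advantage built from distorted values (illustrative choice)."""
    v = distorted_value(q_s, alpha)
    v_next = distorted_value(q_next, alpha)
    return reward + gamma * v_next - v

# Example: four quantile estimates of the return for one state.
quantiles = np.array([0.0, 1.0, 2.0, 3.0])
neutral = distorted_value(quantiles, alpha=1.0)   # plain mean of the quantiles
averse = distorted_value(quantiles, alpha=0.5)    # mean of the worst half
```

Because the critic itself stays risk-neutral in this sketch, changing `alpha` only changes the weights applied to its quantiles, which is one way a single learned return distribution could support a range of risk sensitivities.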