This paper addresses the challenge of object pose estimation using neural implicit fields, where predicting canonical coordinates for unobserved regions in camera space leads to uncertainty and inaccurate estimations. To mitigate this, the authors propose a method combining an SO(3)-equivariant convolutional implicit network for point-level attribute estimation and a positive-incentive point sampling (PIPS) strategy to dynamically determine sampling locations. The proposed approach demonstrates state-of-the-art performance on three pose estimation datasets, especially in challenging scenarios involving unseen poses, high occlusion, novel geometry, and severe noise.
By choosing where to sample points in a neural implicit field based on the input, this method improves object pose estimation, particularly for occluded objects and unseen poses.
Learning neural implicit fields of 3D shapes is a rapidly emerging field that enables shape representation at arbitrary resolutions. Owing to this flexibility, neural implicit fields have succeeded in many research areas, including shape reconstruction, novel view image synthesis, and more recently, object pose estimation. Neural implicit fields enable learning dense correspondences between the camera space and the object's canonical space (including unobserved regions in camera space), significantly boosting object pose estimation performance in challenging scenarios such as highly occluded objects and novel shapes. Despite this progress, predicting canonical coordinates for unobserved camera-space regions remains challenging because there are no direct observational signals. The prediction must therefore rely heavily on the model's generalization ability, which results in high uncertainty. Consequently, densely sampling points across the entire camera space may yield inaccurate estimations that hinder the learning process and compromise performance. To alleviate this problem, we propose a method combining an SO(3)-equivariant convolutional implicit network and a positive-incentive point sampling (PIPS) strategy. The SO(3)-equivariant convolutional implicit network estimates point-level attributes with SO(3)-equivariance at arbitrary query locations, demonstrating superior performance compared to most existing baselines. The PIPS strategy dynamically determines sampling locations based on the input, thereby boosting the network's accuracy and training efficiency. Our method outperforms the state of the art on three pose estimation datasets. Notably, it demonstrates significant improvements in challenging scenarios, such as objects captured with unseen poses, high occlusion, novel geometry, and severe noise.
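To make the role of dense correspondences concrete, the sketch below recovers a rigid pose from paired canonical-space and camera-space points. The abstract does not say how the pose is solved from the predicted correspondences, so the least-squares rigid fit used here (the Kabsch algorithm) is an assumption chosen for illustration, and the function name `fit_rigid_transform` is hypothetical. It is not the paper's network or the PIPS strategy.

```python
import numpy as np


def fit_rigid_transform(canonical_pts, camera_pts):
    """Least-squares R, t with camera_pts ~= canonical_pts @ R.T + t (Kabsch).

    Illustrative assumption: correspondences are given as two (N, 3) arrays
    whose rows are matched point pairs. Scale is not estimated.
    """
    canonical_pts = np.asarray(canonical_pts, dtype=float)
    camera_pts = np.asarray(camera_pts, dtype=float)

    # Center both point sets on their centroids.
    mu_can = canonical_pts.mean(axis=0)
    mu_cam = camera_pts.mean(axis=0)

    # 3x3 cross-covariance between the centered sets.
    H = (canonical_pts - mu_can).T @ (camera_pts - mu_cam)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_cam - R @ mu_can
    return R, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    canonical = rng.normal(size=(100, 3))

    # Ground-truth pose: 90-degree rotation about z plus a translation.
    R_true = np.array([[0.0, -1.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0]])
    t_true = np.array([0.1, -0.2, 0.3])
    camera = canonical @ R_true.T + t_true

    R_est, t_est = fit_rigid_transform(canonical, camera)
    print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

A fit of this kind weights every correspondence equally, so inaccurate canonical coordinates predicted for unobserved regions would pull the estimate away from the true pose. That is consistent with the abstract's motivation for selecting sampling locations instead of sampling the whole camera space densely.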