This paper introduces a transformer-based inpainting method for completing missing textures in real-time 3D streaming from sparse multi-camera setups. The method uses a multi-view aware network architecture with spatio-temporal embeddings to ensure consistency across frames. The proposed approach achieves a better trade-off between quality and speed compared to state-of-the-art inpainting techniques, enabling real-time performance while preserving fine details.
A novel transformer-based inpainting method delivers visually consistent, detailed 3D streaming from sparse multi-camera setups in real time, outperforming existing inpainting techniques under the same real-time constraints.
High-quality 3D streaming from multiple cameras is crucial for immersive experiences in many AR/VR applications. The limited number of views, often a consequence of real-time constraints, leads to missing information and incomplete surfaces in the rendered images. Existing approaches typically rely on simple heuristics for hole filling, which can result in inconsistencies or visual artifacts. We propose to complete the missing textures with a novel, application-targeted inpainting method that is independent of the underlying representation and runs as an image-based post-processing step after novel view rendering. The method is designed as a standalone module compatible with any calibrated multi-camera system. To this end, we introduce a multi-view aware, transformer-based network architecture that uses spatio-temporal embeddings to ensure consistency across frames while preserving fine details. Additionally, our resolution-independent design allows adaptation to different camera setups, while an adaptive patch selection strategy balances inference speed and quality, enabling real-time performance. We evaluate our approach against state-of-the-art inpainting techniques under the same real-time constraints and demonstrate that our model achieves the best trade-off between quality and speed, outperforming competitors on both image- and video-based metrics.
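The abstract does not say how the adaptive patch selection works, so the following is only a minimal sketch of one plausible reading: spend the inference budget on the image patches that contain the most missing pixels. The function name `select_patches`, the parameters `patch_size` and `max_patches`, and the ranking by hole-pixel count are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def select_patches(hole_mask, patch_size=16, max_patches=64):
    """Pick the patches most in need of inpainting (hypothetical heuristic).

    hole_mask: 2D boolean array, True where the rendered view has no texture.
    Returns a list of (patch_row, patch_col) grid indices, ordered from most
    to fewest missing pixels and capped at max_patches.
    """
    h, w = hole_mask.shape
    # Pad with False so both dimensions become multiples of patch_size.
    pad_h = (-h) % patch_size
    pad_w = (-w) % patch_size
    padded = np.pad(hole_mask.astype(bool), ((0, pad_h), (0, pad_w)))

    grid_h = padded.shape[0] // patch_size
    grid_w = padded.shape[1] // patch_size
    # Count missing pixels per patch by folding each patch into its own axes.
    counts = padded.reshape(grid_h, patch_size, grid_w, patch_size).sum(axis=(1, 3))

    # Keep only patches that contain at least one hole, worst patches first.
    rows, cols = np.nonzero(counts)
    order = np.argsort(-counts[rows, cols], kind="stable")[:max_patches]
    return [(int(rows[i]), int(cols[i])) for i in order]
```

In this sketch, lowering `max_patches` reduces the number of patches passed to the network per frame, which would trade quality for speed; patches without holes are skipped entirely. How the actual method makes this trade-off may differ.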