Pre-trained vision encoders are shockingly bad at distinguishing identity when background context is controlled, but a simple contrastive learning scheme can fix it.
Reconstructing complete, animatable 3D avatars from heavily occluded YouTube videos is now possible, thanks to a hallucination-as-supervision pipeline using diffusion models.
Achieve real-time egocentric motion capture with 19% better accuracy and half the jitter of prior art, thanks to a transformer architecture and self-supervised pretraining on millions of unlabeled frames.
Finally, you can generate a fully animatable, 360-degree 3D head avatar from a single image, without per-instance optimization.