Semantic masks are all you need: predicting mask dynamics in world models yields more robust and generalizable robot policies than predicting raw pixels.
Mitigating Expression Drift yields state-of-the-art performance in text-to-image person retrieval without any training overhead.
Achieve robust SLAM in dynamic environments without semantic labels or depth sensors by disentangling scene dynamics with a generalizable motion model.
Vision-and-language navigation (VLN) agents can navigate more effectively by predicting their future states and planning proactively from forecasted semantic map cues, rather than relying solely on historical context.