HERMES++ achieves state-of-the-art performance in both future point cloud prediction and 3D scene understanding by unifying these tasks within a single driving world model.
DINO, not CLIP, might be the better foundation for open-set 3D object retrieval, especially when paired with dynamic view integration and virtual feature synthesis to avoid overfitting.
Doc-V* demonstrates that an agentic approach to multi-page document VQA, using active navigation and structured memory, can significantly outperform retrieval-augmented generation, especially in out-of-domain scenarios.
Text-to-video diffusion models can now count (more accurately) without retraining, thanks to a clever attention-based guidance method.
Dynamically adapting network parameters at test time achieves state-of-the-art 3D scene understanding, showing that input-aware adjustments can significantly boost performance with minimal overhead.
World models can now remember and realistically regenerate dynamic objects that temporarily disappear from view, thanks to a novel hybrid memory architecture.