Achieve 49% and 19% better Chamfer distance than state-of-the-art dynamic surface reconstruction methods on Hi4D and CMU Panoptic datasets, respectively, by enforcing temporal consistency in Gaussian Splatting.
By explicitly reasoning in 3D, VolumeDP leaps ahead of 2D-based imitation learning methods, achieving a remarkable 14.8% improvement on the LIBERO benchmark and robust real-world generalization.
Stop wrestling with incompatible human body models: SOMA lets you mix and match SMPL, SMPL-X, and more, unlocking the power of diverse datasets in a single, differentiable pipeline.
A hybrid cuVSLAM-based visual SLAM system achieves superior mapping accuracy in real-world logistics environments, outperforming other VO/VSLAM approaches.
World Action Models can ditch the slow, iterative "imagine-then-execute" loop at test time without sacrificing performance, achieving a 4x speedup.
Current image generation unlearning methods are surprisingly brittle: adversarial image prompts, optimized with attention-guided masking, can effectively resurrect supposedly "forgotten" concepts.
Kimodo leaps ahead in controllable human motion generation by training a diffusion model on a massive 700-hour mocap dataset, enabling unprecedented control fidelity via text and diverse kinematic constraints.
MLLMs can now handle 4K videos up to 100x faster thanks to AutoGaze, which selectively attends to only the most informative patches.
Retrieval augmentation lets head avatars handle novel expressions better by mixing in similar expressions from a large unlabeled dataset during training, boosting generalization without extra labels or architecture changes.
Achieve a 40% jump in success rates on real-world contact-rich manipulation by intelligently scheduling force feedback into visual-motor policies.
Forget predefined areas of interest: this multi-agent exploration framework uses Gaussian belief mapping to adaptively balance scientific discovery and safety in hazardous off-world environments.
Text-to-video generation gets a 1.58x speed boost with CalibAtt, a training-free method that exploits consistent sparsity patterns in attention layers.
Achieve significantly sharper and more detailed 4D vascular reconstructions from sparse DSA data by injecting super-resolution priors into Gaussian Splatting.
Achieve state-of-the-art results in high-resolution video geometry estimation by disentangling global coherence and fine detail using a dual-stream transformer architecture.
By explicitly guiding attention with predicted action sequences, AGA overcomes the limitations of standard dot-product attention in video action anticipation, leading to better generalization and interpretability.
Achieve up to 28% better success rates in whole-body mobile manipulation by decoupling base and arm control while intelligently allocating perceptual attention.
Ditch quadratic scaling in 3D reconstruction: VGG-T$^3$ achieves linear scaling and an 11.6x speed-up by distilling scene geometry into a fixed-size MLP.
By explicitly disentangling shared and view-specific features across multi-view fundus images, MVGFDR achieves superior diabetic retinopathy grading compared to methods that directly fuse visual features.
ShallowConvNet emerges as a surprisingly effective architecture for decoding user intent from EEG signals in real-world robotic control, outperforming more complex models like Transformers.
Time series generation can be dramatically improved by explicitly conditioning on semantic understanding, as demonstrated by a novel vision-centric framework.
Forget complex architectures: RaCo achieves SOTA keypoint matching and repeatability by cleverly combining ranking and covariance estimation in a lightweight network, trained without covisible image pairs.
Forget monolithic LoRAs: LoRWeB dynamically mixes a basis set of LoRAs to unlock SOTA generalization in visual analogy tasks.
Achieve state-of-the-art depth completion by adapting 3D foundation models at test time with minimal parameter updates, outperforming task-specific encoders that often overfit.
Pathology image analysis just got a whole lot greener: LitePath slashes computational costs by 400x while matching the accuracy of state-of-the-art models, making AI-powered diagnostics accessible on low-power edge devices.
Forget tedious manual segmentation: ArtisanGS lets you lasso objects in 3D Gaussian Splats with AI-powered 2D selections that propagate into 3D, giving you unprecedented control over editing.
Synthesizing realistic radar data from camera images is now possible, bridging the gap between visual and radar perception for autonomous driving.
Current NVS evaluation metrics are misleading, so this paper introduces a task-aware framework using Zero123 features that actually aligns with human perception of quality and faithfulness.
Synthetic MRI data, generated by a segmentation-conditioned diffusion model, can measurably improve the performance of 3D U-Nets for hepatic segmentation.
Imagine training robots to manipulate objects in the real world, but entirely within a high-fidelity, diffusion-based dream.
Multimodal ophthalmic AI is poised for a leap, but current models still struggle with data variability, limited annotations, and generalization across diverse patient populations.